git.armlinux.org.uk/linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2022-06-11	KVM: selftests: Multiplex return code and fd in __kvm_create_device()	Sean Christopherson
	Multiplex the return value and fd (on success) in __kvm_create_device() to mimic common library helpers that return file descriptors, e.g. open(). Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-11	KVM: selftests: Move KVM_CREATE_DEVICE_TEST code to separate helper	Sean Christopherson
	Move KVM_CREATE_DEVICE_TEST to its own helper, identifying "real" versus "test" device creation based on a hardcoded boolean buried in the middle of a param list is painful for readers. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-11	KVM: selftests: Drop @test param from kvm_create_device()	Sean Christopherson
	Remove the two calls that pass @test=true to kvm_create_device() and drop the @test param entirely. The two removed calls don't check the return value of kvm_create_device(), so other than verifying KVM doesn't explode, which is extremely unlikely given that the non-test variant was _just_ called, they are pointless and provide no validation coverage. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-11	KVM: selftests: Use KVM_IOCTL_ERROR() for one-off arm64 ioctls	Sean Christopherson
	Use the KVM_IOCTL_ERROR() macro to generate error messages for a handful of one-off arm64 ioctls. The calls in question are made without an associated struct kvm_vm/kvm_vcpu as they are used to configure those structs, i.e. can't be easily converted to e.g. vcpu_ioctl(). Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-11	KVM: selftests: Get rid of kvm_util_internal.h	Sean Christopherson
	Fold kvm_util_internal.h into kvm_util_base.h, i.e. make all KVM utility stuff "public". Hiding struct implementations from tests has been a massive failure, as it has led to pointless and poorly named wrappers, unnecessarily opaque code, etc... Not to mention that the approach was a complete failure as evidenced by the non-zero number of tests that were including kvm_util_internal.h. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-11	KVM: selftests: Make x86-64's register dump helpers static	Sean Christopherson
	Make regs_dump() and sregs_dump() static, they're only implemented by x86 and only used internally. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-11	KVM: selftests: Use __KVM_SYSCALL_ERROR() to handle non-KVM syscall errors	Sean Christopherson
	Use __KVM_SYSCALL_ERROR() to report and pretty print non-KVM syscall and ioctl errors, e.g. for mmap(), munmap(), uffd ioctls, etc... Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-11	KVM: selftests: Use kvm_ioctl() helpers	Sean Christopherson
	Use the recently introduced KVM-specific ioctl() helpers instead of open coding calls to ioctl() just to pretty print the ioctl name. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-11	KVM: selftests: Make kvm_ioctl() a wrapper to pretty print ioctl name	Sean Christopherson
	Make kvm_ioctl() a macro wrapper and print the _name_ of the ioctl on failure instead of the number. Deliberately do not use __stringify(), as that will expand the ioctl all the way down to its numerical sequence, again the intent is to print the name of the macro. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-11	KVM: sefltests: Use vm_ioctl() and __vm_ioctl() helpers	Sean Christopherson
	Use the recently introduced VM-specific ioctl() helpers instead of open coding calls to ioctl() just to pretty print the ioctl name. Keep a few open coded assertions that provide additional info. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-11	KVM: selftests: Make vm_ioctl() a wrapper to pretty print ioctl name	Sean Christopherson
	Make vm_ioctl() a macro wrapper and print the _name_ of the ioctl on failure instead of the number. Deliberately do not use __stringify(), as that will expand the ioctl all the way down to its numerical sequence. Again the intent is to print the name of the macro. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-11	KVM: selftests: Add vcpu_get() to retrieve and assert on vCPU existence	Sean Christopherson
	Add vcpu_get() to wrap vcpu_find() and deduplicate a pile of code that asserts the requested vCPU exists. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-11	KVM: selftests: Remove vcpu_get_fd()	Sean Christopherson
	Drop vcpu_get_fd(), it no longer has any users, and really should not exist as the framework has failed if tests need to manually operate on a vCPU fd. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-11	KVM: selftests: Use vcpu_access_device_attr() in arm64 code	Sean Christopherson
	Use vcpu_access_device_attr() in arm's arch_timer test instead of manually retrieving the vCPU's fd. This will allow dropping vcpu_get_fd() in a future patch. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-11	KVM: selftests: Add __vcpu_run() helper	Sean Christopherson
	Add __vcpu_run() so that tests that want to avoid asserts on KVM_RUN failures don't need to open code the ioctl() call. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-11	KVM: sefltests: Use vcpu_ioctl() and __vcpu_ioctl() helpers	Sean Christopherson
	Use the recently introduced vCPU-specific ioctl() helpers instead of open coding calls to ioctl() just to pretty print the ioctl name. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-11	KVM: selftests: Split vcpu_set_nested_state() into two helpers	Sean Christopherson
	Split vcpu_nested_state_set() into a wrapper that asserts, and an inner helper that does not. Passing a bool is all kinds of awful as it's unintuitive for readers and requires returning an 'int' from a function that for most users can never return anything other than "success". Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-11	KVM: selftests: Drop @mode from common vm_create() helper	Sean Christopherson
	Drop @mode from vm_create() and have it use VM_MODE_DEFAULT. Add and use an inner helper, __vm_create(), to service the handful of tests that want something other than VM_MODE_DEFAULT. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-11	KVM: selftests: Make vcpu_ioctl() a wrapper to pretty print ioctl name	Sean Christopherson
	Make vcpu_ioctl() a macro wrapper and pretty the _name_ of the ioctl on failure instead of the number. Add inner macros to allow handling cases where the name of the ioctl needs to be resolved higher up the stack, and to allow using the formatting for non-ioctl syscalls without being technically wrong. Deliberately do not use __stringify(), as that will expand the ioctl all the way down to its numerical sequence, again the intent is to print the name of the macro. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-11	KVM: selftests: Add another underscore to inner ioctl() helpers	Sean Christopherson
	Add a second underscore to inner ioctl() helpers to better align with commonly accepted kernel coding style, and to allow using a single underscore variant in the future for macro shenanigans. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-11	KVM: selftests: Always open VM file descriptors with O_RDWR	Sean Christopherson
	Drop the @perm param from vm_create() and always open VM file descriptors with O_RDWR. There's no legitimate use case for other permissions, and if a selftest wants to do oddball negative testing it can open code the necessary bits instead of forcing a bunch of tests to provide useless information. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-11	KVM: selftests: Drop stale declarations from kvm_util_base.h	Sean Christopherson
	Drop declarations for allocate_kvm_dirty_log() and vm_create_device(), which no longer have implementations. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-11	KVM: selftests: Fix typo in vgic_init test	Sean Christopherson
	When iterating over vCPUs, invoke access_v3_redist_reg() on the "current" vCPU instead of vCPU0, which is presumably what was intended by iterating over all vCPUs. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-11	KVM: selftests: Fix buggy-but-benign check in test_v3_new_redist_regions()	Sean Christopherson
	Update 'ret' with the return value of _kvm_device_access() prior to asserting that ret is non-zero. In the current code base, the flaw is benign as 'ret' is guaranteed to be -EBUSY from the previous run_vcpu(), which also means that errno==EBUSY prior to _kvm_device_access(), thus the "errno == EFAULT" part of the assert means that a false negative is impossible (unless the kernel is being truly mean and spuriously setting errno=EFAULT while returning success). Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-11	KVM: Fix references to non-existent KVM_CAP_TRIPLE_FAULT_EVENT	Sean Christopherson
	The x86-only KVM_CAP_TRIPLE_FAULT_EVENT was (appropriately) renamed to KVM_CAP_X86_TRIPLE_FAULT_EVENT when the patches were applied, but the docs and selftests got left behind. Fix them. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-10	KVM: x86: Bug the VM on an out-of-bounds data read	Sean Christopherson
	Bug the VM and terminate emulation if an out-of-bounds read into the emulator's data cache occurs. Knowingly contuining on all but guarantees that KVM will overwrite random kernel data, which is far, far worse than killing the VM. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220526210817.3428868-9-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-10	KVM: x86: Bug the VM if the emulator generates a bogus exception vector	Sean Christopherson
	Bug the VM if KVM's emulator attempts to inject a bogus exception vector. The guest is likely doomed even if KVM continues on, and propagating a bad vector to the rest of KVM runs the risk of breaking other assumptions in KVM and thus triggering a more egregious bug. All existing users of emulate_exception() have hardcoded vector numbers (__load_segment_descriptor() uses a few different vectors, but they're all hardcoded), and future users are likely to follow suit, i.e. the change to emulate_exception() is a glorified nop. As for the ctxt->exception.vector check in x86_emulate_insn(), the few known times the WARN has been triggered in the past is when the field was not set when synthesizing a fault, i.e. for all intents and purposes the check protects against consumption of uninitialized data. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220526210817.3428868-8-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-10	KVM: x86: Bug the VM if the emulator accesses a non-existent GPR	Sean Christopherson
	Bug the VM, i.e. kill it, if the emulator accesses a non-existent GPR, i.e. generates an out-of-bounds GPR index. Continuing on all but gaurantees some form of data corruption in the guest, e.g. even if KVM were to redirect to a dummy register, KVM would be incorrectly read zeros and drop writes. Note, bugging the VM doesn't completely prevent data corruption, e.g. the current round of emulation will complete before the vCPU bails out to userspace. But, the very act of killing the guest can also cause data corruption, e.g. due to lack of file writeback before termination, so taking on additional complexity to cleanly bail out of the emulator isn't justified, the goal is purely to stem the bleeding and alert userspace that something has gone horribly wrong, i.e. to avoid _silent_ data corruption. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220526210817.3428868-7-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-10	KVM: x86: Reduce the number of emulator GPRs to '8' for 32-bit KVM	Sean Christopherson
	Reduce the number of GPRs emulated by 32-bit KVM from 16 to 8. KVM does not support emulating 64-bit mode on 32-bit host kernels, and so should never generate accesses to R8-15. Opportunistically use NR_EMULATOR_GPRS in rsm_load_state_{32,64}() now that it is precise and accurate for both flavors. Wrap the definition with full #ifdef ugliness; sadly, IS_ENABLED() doesn't guarantee a compile-time constant as far as BUILD_BUG_ON() is concerned. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Message-Id: <20220526210817.3428868-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-10	KVM: x86: Use 16-bit fields to track dirty/valid emulator GPRs	Sean Christopherson
	Use a u16 instead of a u32 to track the dirty/valid status of GPRs in the emulator. Unlike struct kvm_vcpu_arch, x86_emulate_ctxt tracks only the "true" GPRs, i.e. doesn't include RIP in its array, and so only needs to track 16 registers. Note, maxing out at 16 GPRs is a fundamental property of x86-64 and will not change barring a massive architecture update. Legacy x86 ModRM and SIB encodings use 3 bits for GPRs, i.e. support 8 registers. x86-64 uses a single bit in the REX prefix for each possible reference type to double the number of supported GPRs to 16 registers (4 bits). Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220526210817.3428868-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-10	KVM: x86: Omit VCPU_REGS_RIP from emulator's _regs array	Sean Christopherson
	Omit RIP from the emulator's _regs array, which is used only for GPRs, i.e. registers that can be referenced via ModRM and/or SIB bytes. The emulator uses the dedicated _eip field for RIP, and manually reads from _eip to handle RIP-relative addressing. To avoid an even bigger, slightly more dangerous change, hardcode the number of GPRs to 16 for the time being even though 32-bit KVM's emulator technically should only have 8 GPRs. Add a TODO to address that in a future commit. See also the comments above the read_gpr() and write_gpr() declarations, and obviously the handling in writeback_registers(). No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Message-Id: <20220526210817.3428868-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-10	KVM: x86: Harden _regs accesses to guard against buggy input	Sean Christopherson
	WARN and truncate the incoming GPR number/index when reading/writing GPRs in the emulator to guard against KVM bugs, e.g. to avoid out-of-bounds accesses to ctxt->_regs[] if KVM generates a bogus index. Truncate the index instead of returning e.g. zero, as reg_write() returns a pointer to the register, i.e. returning zero would result in a NULL pointer dereference. KVM could also force the index to any arbitrary GPR, but that's no better or worse, just different. Open code the restriction to 16 registers; RIP is handled via _eip and should never be accessed through reg_read() or reg_write(). See the comments above the declarations of reg_read() and reg_write(), and the behavior of writeback_registers(). The horrific open coded mess will be cleaned up in a future commit. There are no such bugs known to exist in the emulator, but determining that KVM is bug-free is not at all simple and requires a deep dive into the emulator. The code is so convoluted that GCC-12 with the recently enable -Warray-bounds spits out a false-positive due to a GCC bug: arch/x86/kvm/emulate.c:254:27: warning: array subscript 32 is above array bounds of 'long unsigned int[17]' [-Warray-bounds] 254 \| return ctxt->_regs[nr]; \| ~~~~~~~~~~~^~~~ In file included from arch/x86/kvm/emulate.c:23: arch/x86/kvm/kvm_emulate.h: In function 'reg_rmw': arch/x86/kvm/kvm_emulate.h:366:23: note: while referencing '_regs' 366 \| unsigned long _regs[NR_VCPU_REGS]; \| ^~~~~ Link: https://lore.kernel.org/all/YofQlBrlx18J7h9Y@google.com Link: https://bugzilla.kernel.org/show_bug.cgi?id=216026 Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105679 Reported-and-tested-by: Robert Dinse <nanook@eskimo.com> Reported-by: Kees Cook <keescook@chromium.org> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220526210817.3428868-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-10	KVM: x86: Grab regs_dirty in local 'unsigned long'	Sean Christopherson
	Capture ctxt->regs_dirty in a local 'unsigned long' instead of casting it to an 'unsigned long *' for use in for_each_set_bit(). The bitops helpers really do read the entire 'unsigned long', even though the walking of the read value is capped at the specified size. I.e. 64-bit KVM is reading memory beyond ctxt->regs_dirty, which is a u32 and thus 4 bytes, whereas an unsigned long is 8 bytes. Functionally it's not an issue because regs_dirty is in the middle of x86_emulate_ctxt, i.e. KVM is just reading its own memory, but relying on that coincidence is gross and unsafe. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220526210817.3428868-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09	Merge branch 'kvm-5.20-early'	Paolo Bonzini
	s390: * add an interface to provide a hypervisor dump for secure guests * improve selftests to show tests x86: * Intel IPI virtualization * Allow getting/setting pending triple fault with KVM_GET/SET_VCPU_EVENTS * PEBS virtualization * Simplify PMU emulation by just using PERF_TYPE_RAW events * More accurate event reinjection on SVM (avoid retrying instructions) * Allow getting/setting the state of the speaker port data bit * Rewrite gfn-pfn cache refresh * Refuse starting the module if VM-Entry/VM-Exit controls are inconsistent * "Notify" VM exit
2022-06-09	KVM: selftests: Restrict test region to 48-bit physical addresses when using ↵	David Matlack
	nested The selftests nested code only supports 4-level paging at the moment. This means it cannot map nested guest physical addresses with more than 48 bits. Allow perf_test_util nested mode to work on hosts with more than 48 physical addresses by restricting the guest test region to 48-bits. While here, opportunistically fix an off-by-one error when dealing with vm_get_max_gfn(). perf_test_util.c was treating this as the maximum number of GFNs, rather than the maximum allowed GFN. This didn't result in any correctness issues, but it did end up shifting the test region down slightly when using huge pages. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220520233249.3776001-12-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09	KVM: selftests: Add option to run dirty_log_perf_test vCPUs in L2	David Matlack
	Add an option to dirty_log_perf_test that configures the vCPUs to run in L2 instead of L1. This makes it possible to benchmark the dirty logging performance of nested virtualization, which is particularly interesting because KVM must shadow L1's EPT/NPT tables. For now this support only works on x86_64 CPUs with VMX. Otherwise passing -n results in the test being skipped. Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220520233249.3776001-11-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09	KVM: selftests: Clean up LIBKVM files in Makefile	David Matlack
	Break up the long lines for LIBKVM and alphabetize each architecture. This makes reading the Makefile easier, and will make reading diffs to LIBKVM easier. No functional change intended. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220520233249.3776001-10-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09	KVM: selftests: Link selftests directly with lib object files	David Matlack
	The linker does obey strong/weak symbols when linking static libraries, it simply resolves an undefined symbol to the first-encountered symbol. This means that defining __weak arch-generic functions and then defining arch-specific strong functions to override them in libkvm will not always work. More specifically, if we have: lib/generic.c: void __weak foo(void) { pr_info("weak\n"); } void bar(void) { foo(); } lib/x86_64/arch.c: void foo(void) { pr_info("strong\n"); } And a selftest that calls bar(), it will print "weak". Now if you make generic.o explicitly depend on arch.o (e.g. add function to arch.c that is called directly from generic.c) it will print "strong". In other words, it seems that the linker is free to throw out arch.o when linking because generic.o does not explicitly depend on it, which causes the linker to lose the strong symbol. One solution is to link libkvm.a with --whole-archive so that the linker doesn't throw away object files it thinks are unnecessary. However that is a bit difficult to plumb since we are using the common selftests makefile rules. An easier solution is to drop libkvm.a just link selftests with all the .o files that were originally in libkvm.a. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220520233249.3776001-9-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09	KVM: selftests: Drop unnecessary rule for STATIC_LIBS	David Matlack
	Drop the "all: $(STATIC_LIBS)" rule. The KVM selftests already depend on $(STATIC_LIBS), so there is no reason to have an extra "all" rule. Suggested-by: Peter Xu <peterx@redhat.com> Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220520233249.3776001-8-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09	KVM: selftests: Add a helper to check EPT/VPID capabilities	David Matlack
	Create a small helper function to check if a given EPT/VPID capability is supported. This will be re-used in a follow-up commit to check for 1G page support. No functional change intended. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220520233249.3776001-7-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09	KVM: selftests: Move VMX_EPT_VPID_CAP_AD_BITS to vmx.h	David Matlack
	This is a VMX-related macro so move it to vmx.h. While here, open code the mask like the rest of the VMX bitmask macros. No functional change intended. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220520233249.3776001-6-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09	KVM: selftests: Refactor nested_map() to specify target level	David Matlack
	Refactor nested_map() to specify that it explicityl wants 4K mappings (the existing behavior) and push the implementation down into __nested_map(), which can be used in subsequent commits to create huge page mappings. No function change intended. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220520233249.3776001-5-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09	KVM: selftests: Drop stale function parameter comment for nested_map()	David Matlack
	nested_map() does not take a parameter named eptp_memslot. Drop the comment referring to it. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220520233249.3776001-4-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09	KVM: selftests: Add option to create 2M and 1G EPT mappings	David Matlack
	The current EPT mapping code in the selftests only supports mapping 4K pages. This commit extends that support with an option to map at 2M or 1G. This will be used in a future commit to create large page mappings to test eager page splitting. No functional change intended. Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220520233249.3776001-3-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09	KVM: selftests: Replace x86_page_size with PG_LEVEL_XX	David Matlack
	x86_page_size is an enum used to communicate the desired page size with which to map a range of memory. Under the hood they just encode the desired level at which to map the page. This ends up being clunky in a few ways: - The name suggests it encodes the size of the page rather than the level. - In other places in x86_64/processor.c we just use a raw int to encode the level. Simplify this by adopting the kernel style of PG_LEVEL_XX enums and pass around raw ints when referring to the level. This makes the code easier to understand since these macros are very common in KVM MMU code. Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220520233249.3776001-2-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09	KVM: x86: SVM: fix nested PAUSE filtering when L0 intercepts PAUSE	Paolo Bonzini
	Commit 74fd41ed16fd ("KVM: x86: nSVM: support PAUSE filtering when L0 doesn't intercept PAUSE") introduced passthrough support for nested pause filtering, (when the host doesn't intercept PAUSE) (either disabled with kvm module param, or disabled with '-overcommit cpu-pm=on') Before this commit, L1 KVM didn't intercept PAUSE at all; afterwards, the feature was exposed as supported by KVM cpuid unconditionally, thus if L1 could try to use it even when the L0 KVM can't really support it. In this case the fallback caused KVM to intercept each PAUSE instruction; in some cases, such intercept can slow down the nested guest so much that it can fail to boot. Instead, before the problematic commit KVM was already setting both thresholds to 0 in vmcb02, but after the first userspace VM exit shrink_ple_window was called and would reset the pause_filter_count to the default value. To fix this, change the fallback strategy - ignore the guest threshold values, but use/update the host threshold values unless the guest specifically requests disabling PAUSE filtering (either simple or advanced). Also fix a minor bug: on nested VM exit, when PAUSE filter counter were copied back to vmcb01, a dirty bit was not set. Thanks a lot to Suravee Suthikulpanit for debugging this! Fixes: 74fd41ed16fd ("KVM: x86: nSVM: support PAUSE filtering when L0 doesn't intercept PAUSE") Reported-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Tested-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Co-developed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220518072709.730031-1-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09	KVM: x86: SVM: drop preempt-safe wrappers for avic_vcpu_load/put	Maxim Levitsky
	Now that these functions are always called with preemption disabled, remove the preempt_disable()/preempt_enable() pair inside them. No functional change intended. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220606180829.102503-8-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09	KVM: x86: disable preemption around the call to kvm_arch_vcpu_{un\|}blocking	Maxim Levitsky
	On SVM, if preemption happens right after the call to finish_rcuwait but before call to kvm_arch_vcpu_unblocking on SVM/AVIC, it itself will re-enable AVIC, and then we will try to re-enable it again in kvm_arch_vcpu_unblocking which will lead to a warning in __avic_vcpu_load. The same problem can happen if the vCPU is preempted right after the call to kvm_arch_vcpu_blocking but before the call to prepare_to_rcuwait and in this case, we will end up with AVIC enabled during sleep - Ooops. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220606180829.102503-7-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09	KVM: x86: disable preemption while updating apicv inhibition	Maxim Levitsky
	Currently nothing prevents preemption in kvm_vcpu_update_apicv. On SVM, If the preemption happens after we update the vcpu->arch.apicv_active, the preemption itself will 'update' the inhibition since the AVIC will be first disabled on vCPU unload and then enabled, when the current task is loaded again. Then we will try to update it again, which will lead to a warning in __avic_vcpu_load, that the AVIC is already enabled. Fix this by disabling preemption in this code. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220606180829.102503-6-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09	KVM: x86: SVM: fix avic_kick_target_vcpus_fast	Maxim Levitsky
	There are two issues in avic_kick_target_vcpus_fast 1. It is legal to issue an IPI request with APIC_DEST_NOSHORT and a physical destination of 0xFF (or 0xFFFFFFFF in case of x2apic), which must be treated as a broadcast destination. Fix this by explicitly checking for it. Also donâ€™t use â€˜indexâ€™ in this case as it gives no new information. 2. It is legal to issue a logical IPI request to more than one target. Index field only provides index in physical id table of first such target and therefore can't be used before we are sure that only a single target was addressed. Instead, parse the ICRL/ICRH, double check that a unicast interrupt was requested, and use that info to figure out the physical id of the target vCPU. At that point there is no need to use the index field as well. In addition to fixing the above issues, also skip the call to kvm_apic_match_dest. It is possible to do this now, because now as long as AVIC is not inhibited, it is guaranteed that none of the vCPUs changed their apic id from its default value. This fixes boot of windows guest with AVIC enabled because it uses IPI with 0xFF destination and no destination shorthand. Fixes: 7223fd2d5338 ("KVM: SVM: Use target APIC ID to complete AVIC IRQs when possible") Cc: stable@vger.kernel.org Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220606180829.102503-5-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>