linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2022-11-16	KVM: selftests: Remove useless shifts when creating guest page tables	Sean Christopherson
	Remove the pointless shift from GPA=>GFN and immediately back to GFN=>GPA when creating guest page tables. Ignore the other walkers that have a similar pattern for the moment, they will be converted to use virt_get_pte() in the near future. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20221006004512.666529-4-seanjc@google.com
2022-11-16	KVM: selftests: Drop reserved bit checks from PTE accessor	Sean Christopherson
	Drop the reserved bit checks from the helper to retrieve a PTE, there's very little value in sanity checking the constructed page tables as any will quickly be noticed in the form of an unexpected #PF. The checks also place unnecessary restrictions on the usage of the helpers, e.g. if a test _wanted_ to set reserved bits for whatever reason. Removing the NX check in particular allows for the removal of the @vcpu param, which will in turn allow the helper to be reused nearly verbatim for addr_gva2gpa(). Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20221006004512.666529-3-seanjc@google.com
2022-11-16	KVM: selftests: Drop helpers to read/write page table entries	Sean Christopherson
	Drop vm_{g,s}et_page_table_entry() and instead expose the "inner" helper (was _vm_get_page_table_entry()) that returns a _pointer_ to the PTE, i.e. let tests directly modify PTEs instead of bouncing through helpers that just make life difficult. Opportunsitically use BIT_ULL() in emulator_error_test, and use the MAXPHYADDR define to set the "rogue" GPA bit instead of open coding the same value. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20221006004512.666529-2-seanjc@google.com
2022-11-16	KVM: selftests: Fix spelling mistake "begining" -> "beginning"	Colin Ian King
	There is a spelling mistake in an assert message. Fix it. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Reviewed-by: Jim Mattson <jmattson@google.com> Link: https://lore.kernel.org/r/20220928213458.64089-1-colin.i.king@gmail.com [sean: fix an ironic typo in the changelog] Signed-off-by: Sean Christopherson <seanjc@google.com>
2022-11-16	KVM: selftests: Add ucall pool based implementation	Peter Gonda
	To play nice with guests whose stack memory is encrypted, e.g. AMD SEV, introduce a new "ucall pool" implementation that passes the ucall struct via dedicated memory (which can be mapped shared, a.k.a. as plain text). Because not all architectures have access to the vCPU index in the guest, use a bitmap with atomic accesses to track which entries in the pool are free/used. A list+lock could also work in theory, but synchronizing the individual pointers to the guest would be a mess. Note, there's no need to rewalk the bitmap to ensure success. If all vCPUs are simply allocating, success is guaranteed because there are enough entries for all vCPUs. If one or more vCPUs are freeing and then reallocating, success is guaranteed because vCPUs _always_ walk the bitmap from 0=>N; if vCPU frees an entry and then wins a race to re-allocate, then either it will consume the entry it just freed (bit is the first free bit), or the losing vCPU is guaranteed to see the freed bit (winner consumes an earlier bit, which the loser hasn't yet visited). Reviewed-by: Andrew Jones <andrew.jones@linux.dev> Signed-off-by: Peter Gonda <pgonda@google.com> Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20221006003409.649993-8-seanjc@google.com
2022-11-16	KVM: selftests: Drop now-unnecessary ucall_uninit()	Sean Christopherson
	Drop ucall_uninit() and ucall_arch_uninit() now that ARM doesn't modify the host's copy of ucall_exit_mmio_addr, i.e. now that there's no need to reset the pointer before potentially creating a new VM. The few calls to ucall_uninit() are all immediately followed by kvm_vm_free(), and that is likely always going to hold true, i.e. it's extremely unlikely a test will want to effectively disable ucall in the middle of a test. Reviewed-by: Andrew Jones <andrew.jones@linux.dev> Tested-by: Peter Gonda <pgonda@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20221006003409.649993-7-seanjc@google.com
2022-11-16	KVM: selftests: Make arm64's MMIO ucall multi-VM friendly	Sean Christopherson
	Fix a mostly-theoretical bug where ARM's ucall MMIO setup could result in different VMs stomping on each other by cloberring the global pointer. Fix the most obvious issue by saving the MMIO gpa into the VM. A more subtle bug is that creating VMs in parallel (on multiple tasks) could result in a VM using the wrong address. Synchronizing a global to a guest effectively snapshots the value on a per-VM basis, i.e. the "global" is already prepped to work with multiple VMs, but setting the global in the host is not thread-safe. To fix that bug, add write_guest_global() to allow stuffing a VM's copy of a "global" without modifying the host value. Reviewed-by: Andrew Jones <andrew.jones@linux.dev> Tested-by: Peter Gonda <pgonda@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20221006003409.649993-6-seanjc@google.com
2022-11-16	tools: Add atomic_test_and_set_bit()	Peter Gonda
	Add x86 and generic implementations of atomic_test_and_set_bit() to allow KVM selftests to atomically manage bitmaps. Note, the generic version is taken from arch_test_and_set_bit() as of commit 415d83249709 ("locking/atomic: Make test_and_*_bit() ordered on failure"). Signed-off-by: Peter Gonda <pgonda@google.com> Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20221006003409.649993-5-seanjc@google.com
2022-11-16	KVM: selftests: Automatically do init_ucall() for non-barebones VMs	Sean Christopherson
	Do init_ucall() automatically during VM creation to kill two (three?) birds with one stone. First, initializing ucall immediately after VM creations allows forcing aarch64's MMIO ucall address to immediately follow memslot0. This is still somewhat fragile as tests could clobber the MMIO address with a new memslot, but it's safe-ish since tests have to be conversative when accounting for memslot0. And this can be hardened in the future by creating a read-only memslot for the MMIO page (KVM ARM exits with MMIO if the guest writes to a read-only memslot). Add a TODO to document that selftests can and should use a memslot for the ucall MMIO (doing so requires yet more rework because tests assumes thay can use all memslots except memslot0). Second, initializing ucall for all VMs prepares for making ucall initialization meaningful on all architectures. aarch64 is currently the only arch that needs to do any setup, but that will change in the future by switching to a pool-based implementation (instead of the current stack-based approach). Lastly, defining the ucall MMIO address from common code will simplify switching all architectures (except s390) to a common MMIO-based ucall implementation (if there's ever sufficient motivation to do so). Cc: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Andrew Jones <andrew.jones@linux.dev> Tested-by: Peter Gonda <pgonda@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20221006003409.649993-4-seanjc@google.com
2022-11-16	KVM: selftests: Consolidate boilerplate code in get_ucall()	Sean Christopherson
	Consolidate the actual copying of a ucall struct from guest=>host into the common get_ucall(). Return a host virtual address instead of a guest virtual address even though the addr_gva2hva() part could be moved to get_ucall() too. Conceptually, get_ucall() is invoked from the host and should return a host virtual address (and returning NULL for "nothing to see here" is far superior to returning 0). Use pointer shenanigans instead of an unnecessary bounce buffer when the caller of get_ucall() provides a valid pointer. Reviewed-by: Andrew Jones <andrew.jones@linux.dev> Tested-by: Peter Gonda <pgonda@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20221006003409.649993-3-seanjc@google.com
2022-11-16	KVM: selftests: Consolidate common code for populating ucall struct	Sean Christopherson
	Make ucall() a common helper that populates struct ucall, and only calls into arch code to make the actually call out to userspace. Rename all arch-specific helpers to make it clear they're arch-specific, and to avoid collisions with common helpers (one more on its way...) Add WRITE_ONCE() to stores in ucall() code (as already done to aarch64 code in commit 9e2f6498efbb ("selftests: KVM: Handle compiler optimizations in ucall")) to prevent clang optimizations breaking ucalls. Cc: Colton Lewis <coltonlewis@google.com> Reviewed-by: Andrew Jones <andrew.jones@linux.dev> Tested-by: Peter Gonda <pgonda@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20221006003409.649993-2-seanjc@google.com
2022-11-16	KVM: arm64: selftests: Disable single-step without relying on ucall()	Sean Christopherson
	Automatically disable single-step when the guest reaches the end of the verified section instead of using an explicit ucall() to ask userspace to disable single-step. An upcoming change to implement a pool-based scheme for ucall() will add an atomic operation (bit test and set) in the guest ucall code, and if the compiler generate "old school" atomics, e.g. 40e57c: c85f7c20 ldxr x0, [x1] 40e580: aa100011 orr x17, x0, x16 40e584: c80ffc31 stlxr w15, x17, [x1] 40e588: 35ffffaf cbnz w15, 40e57c <__aarch64_ldset8_sync+0x1c> the guest will hang as the local exclusive monitor is reset by eret, i.e. the stlxr will always fail due to the debug exception taken to EL2. Link: https://lore.kernel.org/all/20221006003409.649993-8-seanjc@google.com Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Marc Zyngier <maz@kernel.org> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20221117002350.2178351-3-seanjc@google.com Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
2022-11-16	KVM: arm64: selftests: Disable single-step with correct KVM define	Sean Christopherson
	Disable single-step by setting debug.control to KVM_GUESTDBG_ENABLE, not to SINGLE_STEP_DISABLE. The latter is an arbitrary test enum that just happens to have the same value as KVM_GUESTDBG_ENABLE, and so effectively disables single-step debug. No functional change intended. Cc: Reiji Watanabe <reijiw@google.com> Fixes: b18e4d4aebdd ("KVM: arm64: selftests: Add a test case for KVM_GUESTDBG_SINGLESTEP") Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20221117002350.2178351-2-seanjc@google.com Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
2022-11-16	KVM: selftests: Rename perf_test_util symbols to memstress	David Matlack
	Replace the perf_test_ prefix on symbol names with memstress_ to match the new file name. "memstress" better describes the functionality proveded by this library, which is to provide functionality for creating and running a VM that stresses VM memory by reading and writing to guest memory on all vCPUs in parallel. "memstress" also contains the same number of chracters as "perf_test", making it a drop-in replacement in symbols, e.g. function names, without impacting line lengths. Also the lack of underscore between "mem" and "stress" makes it clear "memstress" is a noun. Signed-off-by: David Matlack <dmatlack@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20221012165729.3505266-4-dmatlack@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2022-11-16	KVM: selftests: Rename pta (short for perf_test_args) to args	David Matlack
	Rename the local variables "pta" (which is short for perf_test_args) for args. "pta" is not an obvious acronym and using "args" mirrors "vcpu_args". Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: David Matlack <dmatlack@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20221012165729.3505266-3-dmatlack@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2022-11-16	KVM: selftests: Rename perf_test_util.[ch] to memstress.[ch]	David Matlack
	Rename the perf_test_util.[ch] files to memstress.[ch]. Symbols are renamed in the following commit to reduce the amount of churn here in hopes of playiing nice with git's file rename detection. The name "memstress" was chosen to better describe the functionality proveded by this library, which is to create and run a VM that reads/writes to guest memory on all vCPUs in parallel. "memstress" also contains the same number of chracters as "perf_test", making it a drop-in replacement in symbols, e.g. function names, without impacting line lengths. Also the lack of underscore between "mem" and "stress" makes it clear "memstress" is a noun. Signed-off-by: David Matlack <dmatlack@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20221012165729.3505266-2-dmatlack@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2022-11-16	KVM: selftests: randomize page access order	Colton Lewis
	Create the ability to randomize page access order with the -a argument. This includes the possibility that the same pages may be hit multiple times during an iteration or not at all. Population has random access as false to ensure all pages will be touched by population and avoid page faults in late dirty memory that would pollute the test results. Signed-off-by: Colton Lewis <coltonlewis@google.com> Reviewed-by: David Matlack <dmatlack@google.com> Link: https://lore.kernel.org/r/20221107182208.479157-5-coltonlewis@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2022-11-16	KVM: selftests: randomize which pages are written vs read	Colton Lewis
	Randomize which pages are written vs read using the random number generator. Change the variable wr_fract and associated function calls to write_percent that now operates as a percentage from 0 to 100 where X means each page has an X% chance of being written. Change the -f argument to -w to reflect the new variable semantics. Keep the same default of 100% writes. Population always uses 100% writes to ensure all memory is actually populated and not just mapped to the zero page. The prevents expensive copy-on-write faults from occurring during the dirty memory iterations below, which would pollute the performance results. Each vCPU calculates its own random seed by adding its index to the seed provided. Signed-off-by: Colton Lewis <coltonlewis@google.com> Reviewed-by: David Matlack <dmatlack@google.com> Link: https://lore.kernel.org/r/20221107182208.479157-4-coltonlewis@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2022-11-16	KVM: selftests: create -r argument to specify random seed	Colton Lewis
	Create a -r argument to specify a random seed. If no argument is provided, the seed defaults to 1. The random seed is set with perf_test_set_random_seed() and must be set before guest_code runs to apply. Signed-off-by: Colton Lewis <coltonlewis@google.com> Reviewed-by: David Matlack <dmatlack@google.com> Link: https://lore.kernel.org/r/20221107182208.479157-3-coltonlewis@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2022-11-16	KVM: selftests: implement random number generator for guest code	Colton Lewis
	Implement random number generator for guest code to randomize parts of the test, making it less predictable and a more accurate reflection of reality. The random number generator chosen is the Park-Miller Linear Congruential Generator, a fancy name for a basic and well-understood random number generator entirely sufficient for this purpose. Signed-off-by: Colton Lewis <coltonlewis@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Reviewed-by: David Matlack <dmatlack@google.com> Link: https://lore.kernel.org/r/20221107182208.479157-2-coltonlewis@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2022-11-16	KVM: selftests: Allowing running dirty_log_perf_test on specific CPUs	Vipin Sharma
	Add a command line option, -c, to pin vCPUs to physical CPUs (pCPUs), i.e. to force vCPUs to run on specific pCPUs. Requirement to implement this feature came in discussion on the patch "Make page tables for eager page splitting NUMA aware" https://lore.kernel.org/lkml/YuhPT2drgqL+osLl@google.com/ This feature is useful as it provides a way to analyze performance based on the vCPUs and dirty log worker locations, like on the different NUMA nodes or on the same NUMA nodes. To keep things simple, implementation is intentionally very limited, either all of the vCPUs will be pinned followed by an optional main thread or nothing will be pinned. Signed-off-by: Vipin Sharma <vipinsh@google.com> Suggested-by: David Matlack <dmatlack@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20221103191719.1559407-8-vipinsh@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2022-11-16	KVM: selftests: Add atoi_positive() and atoi_non_negative() for input validation	Vipin Sharma
	Many KVM selftests take command line arguments which are supposed to be positive (>0) or non-negative (>=0). Some tests do these validation and some missed adding the check. Add atoi_positive() and atoi_non_negative() to validate inputs in selftests before proceeding to use those values. Signed-off-by: Vipin Sharma <vipinsh@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20221103191719.1559407-7-vipinsh@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2022-11-16	KVM: selftests: Shorten the test args in memslot_modification_stress_test.c	Vipin Sharma
	Change test args memslot_modification_delay and nr_memslot_modifications to delay and nr_iterations for simplicity. Signed-off-by: Vipin Sharma <vipinsh@google.com> Suggested-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20221103191719.1559407-6-vipinsh@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2022-11-16	KVM: selftests: Use SZ_* macros from sizes.h in max_guest_memory_test.c	Vipin Sharma
	Replace size_1gb defined in max_guest_memory_test.c with the SZ_1G, SZ_2G and SZ_4G from linux/sizes.h header file. Signed-off-by: Vipin Sharma <vipinsh@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20221103191719.1559407-5-vipinsh@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2022-11-16	KVM: selftests: Add atoi_paranoid() to catch errors missed by atoi()	Vipin Sharma
	atoi() doesn't detect errors. There is no way to know that a 0 return is correct conversion or due to an error. Introduce atoi_paranoid() to detect errors and provide correct conversion. Replace all atoi() calls with atoi_paranoid(). Signed-off-by: Vipin Sharma <vipinsh@google.com> Suggested-by: David Matlack <dmatlack@google.com> Suggested-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20221103191719.1559407-4-vipinsh@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2022-11-16	KVM: selftests: Put command line options in alphabetical order in ↵	Vipin Sharma
	dirty_log_perf_test There are 13 command line options and they are not in any order. Put them in alphabetical order to make it easy to add new options. No functional change intended. Signed-off-by: Vipin Sharma <vipinsh@google.com> Reviewed-by: Wei Wang <wei.w.wang@intel.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20221103191719.1559407-3-vipinsh@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2022-11-16	KVM: selftests: Add missing break between -e and -g option in ↵	Vipin Sharma
	dirty_log_perf_test Passing -e option (Run VCPUs while dirty logging is being disabled) in dirty_log_perf_test also unintentionally enables -g (Do not enable KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2). Add break between two switch case logic. Fixes: cfe12e64b065 ("KVM: selftests: Add an option to run vCPUs while disabling dirty logging") Signed-off-by: Vipin Sharma <vipinsh@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20221103191719.1559407-2-vipinsh@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2022-11-12	KVM: Push dirty information unconditionally to backup bitmap	Gavin Shan
	In mark_page_dirty_in_slot(), we bail out when no running vcpu exists and a running vcpu context is strictly required by architecture. It may cause backwards compatible issue. Currently, saving vgic/its tables is the only known case where no running vcpu context is expected. We may have other unknown cases where no running vcpu context exists and it's reported by the warning message and we bail out without pushing the dirty information to the backup bitmap. For this, the application is going to enable the backup bitmap for the unknown cases. However, the dirty information can't be pushed to the backup bitmap even though the backup bitmap is enabled for those unknown cases in the application, until the unknown cases are added to the allowed list of non-running vcpu context with extra code changes to the host kernel. In order to make the new application, where the backup bitmap has been enabled, to work with the unchanged host, we continue to push the dirty information to the backup bitmap instead of bailing out early. With the added check on 'memslot->dirty_bitmap' to mark_page_dirty_in_slot(), the kernel crash is avoided silently by the combined conditions: no running vcpu context, kvm_arch_allow_write_without_running_vcpu() returns 'true', and the backup bitmap (KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP) isn't enabled yet. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Gavin Shan <gshan@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221112094322.21911-1-gshan@redhat.com
2022-11-11	KVM: arm64: Use the pKVM hyp vCPU structure in handle___kvm_vcpu_run()	Will Deacon
	As a stepping stone towards deprivileging the host's access to the guest's vCPU structures, introduce some naive flush/sync routines to copy most of the host vCPU into the hyp vCPU on vCPU run and back again on return to EL1. This allows us to run using the pKVM hyp structures when KVM is initialised in protected mode. Tested-by: Vincent Donnefort <vdonnefort@google.com> Co-developed-by: Fuad Tabba <tabba@google.com> Signed-off-by: Fuad Tabba <tabba@google.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221110190259.26861-27-will@kernel.org
2022-11-11	KVM: arm64: Don't unnecessarily map host kernel sections at EL2	Quentin Perret
	We no longer need to map the host's '.rodata' and '.bss' sections in the stage-1 page-table of the pKVM hypervisor at EL2, so remove those mappings and avoid creating any future dependencies at EL2 on host-controlled data structures. Tested-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221110190259.26861-25-will@kernel.org
2022-11-11	KVM: arm64: Explicitly map 'kvm_vgic_global_state' at EL2	Quentin Perret
	The pkvm hypervisor at EL2 may need to read the 'kvm_vgic_global_state' variable from the host, for example when saving and restoring the state of the virtual GIC. Explicitly map 'kvm_vgic_global_state' in the stage-1 page-table of the pKVM hypervisor rather than relying on mapping all of the host '.rodata' section. Tested-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221110190259.26861-24-will@kernel.org
2022-11-11	KVM: arm64: Maintain a copy of 'kvm_arm_vmid_bits' at EL2	Will Deacon
	Sharing 'kvm_arm_vmid_bits' between EL1 and EL2 allows the host to modify the variable arbitrarily, potentially leading to all sorts of shenanians as this is used to configure the VTTBR register for the guest stage-2. In preparation for unmapping host sections entirely from EL2, maintain a copy of 'kvm_arm_vmid_bits' in the pKVM hypervisor and initialise it from the host value while it is still trusted. Tested-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221110190259.26861-23-will@kernel.org
2022-11-11	KVM: arm64: Unmap 'kvm_arm_hyp_percpu_base' from the host	Quentin Perret
	When pKVM is enabled, the hypervisor at EL2 does not trust the host at EL1 and must therefore prevent it from having unrestricted access to internal hypervisor state. The 'kvm_arm_hyp_percpu_base' array holds the offsets for hypervisor per-cpu allocations, so move this this into the nVHE code where it cannot be modified by the untrusted host at EL1. Tested-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221110190259.26861-22-will@kernel.org
2022-11-11	KVM: arm64: Return guest memory from EL2 via dedicated teardown memcache	Quentin Perret
	Rather than relying on the host to free the previously-donated pKVM hypervisor VM pages explicitly on teardown, introduce a dedicated teardown memcache which allows the host to reclaim guest memory resources without having to keep track of all of the allocations made by the pKVM hypervisor at EL2. Tested-by: Vincent Donnefort <vdonnefort@google.com> Co-developed-by: Fuad Tabba <tabba@google.com> Signed-off-by: Fuad Tabba <tabba@google.com> Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Will Deacon <will@kernel.org> [maz: dropped __maybe_unused from unmap_donated_memory_noclear()] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221110190259.26861-21-will@kernel.org
2022-11-11	KVM: arm64: Instantiate guest stage-2 page-tables at EL2	Quentin Perret
	Extend the initialisation of guest data structures within the pKVM hypervisor at EL2 so that we instantiate a memory pool and a full 'struct kvm_s2_mmu' structure for each VM, with a stage-2 page-table entirely independent from the one managed by the host at EL1. The 'struct kvm_pgtable_mm_ops' used by the page-table code is populated with a set of callbacks that can manage guest pages in the hypervisor without any direct intervention from the host, allocating page-table pages from the provided pool and returning these to the host on VM teardown. To keep things simple, the stage-2 MMU for the guest is configured identically to the host stage-2 in the VTCR register and so the IPA size of the guest must match the PA size of the host. For now, the new page-table is unused as there is no way for the host to map anything into it. Yet. Tested-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221110190259.26861-20-will@kernel.org
2022-11-11	KVM: arm64: Consolidate stage-2 initialisation into a single function	Quentin Perret
	The initialisation of guest stage-2 page-tables is currently split across two functions: kvm_init_stage2_mmu() and kvm_arm_setup_stage2(). That is presumably for historical reasons as kvm_arm_setup_stage2() originates from the (now defunct) KVM port for 32-bit Arm. Simplify this code path by merging both functions into one, taking care to map the 'struct kvm' into the hypervisor stage-1 early on in order to simplify the failure path. Tested-by: Vincent Donnefort <vdonnefort@google.com> Co-developed-by: Fuad Tabba <tabba@google.com> Signed-off-by: Fuad Tabba <tabba@google.com> Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221110190259.26861-19-will@kernel.org
2022-11-11	KVM: arm64: Add generic hyp_memcache helpers	Quentin Perret
	The host at EL1 and the pKVM hypervisor at EL2 will soon need to exchange memory pages dynamically for creating and destroying VM state. Indeed, the hypervisor will rely on the host to donate memory pages it can use to create guest stage-2 page-tables and to store VM and vCPU metadata. In order to ease this process, introduce a 'struct hyp_memcache' which is essentially a linked list of available pages, indexed by physical addresses so that it can be passed meaningfully between the different virtual address spaces configured at EL1 and EL2. Tested-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221110190259.26861-18-will@kernel.org
2022-11-11	KVM: arm64: Provide I-cache invalidation by virtual address at EL2	Will Deacon
	In preparation for handling cache maintenance of guest pages from within the pKVM hypervisor at EL2, introduce an EL2 copy of icache_inval_pou() which will later be plumbed into the stage-2 page-table cache maintenance callbacks, ensuring that the initial contents of pages mapped as executable into the guest stage-2 page-table is visible to the instruction fetcher. Tested-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221110190259.26861-17-will@kernel.org
2022-11-11	KVM: arm64: Initialise hypervisor copies of host symbols unconditionally	Will Deacon
	The nVHE object at EL2 maintains its own copies of some host variables so that, when pKVM is enabled, the host cannot directly modify the hypervisor state. When running in normal nVHE mode, however, these variables are still mirrored at EL2 but are not initialised. Initialise the hypervisor symbols from the host copies regardless of pKVM, ensuring that any reference to this data at EL2 with normal nVHE will return a sensibly initialised value. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Tested-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221110190259.26861-16-will@kernel.org
2022-11-11	KVM: arm64: Add per-cpu fixmap infrastructure at EL2	Quentin Perret
	Mapping pages in a guest page-table from within the pKVM hypervisor at EL2 may require cache maintenance to ensure that the initialised page contents is visible even to non-cacheable (e.g. MMU-off) accesses from the guest. In preparation for performing this maintenance at EL2, introduce a per-vCPU fixmap which allows the pKVM hypervisor to map guest pages temporarily into its stage-1 page-table for the purposes of cache maintenance and, in future, poisoning on the reclaim path. The use of a fixmap avoids the need for memory allocation or locking on the map() path. Tested-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Quentin Perret <qperret@google.com> Co-developed-by: Will Deacon <will@kernel.org> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221110190259.26861-15-will@kernel.org
2022-11-11	KVM: arm64: Instantiate pKVM hypervisor VM and vCPU structures from EL1	Fuad Tabba
	With the pKVM hypervisor at EL2 now offering hypercalls to the host for creating and destroying VM and vCPU structures, plumb these in to the existing arm64 KVM backend to ensure that the hypervisor data structures are allocated and initialised on first vCPU run for a pKVM guest. In the host, 'struct kvm_protected_vm' is introduced to hold the handle of the pKVM VM instance as well as to track references to the memory donated to the hypervisor so that it can be freed back to the host allocator following VM teardown. The stage-2 page-table, hypervisor VM and vCPU structures are allocated separately so as to avoid the need for a large physically-contiguous allocation in the host at run-time. Tested-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Fuad Tabba <tabba@google.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221110190259.26861-14-will@kernel.org
2022-11-11	KVM: arm64: Add infrastructure to create and track pKVM instances at EL2	Fuad Tabba
	Introduce a global table (and lock) to track pKVM instances at EL2, and provide hypercalls that can be used by the untrusted host to create and destroy pKVM VMs and their vCPUs. pKVM VM/vCPU state is directly accessible only by the trusted hypervisor (EL2). Each pKVM VM is directly associated with an untrusted host KVM instance, and is referenced by the host using an opaque handle. Future patches will provide hypercalls to allow the host to initialize/set/get pKVM VM/vCPU state using the opaque handle. Tested-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Fuad Tabba <tabba@google.com> Co-developed-by: Will Deacon <will@kernel.org> Signed-off-by: Will Deacon <will@kernel.org> [maz: silence warning on unmap_donated_memory_noclear()] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221110190259.26861-13-will@kernel.org
2022-11-11	KVM: arm64: Rename 'host_kvm' to 'host_mmu'	Will Deacon
	In preparation for introducing VM and vCPU state at EL2, rename the existing 'struct host_kvm' and its singleton 'host_kvm' instance to 'host_mmu' so as to avoid confusion between the structure tracking the host stage-2 MMU state and the host instance of a 'struct kvm' for a protected guest. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Tested-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221110190259.26861-12-will@kernel.org
2022-11-11	KVM: arm64: Add hyp_spinlock_t static initializer	Fuad Tabba
	Introduce a static initializer macro for 'hyp_spinlock_t' so that it is straightforward to instantiate global locks at EL2. This will be later utilised for locking the VM table in the hypervisor. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Tested-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Fuad Tabba <tabba@google.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221110190259.26861-11-will@kernel.org
2022-11-11	KVM: arm64: Include asm/kvm_mmu.h in nvhe/mem_protect.h	Will Deacon
	nvhe/mem_protect.h refers to __load_stage2() in the definition of __load_host_stage2() but doesn't include the relevant header. Include asm/kvm_mmu.h in nvhe/mem_protect.h so that users of the latter don't have to do this themselves. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Tested-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221110190259.26861-10-will@kernel.org
2022-11-11	KVM: arm64: Add helpers to pin memory shared with the hypervisor at EL2	Quentin Perret
	Add helpers allowing the hypervisor to check whether a range of pages are currently shared by the host, and 'pin' them if so by blocking host unshare operations until the memory has been unpinned. This will allow the hypervisor to take references on host-provided data-structures (e.g. 'struct kvm') with the guarantee that these pages will remain in a stable state until the hypervisor decides to release them, for example during guest teardown. Tested-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221110190259.26861-9-will@kernel.org
2022-11-11	KVM: arm64: Prevent the donation of no-map pages	Quentin Perret
	Memory regions marked as "no-map" in the host device-tree routinely include TrustZone carev-outs and DMA pools. Although donating such pages to the hypervisor may not breach confidentiality, it could be used to corrupt its state in uncontrollable ways. To prevent this, let's block host-initiated memory transitions targeting "no-map" pages altogether in nVHE protected mode as there should be no valid reason to do this in current operation. Thankfully, the pKVM EL2 hypervisor has a full copy of the host's list of memblock regions, so we can easily check for the presence of the MEMBLOCK_NOMAP flag on a region containing pages being donated from the host. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Tested-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221110190259.26861-8-will@kernel.org
2022-11-11	KVM: arm64: Implement do_donate() helper for donating memory	Will Deacon
	Transferring ownership information of a memory region from one component to another can be achieved using a "donate" operation, which results in the previous owner losing access to the underlying pages entirely and the new owner having exclusive access to the page. Implement a do_donate() helper, along the same lines as do_{un,}share, and provide this functionality for the host-{to,from}-hyp cases as this will later be used to donate/reclaim memory pages to store VM metadata at EL2. In a similar manner to the sharing transitions, permission checks are performed by the hypervisor to ensure that the component initiating the transition really is the owner of the page and also that the completer does not currently have a page mapped at the target address. Tested-by: Vincent Donnefort <vdonnefort@google.com> Co-developed-by: Quentin Perret <qperret@google.com> Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221110190259.26861-7-will@kernel.org
2022-11-11	KVM: arm64: Unify identifiers used to distinguish host and hypervisor	Will Deacon
	The 'pkvm_component_id' enum type provides constants to refer to the host and the hypervisor, yet this information is duplicated by the 'pkvm_hyp_id' constant. Remove the definition of 'pkvm_hyp_id' and move the 'pkvm_component_id' type definition to 'mem_protect.h' so that it can be used outside of the memory protection code, for example when initialising the owner for hypervisor-owned pages. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Tested-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221110190259.26861-6-will@kernel.org
2022-11-11	KVM: arm64: Fix-up hyp stage-1 refcounts for all pages mapped at EL2	Quentin Perret
	In order to allow unmapping arbitrary memory pages from the hypervisor stage-1 page-table, fix-up the initial refcount for pages that have been mapped before the 'vmemmap' array was up and running so that it accurately accounts for all existing hypervisor mappings. This is achieved by traversing the entire hypervisor stage-1 page-table during initialisation of EL2 and updating the corresponding 'struct hyp_page' for each valid mapping. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Tested-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221110190259.26861-5-will@kernel.org