summaryrefslogtreecommitdiff
path: root/arch/s390/kvm
AgeCommit message (Collapse)Author
2023-09-19s390/ctlreg: add local and system prefix to some functionsHeiko Carstens
Add local and system prefix to some functions to clarify they change control register contents on either the local CPU or the on all CPUs. This results in the following API: Two defines which load and save multiple control registers. The defines correlate with the following C prototypes: void __local_ctl_load(unsigned long *, unsigned int cr_low, unsigned int cr_high); void __local_ctl_store(unsigned long *, unsigned int cr_low, unsigned int cr_high); Two functions which locally set or clear one bit for a specified control register: void local_ctl_set_bit(unsigned int cr, unsigned int bit); void local_ctl_clear_bit(unsigned int cr, unsigned int bit); Two functions which set or clear one bit for a specified control register on all CPUs: void system_ctl_set_bit(unsigned int cr, unsigned int bit); void system_ctl_clear_bit(unsigend int cr, unsigned int bit); Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-09-07Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "ARM: - Clean up vCPU targets, always returning generic v8 as the preferred target - Trap forwarding infrastructure for nested virtualization (used for traps that are taken from an L2 guest and are needed by the L1 hypervisor) - FEAT_TLBIRANGE support to only invalidate specific ranges of addresses when collapsing a table PTE to a block PTE. This avoids that the guest refills the TLBs again for addresses that aren't covered by the table PTE. - Fix vPMU issues related to handling of PMUver. - Don't unnecessary align non-stack allocations in the EL2 VA space - Drop HCR_VIRT_EXCP_MASK, which was never used... - Don't use smp_processor_id() in kvm_arch_vcpu_load(), but the cpu parameter instead - Drop redundant call to kvm_set_pfn_accessed() in user_mem_abort() - Remove prototypes without implementations RISC-V: - Zba, Zbs, Zicntr, Zicsr, Zifencei, and Zihpm support for guest - Added ONE_REG interface for SATP mode - Added ONE_REG interface to enable/disable multiple ISA extensions - Improved error codes returned by ONE_REG interfaces - Added KVM_GET_REG_LIST ioctl() implementation for KVM RISC-V - Added get-reg-list selftest for KVM RISC-V s390: - PV crypto passthrough enablement (Tony, Steffen, Viktor, Janosch) Allows a PV guest to use crypto cards. Card access is governed by the firmware and once a crypto queue is "bound" to a PV VM every other entity (PV or not) looses access until it is not bound anymore. Enablement is done via flags when creating the PV VM. - Guest debug fixes (Ilya) x86: - Clean up KVM's handling of Intel architectural events - Intel bugfixes - Add support for SEV-ES DebugSwap, allowing SEV-ES guests to use debug registers and generate/handle #DBs - Clean up LBR virtualization code - Fix a bug where KVM fails to set the target pCPU during an IRTE update - Fix fatal bugs in SEV-ES intrahost migration - Fix a bug where the recent (architecturally correct) change to reinject #BP and skip INT3 broke SEV guests (can't decode INT3 to skip it) - Retry APIC map recalculation if a vCPU is added/enabled - Overhaul emergency reboot code to bring SVM up to par with VMX, tie the "emergency disabling" behavior to KVM actually being loaded, and move all of the logic within KVM - Fix user triggerable WARNs in SVM where KVM incorrectly assumes the TSC ratio MSR cannot diverge from the default when TSC scaling is disabled up related code - Add a framework to allow "caching" feature flags so that KVM can check if the guest can use a feature without needing to search guest CPUID - Rip out the ancient MMU_DEBUG crud and replace the useful bits with CONFIG_KVM_PROVE_MMU - Fix KVM's handling of !visible guest roots to avoid premature triple fault injection - Overhaul KVM's page-track APIs, and KVMGT's usage, to reduce the API surface that is needed by external users (currently only KVMGT), and fix a variety of issues in the process Generic: - Wrap kvm_{gfn,hva}_range.pte in a union to allow mmu_notifier events to pass action specific data without needing to constantly update the main handlers. - Drop unused function declarations Selftests: - Add testcases to x86's sync_regs_test for detecting KVM TOCTOU bugs - Add support for printf() in guest code and covert all guest asserts to use printf-based reporting - Clean up the PMU event filter test and add new testcases - Include x86 selftests in the KVM x86 MAINTAINERS entry" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (279 commits) KVM: x86/mmu: Include mmu.h in spte.h KVM: x86/mmu: Use dummy root, backed by zero page, for !visible guest roots KVM: x86/mmu: Disallow guest from using !visible slots for page tables KVM: x86/mmu: Harden TDP MMU iteration against root w/o shadow page KVM: x86/mmu: Harden new PGD against roots without shadow pages KVM: x86/mmu: Add helper to convert root hpa to shadow page drm/i915/gvt: Drop final dependencies on KVM internal details KVM: x86/mmu: Handle KVM bookkeeping in page-track APIs, not callers KVM: x86/mmu: Drop @slot param from exported/external page-track APIs KVM: x86/mmu: Bug the VM if write-tracking is used but not enabled KVM: x86/mmu: Assert that correct locks are held for page write-tracking KVM: x86/mmu: Rename page-track APIs to reflect the new reality KVM: x86/mmu: Drop infrastructure for multiple page-track modes KVM: x86/mmu: Use page-track notifiers iff there are external users KVM: x86/mmu: Move KVM-only page-track declarations to internal header KVM: x86: Remove the unused page-track hook track_flush_slot() drm/i915/gvt: switch from ->track_flush_slot() to ->track_remove_region() KVM: x86: Add a new page-track hook to handle memslot deletion drm/i915/gvt: Don't bother removing write-protection on to-be-deleted slot KVM: x86: Reject memslot MOVE operations if KVMGT is attached ...
2023-08-31Merge tag 'kvm-s390-next-6.6-1' of ↵Paolo Bonzini
https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD - PV crypto passthrough enablement (Tony, Steffen, Viktor, Janosch) Allows a PV guest to use crypto cards. Card access is governed by the firmware and once a crypto queue is "bound" to a PV VM every other entity (PV or not) looses access until it is not bound anymore. Enablement is done via flags when creating the PV VM. - Guest debug fixes (Ilya)
2023-08-30s390/airq: remove lsi_mask from airq_structBenjamin Block
Remove the field `lsi_mask` from `struct airq_struct` as it is not utilized for any adapter interrupt, other than setting it to the default value of 0xff. Because nobody is using this functionality, all it does is cost a little bit of time with each delivered adapter interrupt. Reviewed-by: Michael Mueller <mimu@linux.ibm.com> Tested-by: Michael Mueller <mimu@linux.ibm.com> Acked-by: Peter Oberparleiter <oberpar@linux.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-08-28Merge tag 's390-6.6-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Heiko Carstens: - Add vfio-ap support to pass-through crypto devices to secure execution guests - Add API ordinal 6 support to zcrypt_ep11misc device drive, which is required to handle key generate and key derive (e.g. secure key to protected key) correctly - Add missing secure/has_secure sysfs files for the case where it is not possible to figure where a system has been booted from. Existing user space relies on that these files are always present - Fix DCSS block device driver list corruption, caused by incorrect error handling - Convert virt_to_pfn() and pfn_to_virt() from defines to static inline functions to enforce type checking - Cleanups, improvements, and minor fixes to the kernel mapping setup - Fix various virtual vs physical address confusions - Move pfault code to separate file, since it has nothing to do with regular fault handling - Move s390 documentation to Documentation/arch/ like it has been done for other architectures already - Add HAVE_FUNCTION_GRAPH_RETVAL support - Factor out the s390_hypfs filesystem and add a new config option for it. The filesystem is deprecated and as soon as all users are gone it can be removed some time in the not so near future - Remove support for old CEX2 and CEX3 crypto cards from zcrypt device driver - Add support for user-defined certificates: receive user-defined certificates with a diagnose call and provide them via 'cert_store' keyring to user space - Couple of other small fixes and improvements all over the place * tag 's390-6.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (66 commits) s390/pci: use builtin_misc_device macro to simplify the code s390/vfio-ap: make sure nib is shared KVM: s390: export kvm_s390_pv*_is_protected functions s390/uv: export uv_pin_shared for direct usage s390/vfio-ap: check for TAPQ response codes 0x35 and 0x36 s390/vfio-ap: handle queue state change in progress on reset s390/vfio-ap: use work struct to verify queue reset s390/vfio-ap: store entire AP queue status word with the queue object s390/vfio-ap: remove upper limit on wait for queue reset to complete s390/vfio-ap: allow deconfigured queue to be passed through to a guest s390/vfio-ap: wait for response code 05 to clear on queue reset s390/vfio-ap: clean up irq resources if possible s390/vfio-ap: no need to check the 'E' and 'I' bits in APQSW after TAPQ s390/ipl: refactor deprecated strncpy s390/ipl: fix virtual vs physical address confusion s390/zcrypt_ep11misc: support API ordinal 6 with empty pin-blob s390/paes: fix PKEY_TYPE_EP11_AES handling for secure keyblobs s390/pkey: fix PKEY_TYPE_EP11_AES handling for sysfs attributes s390/pkey: fix PKEY_TYPE_EP11_AES handling in PKEY_VERIFYKEY2 IOCTL s390/pkey: fix PKEY_TYPE_EP11_AES handling in PKEY_KBLOB2PROTK[23] ...
2023-08-28KVM: s390: pv: Allow AP-instructions for pv-guestsSteffen Eiden
Introduces new feature bits and enablement flags for AP and AP IRQ support. Signed-off-by: Steffen Eiden <seiden@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Michael Mueller <mimu@linux.ibm.com> Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Link: https://lore.kernel.org/r/20230815151415.379760-5-seiden@linux.ibm.com Message-Id: <20230815151415.379760-5-seiden@linux.ibm.com>
2023-08-28KVM: s390: Add UV feature negotiationSteffen Eiden
Add a uv_feature list for pv-guests to the KVM cpu-model. The feature bits 'AP-interpretation for secure guests' and 'AP-interrupt for secure guests' are available. Signed-off-by: Steffen Eiden <seiden@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Michael Mueller <mimu@linux.ibm.com> Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Link: https://lore.kernel.org/r/20230815151415.379760-4-seiden@linux.ibm.com Message-Id: <20230815151415.379760-4-seiden@linux.ibm.com>
2023-08-28s390/uv: UV feature check utilitySteffen Eiden
Introduces a function to check the existence of an UV feature. Refactor feature bit checks to use the new function. Signed-off-by: Steffen Eiden <seiden@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Michael Mueller <mimu@linux.ibm.com> Link: https://lore.kernel.org/r/20230815151415.379760-3-seiden@linux.ibm.com Message-Id: <20230815151415.379760-3-seiden@linux.ibm.com>
2023-08-28KVM: s390: pv: relax WARN_ONCE condition for destroy fastViktor Mihajlovski
Destroy configuration fast may return with RC 0x104 if there are still bound APQNs in the configuration. The final cleanup will occur with the standard destroy configuration UVC as at this point in time all APQNs have been reset and thus unbound. Therefore, don't warn if RC 0x104 is reported. Signed-off-by: Viktor Mihajlovski <mihajlov@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Steffen Eiden <seiden@linux.ibm.com> Reviewed-by: Michael Mueller <mimu@linux.ibm.com> Link: https://lore.kernel.org/r/20230815151415.379760-2-seiden@linux.ibm.com Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Message-ID: <20230815151415.379760-2-seiden@linux.ibm.com>
2023-08-28Merge remote-tracking branch 'vfio-ap' into nextJanosch Frank
The Secure Execution AP support makes it possible for SE VMs to securely use APQNs without a third party being able to snoop IO. VMs first bind to an APQN to securely attach it and granting protected key crypto function access. Afterwards they can associate the APQN which grants them clear key crypto function access. Once bound the APQNs are not accessible to the host until a reset is performed. The vfio-ap patches being merged here provide the base hypervisor Secure Execution / Protected Virtualization AP support. This includes proper handling of APQNs that are securely attached to a SE/PV guest especially regarding resets.
2023-08-28KVM: s390: interrupt: Fix single-stepping keyless mode exitsIlya Leoshkevich
kvm_s390_skey_check_enable() does not emulate any instructions, rather, it clears CPUSTAT_KSS and arranges the instruction that caused the exit (e.g., ISKE, SSKE, RRBE or LPSWE with a keyed PSW) to run again. Therefore, skip the PER check and let the instruction execution happen. Otherwise, a debugger will see two single-step events on the same instruction. Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Message-ID: <20230725143857.228626-6-iii@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2023-08-28KVM: s390: interrupt: Fix single-stepping userspace-emulated instructionsIlya Leoshkevich
Single-stepping a userspace-emulated instruction that generates an interrupt causes GDB to land on the instruction following it instead of the respective interrupt handler. The reason is that after arranging a KVM_EXIT_S390_SIEIC exit, kvm_handle_sie_intercept() calls kvm_s390_handle_per_ifetch_icpt(), which sets KVM_GUESTDBG_EXIT_PENDING. This bit, however, is not processed immediately, but rather persists until the next ioctl(), causing a spurious single-step exit. Fix by clearing this bit in ioctl(). Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Message-ID: <20230725143857.228626-5-iii@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2023-08-28KVM: s390: interrupt: Fix single-stepping kernel-emulated instructionsIlya Leoshkevich
Single-stepping a kernel-emulated instruction that generates an interrupt causes GDB to land on the instruction following it instead of the respective interrupt handler. The reason is that kvm_handle_sie_intercept(), after injecting the interrupt, also processes the PER event and arranges a KVM_SINGLESTEP exit. The interrupt is not yet delivered, however, so the userspace sees the next instruction. Fix by avoiding the KVM_SINGLESTEP exit when there is a pending interrupt. The next __vcpu_run() loop iteration will arrange a KVM_SINGLESTEP exit after delivering the interrupt. Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Message-ID: <20230725143857.228626-4-iii@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2023-08-28KVM: s390: interrupt: Fix single-stepping into program interrupt handlersIlya Leoshkevich
Currently, after single-stepping an instruction that generates a specification exception, GDB ends up on the instruction immediately following it. The reason is that vcpu_post_run() injects the interrupt and sets KVM_GUESTDBG_EXIT_PENDING, causing a KVM_SINGLESTEP exit. The interrupt is not delivered, however, therefore userspace sees the address of the next instruction. Fix by letting the __vcpu_run() loop go into the next iteration, where vcpu_pre_run() delivers the interrupt and sets KVM_GUESTDBG_EXIT_PENDING. Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Message-ID: <20230725143857.228626-3-iii@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2023-08-28KVM: s390: interrupt: Fix single-stepping into interrupt handlersIlya Leoshkevich
After single-stepping an instruction that generates an interrupt, GDB ends up on the second instruction of the respective interrupt handler. The reason is that vcpu_pre_run() manually delivers the interrupt, and then __vcpu_run() runs the first handler instruction using the CPUSTAT_P flag. This causes a KVM_SINGLESTEP exit on the second handler instruction. Fix by delaying the KVM_SINGLESTEP exit until after the manual interrupt delivery. Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Message-ID: <20230725143857.228626-2-iii@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2023-08-23Merge branch 'vfio-ap' into featuresHeiko Carstens
Tony Krowiak says: =================== This patch series is for the changes required in the vfio_ap device driver to facilitate pass-through of crypto devices to a secure execution guest. In particular, it is critical that no data from the queues passed through to the SE guest is leaked when the guest is destroyed. There are also some new response codes returned from the PQAP(ZAPQ) and PQAP(TAPQ) commands that have been added to the architecture in support of pass-through of crypto devices to SE guests; these need to be accounted for when handling the reset of queues. =================== Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-08-18KVM: s390: export kvm_s390_pv*_is_protected functionsTony Krowiak
Export the kvm_s390_pv_is_protected and kvm_s390_pv_cpu_is_protected functions so that they can be called from other modules that carry a GPL-compatible license. Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> Tested-by: Viktor Mihajlovski <mihajlov@linux.ibm.com> Link: https://lore.kernel.org/r/20230815184333.6554-12-akrowiak@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-07-29KVM: s390: fix sthyi error handlingHeiko Carstens
Commit 9fb6c9b3fea1 ("s390/sthyi: add cache to store hypervisor info") added cache handling for store hypervisor info. This also changed the possible return code for sthyi_fill(). Instead of only returning a condition code like the sthyi instruction would do, it can now also return a negative error value (-ENOMEM). handle_styhi() was not changed accordingly. In case of an error, the negative error value would incorrectly injected into the guest PSW. Add proper error handling to prevent this, and update the comment which describes the possible return values of sthyi_fill(). Fixes: 9fb6c9b3fea1 ("s390/sthyi: add cache to store hypervisor info") Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Link: https://lore.kernel.org/r/20230727182939.2050744-1-hca@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-07-18KVM: s390: pv: simplify shutdown and fix raceClaudio Imbrenda
Simplify the shutdown of non-protected VMs. There is no need to do complex manipulations of the counter if it was zero. This also fixes a very rare race which caused pages to be torn down from the address space with a non-zero counter even on older machines that don't support the UVC instruction, causing a crash. Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com> Fixes: fb491d5500a7 ("KVM: s390: pv: asynchronous destroy for reboot") Reviewed-by: Nico Boehr <nrb@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Message-ID: <20230705111937.33472-2-imbrenda@linux.ibm.com>
2023-07-06Merge tag 's390-6.5-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull more s390 updates from Alexander Gordeev: - Fix virtual vs physical address confusion in vmem_add_range() and vmem_remove_range() functions - Include <linux/io.h> instead of <asm/io.h> and <asm-generic/io.h> throughout s390 code - Make all PSW related defines also available for assembler files. Remove PSW_DEFAULT_KEY define from uapi for that - When adding an undefined symbol the build still succeeds, but userspace crashes trying to execute VDSO, because the symbol is not resolved. Add undefined symbols check to prevent that - Use kvmalloc_array() instead of kzalloc() for allocaton of 256k memory when executing s390 crypto adapter IOCTL - Add -fPIE flag to prevent decompressor misaligned symbol build error with clang - Use .balign instead of .align everywhere. This is a no-op for s390, but with this there no mix in using .align and .balign anymore - Filter out -mno-pic-data-is-text-relative flag when compiling kernel to prevent VDSO build error - Rework entering of DAT-on mode on CPU restart to use PSW_KERNEL_BITS mask directly - Do not retry administrative requests to some s390 crypto cards, since the firmware assumes replay attacks - Remove most of the debug code, which is build in when kernel config option CONFIG_ZCRYPT_DEBUG is enabled - Remove CONFIG_ZCRYPT_MULTIDEVNODES kernel config option and switch off the multiple devices support for the s390 zcrypt device driver - With the conversion to generic entry machine checks are accounted to the current context instead of irq time. As result, the STCKF instruction at the beginning of the machine check handler and the lowcore member are no longer required, therefore remove it - Fix various typos found with codespell - Minor cleanups to CPU-measurement Counter and Sampling Facilities code - Revert patch that removes VMEM_MAX_PHYS macro, since it causes a regression * tag 's390-6.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (25 commits) Revert "s390/mm: get rid of VMEM_MAX_PHYS macro" s390/cpum_sf: remove check on CPU being online s390/cpum_sf: handle casts consistently s390/cpum_sf: remove unnecessary debug statement s390/cpum_sf: remove parameter in call to pr_err s390/cpum_sf: simplify function setup_pmu_cpu s390/cpum_cf: remove unneeded debug statements s390/entry: remove mcck clock s390: fix various typos s390/zcrypt: remove ZCRYPT_MULTIDEVNODES kernel config option s390/zcrypt: do not retry administrative requests s390/zcrypt: cleanup some debug code s390/entry: rework entering DAT-on mode on CPU restart s390/mm: fence off VM macros from asm and linker s390: include linux/io.h instead of asm/io.h s390/ptrace: make all psw related defines also available for asm s390/ptrace: remove PSW_DEFAULT_KEY from uapi s390/vdso: filter out mno-pic-data-is-text-relative cflag s390: consistently use .balign instead of .align s390/decompressor: fix misaligned symbol build error ...
2023-07-03Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "ARM64: - Eager page splitting optimization for dirty logging, optionally allowing for a VM to avoid the cost of hugepage splitting in the stage-2 fault path. - Arm FF-A proxy for pKVM, allowing a pKVM host to safely interact with services that live in the Secure world. pKVM intervenes on FF-A calls to guarantee the host doesn't misuse memory donated to the hyp or a pKVM guest. - Support for running the split hypervisor with VHE enabled, known as 'hVHE' mode. This is extremely useful for testing the split hypervisor on VHE-only systems, and paves the way for new use cases that depend on having two TTBRs available at EL2. - Generalized framework for configurable ID registers from userspace. KVM/arm64 currently prevents arbitrary CPU feature set configuration from userspace, but the intent is to relax this limitation and allow userspace to select a feature set consistent with the CPU. - Enable the use of Branch Target Identification (FEAT_BTI) in the hypervisor. - Use a separate set of pointer authentication keys for the hypervisor when running in protected mode, as the host is untrusted at runtime. - Ensure timer IRQs are consistently released in the init failure paths. - Avoid trapping CTR_EL0 on systems with Enhanced Virtualization Traps (FEAT_EVT), as it is a register commonly read from userspace. - Erratum workaround for the upcoming AmpereOne part, which has broken hardware A/D state management. RISC-V: - Redirect AMO load/store misaligned traps to KVM guest - Trap-n-emulate AIA in-kernel irqchip for KVM guest - Svnapot support for KVM Guest s390: - New uvdevice secret API - CMM selftest and fixes - fix racy access to target CPU for diag 9c x86: - Fix missing/incorrect #GP checks on ENCLS - Use standard mmu_notifier hooks for handling APIC access page - Drop now unnecessary TR/TSS load after VM-Exit on AMD - Print more descriptive information about the status of SEV and SEV-ES during module load - Add a test for splitting and reconstituting hugepages during and after dirty logging - Add support for CPU pinning in demand paging test - Add support for AMD PerfMonV2, with a variety of cleanups and minor fixes included along the way - Add a "nx_huge_pages=never" option to effectively avoid creating NX hugepage recovery threads (because nx_huge_pages=off can be toggled at runtime) - Move handling of PAT out of MTRR code and dedup SVM+VMX code - Fix output of PIC poll command emulation when there's an interrupt - Add a maintainer's handbook to document KVM x86 processes, preferred coding style, testing expectations, etc. - Misc cleanups, fixes and comments Generic: - Miscellaneous bugfixes and cleanups Selftests: - Generate dependency files so that partial rebuilds work as expected" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (153 commits) Documentation/process: Add a maintainer handbook for KVM x86 Documentation/process: Add a label for the tip tree handbook's coding style KVM: arm64: Fix misuse of KVM_ARM_VCPU_POWER_OFF bit index RISC-V: KVM: Remove unneeded semicolon RISC-V: KVM: Allow Svnapot extension for Guest/VM riscv: kvm: define vcpu_sbi_ext_pmu in header RISC-V: KVM: Expose IMSIC registers as attributes of AIA irqchip RISC-V: KVM: Add in-kernel virtualization of AIA IMSIC RISC-V: KVM: Expose APLIC registers as attributes of AIA irqchip RISC-V: KVM: Add in-kernel emulation of AIA APLIC RISC-V: KVM: Implement device interface for AIA irqchip RISC-V: KVM: Skeletal in-kernel AIA irqchip support RISC-V: KVM: Set kvm_riscv_aia_nr_hgei to zero RISC-V: KVM: Add APLIC related defines RISC-V: KVM: Add IMSIC related defines RISC-V: KVM: Implement guest external interrupt line management KVM: x86: Remove PRIx* definitions as they are solely for user space s390/uv: Update query for secret-UVCs s390/uv: replace scnprintf with sysfs_emit s390/uvdevice: Add 'Lock Secret Store' UVC ...
2023-07-03s390: fix various typosHeiko Carstens
Fix various typos found with codespell. Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-07-03s390: include linux/io.h instead of asm/io.hHeiko Carstens
Include linux/io.h instead of asm/io.h everywhere. linux/io.h includes asm/io.h, so this shouldn't cause any problems. Instead this might help for some randconfig build errors which were reported due to some undefined io related functions. Also move the changed include so it stays grouped together with other includes from the same directory. For ctcm_mpc.c also remove not needed comments (actually questions). Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-06-16KVM: s390/diag: fix racy access of physical cpu number in diag 9c handlerChristian Borntraeger
We do check for target CPU == -1, but this might change at the time we are going to use it. Hold the physical target CPU in a local variable to avoid out-of-bound accesses to the cpu arrays. Cc: Pierre Morel <pmorel@linux.ibm.com> Fixes: 87e28a15c42c ("KVM: s390: diag9c (directed yield) forwarding") Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com> Reviewed-by: Nico Boehr <nrb@linux.ibm.com> Reviewed-by: Pierre Morel <pmorel@linux.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2023-06-16KVM: s390: vsie: fix the length of APCB bitmapPierre Morel
bit_and() uses the count of bits as the woking length. Fix the previous implementation and effectively use the right bitmap size. Fixes: 19fd83a64718 ("KVM: s390: vsie: allow CRYCB FORMAT-1") Fixes: 56019f9aca22 ("KVM: s390: vsie: Allow CRYCB FORMAT-2") Signed-off-by: Pierre Morel <pmorel@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Link: https://lore.kernel.org/kvm/20230511094719.9691-1-pmorel@linux.ibm.com/ Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2023-06-13KVM: s390: fix KVM_S390_GET_CMMA_BITS for GFNs in memslot holesNico Boehr
The KVM_S390_GET_CMMA_BITS ioctl may return incorrect values when userspace specifies a start_gfn outside of memslots. This can occur when a VM has multiple memslots with a hole in between: +-----+----------+--------+--------+ | ... | Slot N-1 | <hole> | Slot N | +-----+----------+--------+--------+ ^ ^ ^ ^ | | | | GFN A A+B | | A+B+C | A+B+C+D When userspace specifies a GFN in [A+B, A+B+C), it would expect to get the CMMA values of the first dirty page in Slot N. However, userspace may get a start_gfn of A+B+C+D with a count of 0, hence completely skipping over any dirty pages in slot N. The error is in kvm_s390_next_dirty_cmma(), which assumes gfn_to_memslot_approx() will return the memslot _below_ the specified GFN when the specified GFN lies outside a memslot. In reality it may return either the memslot below or above the specified GFN. When a memslot above the specified GFN is returned this happens: - ofs is calculated, but since the memslot's base_gfn is larger than the specified cur_gfn, ofs will underflow to a huge number. - ofs is passed to find_next_bit(). Since ofs will exceed the memslot's number of pages, the number of pages in the memslot is returned, completely skipping over all bits in the memslot userspace would be interested in. Fix this by resetting ofs to zero when a memslot _above_ cur_gfn is returned (cur_gfn < ms->base_gfn). Signed-off-by: Nico Boehr <nrb@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Fixes: afdad61615cc ("KVM: s390: Fix storage attributes migration with memory slots") Message-Id: <20230324145424.293889-2-nrb@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2023-06-09mm/gup: remove vmas parameter from get_user_pages_remote()Lorenzo Stoakes
The only instances of get_user_pages_remote() invocations which used the vmas parameter were for a single page which can instead simply look up the VMA directly. In particular:- - __update_ref_ctr() looked up the VMA but did nothing with it so we simply remove it. - __access_remote_vm() was already using vma_lookup() when the original lookup failed so by doing the lookup directly this also de-duplicates the code. We are able to perform these VMA operations as we already hold the mmap_lock in order to be able to call get_user_pages_remote(). As part of this work we add get_user_page_vma_remote() which abstracts the VMA lookup, error handling and decrementing the page reference count should the VMA lookup fail. This forms part of a broader set of patches intended to eliminate the vmas parameter altogether. [akpm@linux-foundation.org: avoid passing NULL to PTR_ERR] Link: https://lkml.kernel.org/r/d20128c849ecdbf4dd01cc828fcec32127ed939a.1684350871.git.lstoakes@gmail.com Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> (for arm64) Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> (for s390) Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Christian König <christian.koenig@amd.com> Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jarkko Sakkinen <jarkko@kernel.org> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Sean Christopherson <seanjc@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-05-05Merge tag 'kvm-s390-next-6.4-2' of ↵Paolo Bonzini
https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD For 6.4
2023-05-04KVM: s390: pv: fix asynchronous teardown for small VMsClaudio Imbrenda
On machines without the Destroy Secure Configuration Fast UVC, the topmost level of page tables is set aside and freed asynchronously as last step of the asynchronous teardown. Each gmap has a host_to_guest radix tree mapping host (userspace) addresses (with 1M granularity) to gmap segment table entries (pmds). If a guest is smaller than 2GB, the topmost level of page tables is the segment table (i.e. there are only 2 levels). Replacing it means that the pointers in the host_to_guest mapping would become stale and cause all kinds of nasty issues. This patch fixes the issue by disallowing asynchronous teardown for guests with only 2 levels of page tables. Userspace should (and already does) try using the normal destroy if the asynchronous one fails. Update s390_replace_asce so it refuses to replace segment type ASCEs. This is still needed in case the normal destroy VM fails. Fixes: fb491d5500a7 ("KVM: s390: pv: asynchronous destroy for reboot") Reviewed-by: Marc Hartmayer <mhartmay@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Message-Id: <20230421085036.52511-2-imbrenda@linux.ibm.com>
2023-05-01Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "s390: - More phys_to_virt conversions - Improvement of AP management for VSIE (nested virtualization) ARM64: - Numerous fixes for the pathological lock inversion issue that plagued KVM/arm64 since... forever. - New framework allowing SMCCC-compliant hypercalls to be forwarded to userspace, hopefully paving the way for some more features being moved to VMMs rather than be implemented in the kernel. - Large rework of the timer code to allow a VM-wide offset to be applied to both virtual and physical counters as well as a per-timer, per-vcpu offset that complements the global one. This last part allows the NV timer code to be implemented on top. - A small set of fixes to make sure that we don't change anything affecting the EL1&0 translation regime just after having having taken an exception to EL2 until we have executed a DSB. This ensures that speculative walks started in EL1&0 have completed. - The usual selftest fixes and improvements. x86: - Optimize CR0.WP toggling by avoiding an MMU reload when TDP is enabled, and by giving the guest control of CR0.WP when EPT is enabled on VMX (VMX-only because SVM doesn't support per-bit controls) - Add CR0/CR4 helpers to query single bits, and clean up related code where KVM was interpreting kvm_read_cr4_bits()'s "unsigned long" return as a bool - Move AMD_PSFD to cpufeatures.h and purge KVM's definition - Avoid unnecessary writes+flushes when the guest is only adding new PTEs - Overhaul .sync_page() and .invlpg() to utilize .sync_page()'s optimizations when emulating invalidations - Clean up the range-based flushing APIs - Revamp the TDP MMU's reaping of Accessed/Dirty bits to clear a single A/D bit using a LOCK AND instead of XCHG, and skip all of the "handle changed SPTE" overhead associated with writing the entire entry - Track the number of "tail" entries in a pte_list_desc to avoid having to walk (potentially) all descriptors during insertion and deletion, which gets quite expensive if the guest is spamming fork() - Disallow virtualizing legacy LBRs if architectural LBRs are available, the two are mutually exclusive in hardware - Disallow writes to immutable feature MSRs (notably PERF_CAPABILITIES) after KVM_RUN, similar to CPUID features - Overhaul the vmx_pmu_caps selftest to better validate PERF_CAPABILITIES - Apply PMU filters to emulated events and add test coverage to the pmu_event_filter selftest - AMD SVM: - Add support for virtual NMIs - Fixes for edge cases related to virtual interrupts - Intel AMX: - Don't advertise XTILE_CFG in KVM_GET_SUPPORTED_CPUID if XTILE_DATA is not being reported due to userspace not opting in via prctl() - Fix a bug in emulation of ENCLS in compatibility mode - Allow emulation of NOP and PAUSE for L2 - AMX selftests improvements - Misc cleanups MIPS: - Constify MIPS's internal callbacks (a leftover from the hardware enabling rework that landed in 6.3) Generic: - Drop unnecessary casts from "void *" throughout kvm_main.c - Tweak the layout of "struct kvm_mmu_memory_cache" to shrink the struct size by 8 bytes on 64-bit kernels by utilizing a padding hole Documentation: - Fix goof introduced by the conversion to rST" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (211 commits) KVM: s390: pci: fix virtual-physical confusion on module unload/load KVM: s390: vsie: clarifications on setting the APCB KVM: s390: interrupt: fix virtual-physical confusion for next alert GISA KVM: arm64: Have kvm_psci_vcpu_on() use WRITE_ONCE() to update mp_state KVM: arm64: Acquire mp_state_lock in kvm_arch_vcpu_ioctl_vcpu_init() KVM: selftests: Test the PMU event "Instructions retired" KVM: selftests: Copy full counter values from guest in PMU event filter test KVM: selftests: Use error codes to signal errors in PMU event filter test KVM: selftests: Print detailed info in PMU event filter asserts KVM: selftests: Add helpers for PMC asserts in PMU event filter test KVM: selftests: Add a common helper for the PMU event filter guest code KVM: selftests: Fix spelling mistake "perrmited" -> "permitted" KVM: arm64: vhe: Drop extra isb() on guest exit KVM: arm64: vhe: Synchronise with page table walker on MMU update KVM: arm64: pkvm: Document the side effects of kvm_flush_dcache_to_poc() KVM: arm64: nvhe: Synchronise with page table walker on TLBI KVM: arm64: Handle 32bit CNTPCTSS traps KVM: arm64: nvhe: Synchronise with page table walker on vcpu run KVM: arm64: vgic: Don't acquire its_lock before config_lock KVM: selftests: Add test to verify KVM's supported XCR0 ...
2023-04-26Merge tag 'kvm-s390-next-6.4-1' of ↵Paolo Bonzini
https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD Minor cleanup: - phys_to_virt conversion - Improvement of VSIE AP management
2023-04-24Merge tag 'rcu.6.4.april5.2023.3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux Pull RCU updates from Joel Fernandes: - Updates and additions to MAINTAINERS files, with Boqun being added to the RCU entry and Zqiang being added as an RCU reviewer. I have also transitioned from reviewer to maintainer; however, Paul will be taking over sending RCU pull-requests for the next merge window. - Resolution of hotplug warning in nohz code, achieved by fixing cpu_is_hotpluggable() through interaction with the nohz subsystem. Tick dependency modifications by Zqiang, focusing on fixing usage of the TICK_DEP_BIT_RCU_EXP bitmask. - Avoid needless calls to the rcu-lazy shrinker for CONFIG_RCU_LAZY=n kernels, fixed by Zqiang. - Improvements to rcu-tasks stall reporting by Neeraj. - Initial renaming of k[v]free_rcu() to k[v]free_rcu_mightsleep() for increased robustness, affecting several components like mac802154, drbd, vmw_vmci, tracing, and more. A report by Eric Dumazet showed that the API could be unknowingly used in an atomic context, so we'd rather make sure they know what they're asking for by being explicit: https://lore.kernel.org/all/20221202052847.2623997-1-edumazet@google.com/ - Documentation updates, including corrections to spelling, clarifications in comments, and improvements to the srcu_size_state comments. - Better srcu_struct cache locality for readers, by adjusting the size of srcu_struct in support of SRCU usage by Christoph Hellwig. - Teach lockdep to detect deadlocks between srcu_read_lock() vs synchronize_srcu() contributed by Boqun. Previously lockdep could not detect such deadlocks, now it can. - Integration of rcutorture and rcu-related tools, targeted for v6.4 from Boqun's tree, featuring new SRCU deadlock scenarios, test_nmis module parameter, and more - Miscellaneous changes, various code cleanups and comment improvements * tag 'rcu.6.4.april5.2023.3' of git://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux: (71 commits) checkpatch: Error out if deprecated RCU API used mac802154: Rename kfree_rcu() to kvfree_rcu_mightsleep() rcuscale: Rename kfree_rcu() to kfree_rcu_mightsleep() ext4/super: Rename kfree_rcu() to kfree_rcu_mightsleep() net/mlx5: Rename kfree_rcu() to kfree_rcu_mightsleep() net/sysctl: Rename kvfree_rcu() to kvfree_rcu_mightsleep() lib/test_vmalloc.c: Rename kvfree_rcu() to kvfree_rcu_mightsleep() tracing: Rename kvfree_rcu() to kvfree_rcu_mightsleep() misc: vmw_vmci: Rename kvfree_rcu() to kvfree_rcu_mightsleep() drbd: Rename kvfree_rcu() to kvfree_rcu_mightsleep() rcu: Protect rcu_print_task_exp_stall() ->exp_tasks access rcu: Avoid stack overflow due to __rcu_irq_enter_check_tick() being kprobe-ed rcu-tasks: Report stalls during synchronize_srcu() in rcu_tasks_postscan() rcu: Permit start_poll_synchronize_rcu_expedited() to be invoked early rcu: Remove never-set needwake assignment from rcu_report_qs_rdp() rcu: Register rcu-lazy shrinker only for CONFIG_RCU_LAZY=y kernels rcu: Fix missing TICK_DEP_MASK_RCU_EXP dependency check rcu: Fix set/clear TICK_DEP_BIT_RCU_EXP bitmask race rcu/trace: use strscpy() to instead of strncpy() tick/nohz: Fix cpu_is_hotpluggable() by checking with nohz subsystem ...
2023-04-20KVM: s390: pci: fix virtual-physical confusion on module unload/loadNico Boehr
When the kvm module is unloaded, zpci_setup_aipb() perists some data in the zpci_aipb structure in s390 pci code. Note that this struct is also passed to firmware in the zpci_set_irq_ctrl() call and thus the GAIT must be a physical address. On module re-insertion, the GAIT is restored from this structure in zpci_reset_aipb(). But it is a physical address, hence this may cause issues when the kvm module is unloaded and loaded again. Fix virtual vs physical address confusion (which currently are the same) by adding the necessary physical-to-virtual-conversion in zpci_reset_aipb(). Signed-off-by: Nico Boehr <nrb@linux.ibm.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Link: https://lore.kernel.org/r/20230222155503.43399-1-nrb@linux.ibm.com Message-Id: <20230222155503.43399-1-nrb@linux.ibm.com>
2023-04-20KVM: s390: vsie: clarifications on setting the APCBPierre Morel
The APCB is part of the CRYCB. The calculation of the APCB origin can be done by adding the APCB offset to the CRYCB origin. Current code makes confusing transformations, converting the CRYCB origin to a pointer to calculate the APCB origin. Let's make things simpler and keep the CRYCB origin to make these calculations. Signed-off-by: Pierre Morel <pmorel@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Link: https://lore.kernel.org/r/20230214122841.13066-2-pmorel@linux.ibm.com Message-Id: <20230214122841.13066-2-pmorel@linux.ibm.com>
2023-04-20KVM: s390: interrupt: fix virtual-physical confusion for next alert GISANico Boehr
We sometimes put a virtual address in next_alert, which should always be a physical address, since it is shared with hardware. This currently works, because virtual and physical addresses are the same. Add phys_to_virt() to resolve the virtual-physical confusion. Signed-off-by: Nico Boehr <nrb@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Michael Mueller <mimu@linux.ibm.com> Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Link: https://lore.kernel.org/r/20230223162236.51569-1-nrb@linux.ibm.com Message-Id: <20230223162236.51569-1-nrb@linux.ibm.com>
2023-04-05kvm: Remove "select SRCU"Paul E. McKenney
Now that the SRCU Kconfig option is unconditionally selected, there is no longer any point in selecting it. Therefore, remove the "select SRCU" Kconfig statements from the various KVM Kconfig files. Acked-by: Sean Christopherson <seanjc@google.com> (x86) Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Aleksandar Markovic <aleksandar.qemu.devel@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Sean Christopherson <seanjc@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: <kvm@vger.kernel.org> Acked-by: Marc Zyngier <maz@kernel.org> (arm64) Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Acked-by: Anup Patel <anup@brainfault.org> (riscv) Acked-by: Heiko Carstens <hca@linux.ibm.com> (s390) Reviewed-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
2023-03-31KVM: PPC: Make KVM_CAP_IRQFD_RESAMPLE platform dependentAlexey Kardashevskiy
When introduced, IRQFD resampling worked on POWER8 with XICS. However KVM on POWER9 has never implemented it - the compatibility mode code ("XICS-on-XIVE") misses the kvm_notify_acked_irq() call and the native XIVE mode does not handle INTx in KVM at all. This moved the capability support advertising to platforms and stops advertising it on XIVE, i.e. POWER9 and later. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Anup Patel <anup@brainfault.org> Acked-by: Nicholas Piggin <npiggin@gmail.com> Message-Id: <20220504074807.3616813-1-aik@ozlabs.ru> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-28KVM: s390: pv: fix external interruption loop not always detectedNico Boehr
To determine whether the guest has caused an external interruption loop upon code 20 (external interrupt) intercepts, the ext_new_psw needs to be inspected to see whether external interrupts are enabled. Under non-PV, ext_new_psw can simply be taken from guest lowcore. Under PV, KVM can only access the encrypted guest lowcore and hence the ext_new_psw must not be taken from guest lowcore. handle_external_interrupt() incorrectly did that and hence was not able to reliably tell whether an external interruption loop is happening or not. False negatives cause spurious failures of my kvm-unit-test for extint loops[1] under PV. Since code 20 is only caused under PV if and only if the guest's ext_new_psw is enabled for external interrupts, false positive detection of a external interruption loop can not happen. Fix this issue by instead looking at the guest PSW in the state description. Since the PSW swap for external interrupt is done by the ultravisor before the intercept is caused, this reliably tells whether the guest is enabled for external interrupts in the ext_new_psw. Also update the comments to explain better what is happening. [1] https://lore.kernel.org/kvm/20220812062151.1980937-4-nrb@linux.ibm.com/ Signed-off-by: Nico Boehr <nrb@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Fixes: 201ae986ead7 ("KVM: s390: protvirt: Implement interrupt injection") Link: https://lore.kernel.org/r/20230213085520.100756-2-nrb@linux.ibm.com Message-Id: <20230213085520.100756-2-nrb@linux.ibm.com> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2023-03-16KVM: Change return type of kvm_arch_vm_ioctl() to "int"Thomas Huth
All kvm_arch_vm_ioctl() implementations now only deal with "int" types as return values, so we can change the return type of these functions to use "int" instead of "long". Signed-off-by: Thomas Huth <thuth@redhat.com> Acked-by: Anup Patel <anup@brainfault.org> Message-Id: <20230208140105.655814-7-thuth@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-16KVM: s390: Use "int" as return type for kvm_s390_get/set_skeys()Thomas Huth
These two functions only return normal integers, so it does not make sense to declare the return type as "long" here. Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Thomas Huth <thuth@redhat.com> Message-Id: <20230208140105.655814-3-thuth@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-02-15Merge tag 'kvm-s390-next-6.3-1' of ↵Paolo Bonzini
https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD * Two more V!=R patches * The last part of the cmpxchg patches * A few fixes
2023-02-15Merge tag 'kvm-riscv-6.3-1' of https://github.com/kvm-riscv/linux into HEADPaolo Bonzini
KVM/riscv changes for 6.3 - Fix wrong usage of PGDIR_SIZE to check page sizes - Fix privilege mode setting in kvm_riscv_vcpu_trap_redirect() - Redirect illegal instruction traps to guest - SBI PMU support for guest
2023-02-08KVM: s390: GISA: sort out physical vs virtual pointers usageNico Boehr
Fix virtual vs physical address confusion (which currently are the same). In chsc_sgib(), do the virtual-physical conversion in the caller since the caller needs to make sure it is a 31-bit address and zero has a special meaning (disassociating the GIB). Signed-off-by: Nico Boehr <nrb@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Michael Mueller <mimu@linux.ibm.com> Link: https://lore.kernel.org/r/20221107085727.1533792-1-nrb@linux.ibm.com Message-Id: <20221107085727.1533792-1-nrb@linux.ibm.com> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2023-02-07KVM: s390: Extend MEM_OP ioctl by storage key checked cmpxchgJanis Schoetterl-Glausch
User space can use the MEM_OP ioctl to make storage key checked reads and writes to the guest, however, it has no way of performing atomic, key checked, accesses to the guest. Extend the MEM_OP ioctl in order to allow for this, by adding a cmpxchg op. For now, support this op for absolute accesses only. This op can be used, for example, to set the device-state-change indicator and the adapter-local-summary indicator atomically. Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Link: https://lore.kernel.org/r/20230206164602.138068-13-scgl@linux.ibm.com Message-Id: <20230206164602.138068-13-scgl@linux.ibm.com> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2023-02-07KVM: s390: Refactor vcpu mem_op functionJanis Schoetterl-Glausch
Remove code duplication with regards to the CHECK_ONLY flag. Decrease the number of indents. No functional change indented. Suggested-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com> Link: https://lore.kernel.org/r/20230206164602.138068-12-scgl@linux.ibm.com Message-Id: <20230206164602.138068-12-scgl@linux.ibm.com> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2023-02-07KVM: s390: Refactor absolute vm mem_op functionJanis Schoetterl-Glausch
Remove code duplication with regards to the CHECK_ONLY flag. Decrease the number of indents. No functional change indented. Suggested-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Link: https://lore.kernel.org/r/20230206164602.138068-11-scgl@linux.ibm.com Message-Id: <20230206164602.138068-11-scgl@linux.ibm.com> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2023-02-07KVM: s390: Dispatch to implementing function at top level of vm mem_opJanis Schoetterl-Glausch
Instead of having one function covering all mem_op operations, have a function implementing absolute access and dispatch to that function in its caller, based on the operation code. This way additional future operations can be implemented by adding an implementing function without changing existing operations. Suggested-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Link: https://lore.kernel.org/r/20230206164602.138068-10-scgl@linux.ibm.com Message-Id: <20230206164602.138068-10-scgl@linux.ibm.com> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2023-02-07KVM: s390: Move common code of mem_op functions into functionJanis Schoetterl-Glausch
The vcpu and vm mem_op ioctl implementations share some functionality. Move argument checking into a function and call it from both implementations. This allows code reuse in case of additional future mem_op operations. Suggested-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Link: https://lore.kernel.org/r/20230206164602.138068-9-scgl@linux.ibm.com Message-Id: <20230206164602.138068-9-scgl@linux.ibm.com> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2023-02-07KVM: s390: disable migration mode when dirty tracking is disabledNico Boehr
Migration mode is a VM attribute which enables tracking of changes in storage attributes (PGSTE). It assumes dirty tracking is enabled on all memslots to keep a dirty bitmap of pages with changed storage attributes. When enabling migration mode, we currently check that dirty tracking is enabled for all memslots. However, userspace can disable dirty tracking without disabling migration mode. Since migration mode is pointless with dirty tracking disabled, disable migration mode whenever userspace disables dirty tracking on any slot. Also update the documentation to clarify that dirty tracking must be enabled when enabling migration mode, which is already enforced by the code in kvm_s390_vm_start_migration(). Also highlight in the documentation for KVM_S390_GET_CMMA_BITS that it can now fail with -EINVAL when dirty tracking is disabled while migration mode is on. Move all the error codes to a table so this stays readable. To disable migration mode, slots_lock should be held, which is taken in kvm_set_memory_region() and thus held in kvm_arch_prepare_memory_region(). Restructure the prepare code a bit so all the sanity checking is done before disabling migration mode. This ensures migration mode isn't disabled when some sanity check fails. Cc: stable@vger.kernel.org Fixes: 190df4a212a7 ("KVM: s390: CMMA tracking, ESSA emulation, migration mode") Signed-off-by: Nico Boehr <nrb@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Link: https://lore.kernel.org/r/20230127140532.230651-2-nrb@linux.ibm.com Message-Id: <20230127140532.230651-2-nrb@linux.ibm.com> [frankja@linux.ibm.com: fixed commit message typo, moved api.rst error table upwards] Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2023-01-11KVM: s390: interrupt: use READ_ONCE() before cmpxchg()Heiko Carstens
Use READ_ONCE() before cmpxchg() to prevent that the compiler generates code that fetches the to be compared old value several times from memory. Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Link: https://lore.kernel.org/r/20230109145456.2895385-1-hca@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>