summaryrefslogtreecommitdiff
path: root/arch/s390/kvm
AgeCommit message (Collapse)Author
2022-04-02Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: - Only do MSR filtering for MSRs accessed by rdmsr/wrmsr - Documentation improvements - Prevent module exit until all VMs are freed - PMU Virtualization fixes - Fix for kvm_irq_delivery_to_apic_fast() NULL-pointer dereferences - Other miscellaneous bugfixes * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (42 commits) KVM: x86: fix sending PV IPI KVM: x86/mmu: do compare-and-exchange of gPTE via the user address KVM: x86: Remove redundant vm_entry_controls_clearbit() call KVM: x86: cleanup enter_rmode() KVM: x86: SVM: fix tsc scaling when the host doesn't support it kvm: x86: SVM: remove unused defines KVM: x86: SVM: move tsc ratio definitions to svm.h KVM: x86: SVM: fix avic spec based definitions again KVM: MIPS: remove reference to trap&emulate virtualization KVM: x86: document limitations of MSR filtering KVM: x86: Only do MSR filtering when access MSR by rdmsr/wrmsr KVM: x86/emulator: Emulate RDPID only if it is enabled in guest KVM: x86/pmu: Fix and isolate TSX-specific performance event logic KVM: x86: mmu: trace kvm_mmu_set_spte after the new SPTE was set KVM: x86/svm: Clear reserved bits written to PerfEvtSeln MSRs KVM: x86: Trace all APICv inhibit changes and capture overall status KVM: x86: Add wrappers for setting/clearing APICv inhibits KVM: x86: Make APICv inhibit reasons an enum and cleanup naming KVM: X86: Handle implicit supervisor access with SMAP KVM: X86: Rename variable smap to not_smap in permission_fault() ...
2022-04-02KVM: Don't actually set a request when evicting vCPUs for GFN cache invdSean Christopherson
Don't actually set a request bit in vcpu->requests when making a request purely to force a vCPU to exit the guest. Logging a request but not actually consuming it would cause the vCPU to get stuck in an infinite loop during KVM_RUN because KVM would see the pending request and bail from VM-Enter to service the request. Note, it's currently impossible for KVM to set KVM_REQ_GPC_INVALIDATE as nothing in KVM is wired up to set guest_uses_pa=true. But, it'd be all too easy for arch code to introduce use of kvm_gfn_to_pfn_cache_init() without implementing handling of the request, especially since getting test coverage of MMU notifier interaction with specific KVM features usually requires a directed test. Opportunistically rename gfn_to_pfn_cache_invalidate_start()'s wake_vcpus to evict_vcpus. The purpose of the request is to get vCPUs out of guest mode, it's supposed to _avoid_ waking vCPUs that are blocking. Opportunistically rename KVM_REQ_GPC_INVALIDATE to be more specific as to what it wants to accomplish, and to genericize the name so that it can used for similar but unrelated scenarios, should they arise in the future. Add a comment and documentation to explain why the "no action" request exists. Add compile-time assertions to help detect improper usage. Use the inner assertless helper in the one s390 path that makes requests without a hardcoded request. Cc: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220223165302.3205276-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-25Merge tag 's390-5.18-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Vasily Gorbik: - Raise minimum supported machine generation to z10, which comes with various cleanups and code simplifications (usercopy/spectre mitigation/etc). - Rework extables and get rid of anonymous out-of-line fixups. - Page table helpers cleanup. Add set_pXd()/set_pte() helper functions. Covert pte_val()/pXd_val() macros to functions. - Optimize kretprobe handling by avoiding extra kprobe on __kretprobe_trampoline. - Add support for CEX8 crypto cards. - Allow to trigger AP bus rescan via writing to /sys/bus/ap/scans. - Add CONFIG_EXPOLINE_EXTERN option to build the kernel without COMDAT group sections which simplifies kpatch support. - Always use the packed stack layout and extend kernel unwinder tests. - Add sanity checks for ftrace code patching. - Add s390dbf debug log for the vfio_ap device driver. - Various virtual vs physical address confusion fixes. - Various small fixes and improvements all over the code. * tag 's390-5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (69 commits) s390/test_unwind: add kretprobe tests s390/kprobes: Avoid additional kprobe in kretprobe handling s390: convert ".insn" encoding to instruction names s390: assume stckf is always present s390/nospec: move to single register thunks s390: raise minimum supported machine generation to z10 s390/uaccess: Add copy_from/to_user_key functions s390/nospec: align and size extern thunks s390/nospec: add an option to use thunk-extern s390/nospec: generate single register thunks if possible s390/pci: make zpci_set_irq()/zpci_clear_irq() static s390: remove unused expoline to BC instructions s390/irq: use assignment instead of cast s390/traps: get rid of magic cast for per code s390/traps: get rid of magic cast for program interruption code s390/signal: fix typo in comments s390/asm-offsets: remove unused defines s390/test_unwind: avoid build warning with W=1 s390: remove .fixup section s390/bpf: encode register within extable entry ...
2022-03-15Merge tag 'kvm-s390-next-5.18-2' of ↵Paolo Bonzini
https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD KVM: s390: Fix, test and feature for 5.18 part 2 - memop selftest - fix SCK locking - adapter interruptions virtualization for secure guests
2022-03-14KVM: s390x: fix SCK lockingClaudio Imbrenda
When handling the SCK instruction, the kvm lock is taken, even though the vcpu lock is already being held. The normal locking order is kvm lock first and then vcpu lock. This is can (and in some circumstances does) lead to deadlocks. The function kvm_s390_set_tod_clock is called both by the SCK handler and by some IOCTLs to set the clock. The IOCTLs will not hold the vcpu lock, so they can safely take the kvm lock. The SCK handler holds the vcpu lock, but will also somehow need to acquire the kvm lock without relinquishing the vcpu lock. The solution is to factor out the code to set the clock, and provide two wrappers. One is called like the original function and does the locking, the other is called kvm_s390_try_set_tod_clock and uses trylock to try to acquire the kvm lock. This new wrapper is then used in the SCK handler. If locking fails, -EAGAIN is returned, which is eventually propagated to userspace, thus also freeing the vcpu lock and allowing for forward progress. This is not the most efficient or elegant way to solve this issue, but the SCK instruction is deprecated and its performance is not critical. The goal of this patch is just to provide a simple but correct way to fix the bug. Fixes: 6a3f95a6b04c ("KVM: s390: Intercept SCK instruction") Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Reviewed-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com> Link: https://lore.kernel.org/r/20220301143340.111129-1-imbrenda@linux.ibm.com Cc: stable@vger.kernel.org Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2022-03-01KVM: s390: Replace KVM_REQ_MMU_RELOAD usage with arch specific requestSean Christopherson
Add an arch request, KVM_REQ_REFRESH_GUEST_PREFIX, to deal with guest prefix changes instead of piggybacking KVM_REQ_MMU_RELOAD. This will allow for the removal of the generic KVM_REQ_MMU_RELOAD, which isn't actually used by generic KVM. No functional change intended. Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220225182248.3812651-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-25KVM: s390: pv: make use of ultravisor AIV supportMichael Mueller
This patch enables the ultravisor adapter interruption vitualization support indicated by UV feature BIT_UV_FEAT_AIV. This allows ISC interruption injection directly into the GISA IPM for PV kvm guests. Hardware that does not support this feature will continue to use the UV interruption interception method to deliver ISC interruptions to PV kvm guests. For this purpose, the ECA_AIV bit for all guest cpus will be cleared and the GISA will be disabled during PV CPU setup. In addition a check in __inject_io() has been removed. That reduces the required instructions for interruption handling for PV and traditional kvm guests. Signed-off-by: Michael Mueller <mimu@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Link: https://lore.kernel.org/r/20220209152217.1793281-2-mimu@linux.ibm.com Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2022-02-22KVM: s390: Add missing vm MEM_OP size checkJanis Schoetterl-Glausch
Check that size is not zero, preventing the following warning: WARNING: CPU: 0 PID: 9692 at mm/vmalloc.c:3059 __vmalloc_node_range+0x528/0x648 Modules linked in: CPU: 0 PID: 9692 Comm: memop Not tainted 5.17.0-rc3-e4+ #80 Hardware name: IBM 8561 T01 701 (LPAR) Krnl PSW : 0704c00180000000 0000000082dc584c (__vmalloc_node_range+0x52c/0x648) R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3 Krnl GPRS: 0000000000000083 ffffffffffffffff 0000000000000000 0000000000000001 0000038000000000 000003ff80000000 0000000000000cc0 000000008ebb8000 0000000087a8a700 000000004040aeb1 000003ffd9f7dec8 000000008ebb8000 000000009d9b8000 000000000102a1b4 00000380035afb68 00000380035afaa8 Krnl Code: 0000000082dc583e: d028a7f4ff80 trtr 2036(41,%r10),3968(%r15) 0000000082dc5844: af000000 mc 0,0 #0000000082dc5848: af000000 mc 0,0 >0000000082dc584c: a7d90000 lghi %r13,0 0000000082dc5850: b904002d lgr %r2,%r13 0000000082dc5854: eb6ff1080004 lmg %r6,%r15,264(%r15) 0000000082dc585a: 07fe bcr 15,%r14 0000000082dc585c: 47000700 bc 0,1792 Call Trace: [<0000000082dc584c>] __vmalloc_node_range+0x52c/0x648 [<0000000082dc5b62>] vmalloc+0x5a/0x68 [<000003ff8067f4ca>] kvm_arch_vm_ioctl+0x2da/0x2a30 [kvm] [<000003ff806705bc>] kvm_vm_ioctl+0x4ec/0x978 [kvm] [<0000000082e562fe>] __s390x_sys_ioctl+0xbe/0x100 [<000000008360a9bc>] __do_syscall+0x1d4/0x200 [<0000000083618bd2>] system_call+0x82/0xb0 Last Breaking-Event-Address: [<0000000082dc5348>] __vmalloc_node_range+0x28/0x648 Other than the warning, there is no ill effect from the missing check, the condition is detected by subsequent code and causes a return with ENOMEM. Fixes: ef11c9463ae0 (KVM: s390: Add vm IOCTL for key checked guest absolute memory access) Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com> Link: https://lore.kernel.org/r/20220221163237.4122868-1-scgl@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2022-02-14KVM: s390: Add capability for storage key extension of MEM_OP IOCTLJanis Schoetterl-Glausch
Availability of the KVM_CAP_S390_MEM_OP_EXTENSION capability signals that: * The vcpu MEM_OP IOCTL supports storage key checking. * The vm MEM_OP IOCTL exists. Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Link: https://lore.kernel.org/r/20220211182215.2730017-9-scgl@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2022-02-14KVM: s390: Rename existing vcpu memop functionsJanis Schoetterl-Glausch
Makes the naming consistent, now that we also have a vm ioctl. Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Link: https://lore.kernel.org/r/20220211182215.2730017-8-scgl@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2022-02-14KVM: s390: Add vm IOCTL for key checked guest absolute memory accessJanis Schoetterl-Glausch
Channel I/O honors storage keys and is performed on absolute memory. For I/O emulation user space therefore needs to be able to do key checked accesses. The vm IOCTL supports read/write accesses, as well as checking if an access would succeed. Unlike relying on KVM_S390_GET_SKEYS for key checking would, the vm IOCTL performs the check in lockstep with the read or write, by, ultimately, mapping the access to move instructions that support key protection checking with a supplied key. Fetch and storage protection override are not applicable to absolute accesses and so are not applied as they are when using the vcpu memop. Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Link: https://lore.kernel.org/r/20220211182215.2730017-7-scgl@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2022-02-14KVM: s390: Add optional storage key checking to MEMOP IOCTLJanis Schoetterl-Glausch
User space needs a mechanism to perform key checked accesses when emulating instructions. The key can be passed as an additional argument. Having an additional argument is flexible, as user space can pass the guest PSW's key, in order to make an access the same way the CPU would, or pass another key if necessary. Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Link: https://lore.kernel.org/r/20220211182215.2730017-6-scgl@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2022-02-14KVM: s390: handle_tprot: Honor storage keysJanis Schoetterl-Glausch
Use the access key operand to check for key protection when translating guest addresses. Since the translation code checks for accessing exceptions/error hvas, we can remove the check here and simplify the control flow. Keep checking if the memory is read-only even if such memslots are currently not supported. handle_tprot was the last user of guest_translate_address, so remove it. Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Link: https://lore.kernel.org/r/20220211182215.2730017-4-scgl@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2022-02-14KVM: s390: Honor storage keys when accessing guest memoryJanis Schoetterl-Glausch
Storage key checking had not been implemented for instructions emulated by KVM. Implement it by enhancing the functions used for guest access, in particular those making use of access_guest which has been renamed to access_guest_with_key. Accesses via access_guest_real should not be key checked. For actual accesses, key checking is done by copy_from/to_user_key (which internally uses MVCOS/MVCP/MVCS). In cases where accessibility is checked without an actual access, this is performed by getting the storage key and checking if the access key matches. In both cases, if applicable, storage and fetch protection override are honored. Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Link: https://lore.kernel.org/r/20220211182215.2730017-3-scgl@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2022-02-06s390: remove invalid email address of Heiko CarstensHeiko Carstens
Remove my old invalid email address which can be found in a couple of files. Instead of updating it, just remove my contact data completely from source files. We have git and other tools which allow to figure out who is responsible for what with recent contact data. Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2022-02-02KVM: s390: Return error on SIDA memop on normal guestJanis Schoetterl-Glausch
Refuse SIDA memops on guests which are not protected. For normal guests, the secure instruction data address designation, which determines the location we access, is not under control of KVM. Fixes: 19e122776886 (KVM: S390: protvirt: Introduce instruction data area bounce buffer) Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com> Cc: stable@vger.kernel.org Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2022-01-23Merge tag 'bitmap-5.17-rc1' of git://github.com/norov/linuxLinus Torvalds
Pull bitmap updates from Yury Norov: - introduce for_each_set_bitrange() - use find_first_*_bit() instead of find_next_*_bit() where possible - unify for_each_bit() macros * tag 'bitmap-5.17-rc1' of git://github.com/norov/linux: vsprintf: rework bitmap_list_string lib: bitmap: add performance test for bitmap_print_to_pagebuf bitmap: unify find_bit operations mm/percpu: micro-optimize pcpu_is_populated() Replace for_each_*_bit_from() with for_each_*_bit() where appropriate find: micro-optimize for_each_{set,clear}_bit() include/linux: move for_each_bit() macros from bitops.h to find.h cpumask: replace cpumask_next_* with cpumask_first_* where appropriate tools: sync tools/bitmap with mother linux all: replace find_next{,_zero}_bit with find_first{,_zero}_bit where appropriate cpumask: use find_first_and_bit() lib: add find_first_and_bit() arch: remove GENERIC_FIND_FIRST_BIT entirely include: move find.h from asm_generic to linux bitops: move find_bit_*_le functions from le.h to find.h bitops: protect find_first_{,zero}_bit properly
2022-01-15all: replace find_next{,_zero}_bit with find_first{,_zero}_bit where appropriateYury Norov
find_first{,_zero}_bit is a more effective analogue of 'next' version if start == 0. This patch replaces 'next' with 'first' where things look trivial. Signed-off-by: Yury Norov <yury.norov@gmail.com> Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
2021-12-21Merge tag 'kvm-s390-next-5.17-1' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD KVM: s390: Fix and cleanup - fix sigp sense/start/stop/inconsistency - cleanups
2021-12-17KVM: s390: Clarify SIGP orders versus STOP/RESTARTEric Farman
With KVM_CAP_S390_USER_SIGP, there are only five Signal Processor orders (CONDITIONAL EMERGENCY SIGNAL, EMERGENCY SIGNAL, EXTERNAL CALL, SENSE, and SENSE RUNNING STATUS) which are intended for frequent use and thus are processed in-kernel. The remainder are sent to userspace with the KVM_CAP_S390_USER_SIGP capability. Of those, three orders (RESTART, STOP, and STOP AND STORE STATUS) have the potential to inject work back into the kernel, and thus are asynchronous. Let's look for those pending IRQs when processing one of the in-kernel SIGP orders, and return BUSY (CC2) if one is in process. This is in agreement with the Principles of Operation, which states that only one order can be "active" on a CPU at a time. Cc: stable@vger.kernel.org Suggested-by: David Hildenbrand <david@redhat.com> Signed-off-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Acked-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20211213210550.856213-2-farman@linux.ibm.com [borntraeger@linux.ibm.com: add stable tag] Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2021-12-17KVM: s390: gaccess: Cleanup access to guest pagesJanis Schoetterl-Glausch
Introduce a helper function for guest frame access. Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Message-Id: <20211126164549.7046-4-scgl@linux.ibm.com> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2021-12-17KVM: s390: gaccess: Refactor access address range checkJanis Schoetterl-Glausch
Do not round down the first address to the page boundary, just translate it normally, which gives the value we care about in the first place. Given this, translating a single address is just the special case of translating a range spanning a single page. Make the output optional, so the function can be used to just check a range. Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Message-Id: <20211126164549.7046-3-scgl@linux.ibm.com> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2021-12-17KVM: s390: gaccess: Refactor gpa and length calculationJanis Schoetterl-Glausch
Improve readability by renaming the length variable and not calculating the offset manually. Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Message-Id: <20211126164549.7046-2-scgl@linux.ibm.com> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2021-12-09KVM: s390: Use Makefile.kvm for common filesDavid Woodhouse
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Message-Id: <20211121125451.9489-4-dwmw2@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: Rename kvm_vcpu_block() => kvm_vcpu_halt()Sean Christopherson
Rename kvm_vcpu_block() to kvm_vcpu_halt() in preparation for splitting the actual "block" sequences into a separate helper (to be named kvm_vcpu_block()). x86 will use the standalone block-only path to handle non-halt cases where the vCPU is not runnable. Rename block_ns to halt_ns to match the new function name. No functional change intended. Reviewed-by: David Matlack <dmatlack@google.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211009021236.4122790-14-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: Drop obsolete kvm_arch_vcpu_block_finish()Sean Christopherson
Drop kvm_arch_vcpu_block_finish() now that all arch implementations are nops. No functional change intended. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: David Matlack <dmatlack@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211009021236.4122790-10-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: s390: Clear valid_wakeup in kvm_s390_handle_wait(), not in arch hookSean Christopherson
Move the clearing of valid_wakeup from kvm_arch_vcpu_block_finish() so that a future patch can drop said arch hook. Unlike the other blocking- related arch hooks, vcpu_blocking/unblocking(), vcpu_block_finish() needs to be called even if the KVM doesn't actually block the vCPU. This will allow future patches to differentiate between truly blocking the vCPU and emulating a halt condition without introducing a contradiction. Alternatively, the hook could be renamed to kvm_arch_vcpu_halt_finish(), but there's literally one call site in s390, and future cleanup can also be done to handle valid_wakeup fully within kvm_s390_handle_wait() and allow generic KVM to drop vcpu_valid_wakeup(). No functional change intended. Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211009021236.4122790-9-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: s390: Ensure kvm_arch_no_poll() is read once when blocking vCPUSean Christopherson
Wrap s390's halt_poll_max_steal with READ_ONCE and snapshot the result of kvm_arch_no_poll() in kvm_vcpu_block() to avoid a mostly-theoretical, largely benign bug on s390 where the result of kvm_arch_no_poll() could change due to userspace modifying halt_poll_max_steal while the vCPU is blocking. The bug is largely benign as it will either cause KVM to skip updating halt-polling times (no_poll toggles false=>true) or to update halt-polling times with a slightly flawed block_ns. Note, READ_ONCE is unnecessary in the current code, add it in case the arch hook is ever inlined, and to provide a hint that userspace can change the param at will. Fixes: 8b905d28ee17 ("KVM: s390: provide kvm_arch_no_poll function") Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211009021236.4122790-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: Keep memslots in tree-based structures instead of array-based onesMaciej S. Szmigiero
The current memslot code uses a (reverse gfn-ordered) memslot array for keeping track of them. Because the memslot array that is currently in use cannot be modified every memslot management operation (create, delete, move, change flags) has to make a copy of the whole array so it has a scratch copy to work on. Strictly speaking, however, it is only necessary to make copy of the memslot that is being modified, copying all the memslots currently present is just a limitation of the array-based memslot implementation. Two memslot sets, however, are still needed so the VM continues to run on the currently active set while the requested operation is being performed on the second, currently inactive one. In order to have two memslot sets, but only one copy of actual memslots it is necessary to split out the memslot data from the memslot sets. The memslots themselves should be also kept independent of each other so they can be individually added or deleted. These two memslot sets should normally point to the same set of memslots. They can, however, be desynchronized when performing a memslot management operation by replacing the memslot to be modified by its copy. After the operation is complete, both memslot sets once again point to the same, common set of memslot data. This commit implements the aforementioned idea. For tracking of gfns an ordinary rbtree is used since memslots cannot overlap in the guest address space and so this data structure is sufficient for ensuring that lookups are done quickly. The "last used slot" mini-caches (both per-slot set one and per-vCPU one), that keep track of the last found-by-gfn memslot, are still present in the new code. Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Message-Id: <17c0cf3663b760a0d3753d4ac08c0753e941b811.1638817641.git.maciej.szmigiero@oracle.com>
2021-12-08KVM: s390: Introduce kvm_s390_get_gfn_end()Maciej S. Szmigiero
And use it where s390 code would just access the memslot with the highest gfn directly. No functional change intended. Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Message-Id: <42496041d6af1c23b1cbba2636b344ca8d5fc3af.1638817641.git.maciej.szmigiero@oracle.com>
2021-12-08KVM: Use interval tree to do fast hva lookup in memslotsMaciej S. Szmigiero
The current memslots implementation only allows quick binary search by gfn, quick lookup by hva is not possible - the implementation has to do a linear scan of the whole memslots array, even though the operation being performed might apply just to a single memslot. This significantly hurts performance of per-hva operations with higher memslot counts. Since hva ranges can overlap between memslots an interval tree is needed for tracking them. [sean: handle interval tree updates in kvm_replace_memslot()] Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Message-Id: <d66b9974becaa9839be9c4e1a5de97b177b4ac20.1638817640.git.maciej.szmigiero@oracle.com>
2021-12-08KVM: Integrate gfn_to_memslot_approx() into search_memslots()Maciej S. Szmigiero
s390 arch has gfn_to_memslot_approx() which is almost identical to search_memslots(), differing only in that in case the gfn falls in a hole one of the memslots bordering the hole is returned. Add this lookup mode as an option to search_memslots() so we don't have two almost identical functions for looking up a memslot by its gfn. Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> [sean: tweaked helper names to keep gfn_to_memslot_approx() in s390] Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <171cd89b52c718dbe180ecd909b4437a64a7e2ec.1638817640.git.maciej.szmigiero@oracle.com>
2021-12-08KVM: s390: Skip gfn/size sanity checks on memslot DELETE or FLAGS_ONLYSean Christopherson
Sanity check the hva, gfn, and size of a userspace memory region only if any of those properties can change, i.e. skip the checks for DELETE and FLAGS_ONLY. KVM doesn't allow moving the hva or changing the size, a gfn change shows up as a MOVE even if flags are being modified, and the checks are pointless for the DELETE case as userspace_addr and gfn_base are zeroed by common KVM. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Message-Id: <05430738437ac2c9c7371ac4e11f4a533e1677da.1638817640.git.maciej.szmigiero@oracle.com>
2021-12-08KVM: Stop passing kvm_userspace_memory_region to arch memslot hooksSean Christopherson
Drop the @mem param from kvm_arch_{prepare,commit}_memory_region() now that its use has been removed in all architectures. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Message-Id: <aa5ed3e62c27e881d0d8bc0acbc1572bc336dc19.1638817640.git.maciej.szmigiero@oracle.com>
2021-12-08KVM: s390: Use "new" memslot instead of userspace memory regionSean Christopherson
Get the gfn, size, and hva from the new memslot instead of the userspace memory region when preparing/committing memory region changes. This will allow a future commit to drop the @mem param. Note, this has a subtle functional change as KVM would previously reject DELETE if userspace provided a garbage userspace_addr or guest_phys_addr, whereas KVM zeros those fields in the "new" memslot when deleting an existing memslot. Arguably the old behavior is more correct, but there's zero benefit into requiring userspace to provide sane values for hva and gfn. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Message-Id: <917ed131c06a4c7b35dd7fb7ed7955be899ad8cc.1638817639.git.maciej.szmigiero@oracle.com>
2021-12-08KVM: Let/force architectures to deal with arch specific memslot dataSean Christopherson
Pass the "old" slot to kvm_arch_prepare_memory_region() and force arch code to handle propagating arch specific data from "new" to "old" when necessary. This is a baby step towards dynamically allocating "new" from the get go, and is a (very) minor performance boost on x86 due to not unnecessarily copying arch data. For PPC HV, copy the rmap in the !CREATE and !DELETE paths, i.e. for MOVE and FLAGS_ONLY. This is functionally a nop as the previous behavior would overwrite the pointer for CREATE, and eventually discard/ignore it for DELETE. For x86, copy the arch data only for FLAGS_ONLY changes. Unlike PPC HV, x86 needs to reallocate arch data in the MOVE case as the size of x86's allocations depend on the alignment of the memslot's gfn. Opportunistically tweak kvm_arch_prepare_memory_region()'s param order to match the "commit" prototype. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> [mss: add missing RISCV kvm_arch_prepare_memory_region() change] Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Message-Id: <67dea5f11bbcfd71e3da5986f11e87f5dd4013f9.1638817639.git.maciej.szmigiero@oracle.com>
2021-12-08KVM: Use 'unsigned long' as kvm_for_each_vcpu()'s indexMarc Zyngier
Everywhere we use kvm_for_each_vpcu(), we use an int as the vcpu index. Unfortunately, we're about to move rework the iterator, which requires this to be upgrade to an unsigned long. Let's bite the bullet and repaint all of it in one go. Signed-off-by: Marc Zyngier <maz@kernel.org> Message-Id: <20211116160403.4074052-7-maz@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: s390: Use kvm_get_vcpu() instead of open-coded accessMarc Zyngier
As we are about to change the way vcpus are allocated, mandate the use of kvm_get_vcpu() instead of open-coding the access. Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Message-Id: <20211116160403.4074052-4-maz@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: Move wiping of the kvm->vcpus array to common codeMarc Zyngier
All architectures have similar loops iterating over the vcpus, freeing one vcpu at a time, and eventually wiping the reference off the vcpus array. They are also inconsistently taking the kvm->lock mutex when wiping the references from the array. Make this code common, which will simplify further changes. The locking is dropped altogether, as this should only be called when there is no further references on the kvm structure. Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Message-Id: <20211116160403.4074052-2-maz@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-18KVM: s390: Cap KVM_CAP_NR_VCPUS by num_online_cpus()Vitaly Kuznetsov
KVM_CAP_NR_VCPUS is a legacy advisory value which on other architectures return num_online_cpus() caped by KVM_CAP_NR_VCPUS or something else (ppc and arm64 are special cases). On s390, KVM_CAP_NR_VCPUS returns the same as KVM_CAP_MAX_VCPUS and this may turn out to be a bad 'advice'. Switch s390 to returning caped num_online_cpus() too. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Message-Id: <20211116163443.88707-6-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-06Merge tag 's390-5.16-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Vasily Gorbik: - Add support for ftrace with direct call and ftrace direct call samples. - Add support for kernel command lines longer than current 896 bytes and make its length configurable. - Add support for BEAR enhancement facility to improve last breaking event instruction tracking. - Add kprobes sanity checks and testcases to prevent kprobe in the mid of an instruction. - Allow concurrent access to /dev/hwc for the CPUMF users. - Various ftrace / jump label improvements. - Convert unwinder tests to KUnit. - Add s390_iommu_aperture kernel parameter to tweak the limits on concurrently usable DMA mappings. - Add ap.useirq AP module option which can be used to disable interrupt use. - Add add_disk() error handling support to block device drivers. - Drop arch specific and use generic implementation of strlcpy and strrchr. - Several __pa/__va usages fixes. - Various cio, crypto, pci, kernel doc and other small fixes and improvements all over the code. [ Merge fixup as per https://lore.kernel.org/all/YXAqZ%2FEszRisunQw@osiris/ ] * tag 's390-5.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (63 commits) s390: make command line configurable s390: support command lines longer than 896 bytes s390/kexec_file: move kernel image size check s390/pci: add s390_iommu_aperture kernel parameter s390/spinlock: remove incorrect kernel doc indicator s390/string: use generic strlcpy s390/string: use generic strrchr s390/ap: function rework based on compiler warning s390/cio: make ccw_device_dma_* more robust s390/vfio-ap: s390/crypto: fix all kernel-doc warnings s390/hmcdrv: fix kernel doc comments s390/ap: new module option ap.useirq s390/cpumf: Allow multiple processes to access /dev/hwc s390/bitops: return true/false (not 1/0) from bool functions s390: add support for BEAR enhancement facility s390: introduce nospec_uses_trampoline() s390: rename last_break to pgm_last_break s390/ptrace: add last_break member to pt_regs s390/sclp: sort out physical vs virtual pointers usage s390/setup: convert start and end initrd pointers to virtual ...
2021-10-27KVM: s390: add debug statement for diag 318 CPNC dataCollin Walling
The diag 318 data contains values that denote information regarding the guest's environment. Currently, it is unecessarily difficult to observe this value (either manually-inserted debug statements, gdb stepping, mem dumping etc). It's useful to observe this information to obtain an at-a-glance view of the guest's environment, so lets add a simple VCPU event that prints the CPNC to the s390dbf logs. Signed-off-by: Collin Walling <walling@linux.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Link: https://lore.kernel.org/r/20211027025451.290124-1-walling@linux.ibm.com [borntraeger@de.ibm.com]: change debug level to 3 Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2021-10-27KVM: s390: Fix handle_sske page fault handlingJanis Schoetterl-Glausch
If handle_sske cannot set the storage key, because there is no page table entry or no present large page entry, it calls fixup_user_fault. However, currently, if the call succeeds, handle_sske returns -EAGAIN, without having set the storage key. Instead, retry by continue'ing the loop without incrementing the address. The same issue in handle_pfmf was fixed by a11bdb1a6b78 ("KVM: s390: Fix pfmf and conditional skey emulation"). Fixes: bd096f644319 ("KVM: s390: Add skey emulation fault handling") Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Link: https://lore.kernel.org/r/20211022152648.26536-1-scgl@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2021-10-26s390: rename last_break to pgm_last_breakSven Schnelle
With the upcoming BEAR enhancements last_break isn't really unique, so rename it to pgm_last_break. This way it should be more obvious that this is the last_break value that is written by the hardware when a program check occurs. Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-10-25KVM: s390: Add a routine for setting userspace CPU stateEric Farman
This capability exists, but we don't record anything when userspace enables it. Let's refactor that code so that a note can be made in the debug logs that it was enabled. Signed-off-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20211008203112.1979843-7-farman@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2021-10-25KVM: s390: Simplify SIGP Set Arch handlingEric Farman
The Principles of Operations describe the various reasons that each individual SIGP orders might be rejected, and the status bit that are set for each condition. For example, for the Set Architecture order, it states: "If it is not true that all other CPUs in the configu- ration are in the stopped or check-stop state, ... bit 54 (incorrect state) ... is set to one." However, it also states: "... if the CZAM facility is installed, ... bit 55 (invalid parameter) ... is set to one." Since the Configuration-z/Architecture-Architectural Mode (CZAM) facility is unconditionally presented, there is no need to examine each VCPU to determine if it is started/stopped. It can simply be rejected outright with the Invalid Parameter bit. Fixes: b697e435aeee ("KVM: s390: Support Configuration z/Architecture Mode") Signed-off-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Link: https://lore.kernel.org/r/20211008203112.1979843-2-farman@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2021-10-25KVM: s390: pv: avoid stalls when making pages secureClaudio Imbrenda
Improve make_secure_pte to avoid stalls when the system is heavily overcommitted. This was especially problematic in kvm_s390_pv_unpack, because of the loop over all pages that needed unpacking. Due to the locks being held, it was not possible to simply replace uv_call with uv_call_sched. A more complex approach was needed, in which uv_call is replaced with __uv_call, which does not loop. When the UVC needs to be executed again, -EAGAIN is returned, and the caller (or its caller) will try again. When -EAGAIN is returned, the path is the same as when the page is in writeback (and the writeback check is also performed, which is harmless). Fixes: 214d9bbcd3a672 ("s390/mm: provide memory management functions for protected KVM guests") Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Link: https://lore.kernel.org/r/20210920132502.36111-5-imbrenda@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2021-10-25KVM: s390: pv: avoid stalls for kvm_s390_pv_init_vmClaudio Imbrenda
When the system is heavily overcommitted, kvm_s390_pv_init_vm might generate stall notifications. Fix this by using uv_call_sched instead of just uv_call. This is ok because we are not holding spinlocks. Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Fixes: 214d9bbcd3a672 ("s390/mm: provide memory management functions for protected KVM guests") Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Message-Id: <20210920132502.36111-4-imbrenda@linux.ibm.com> Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2021-10-25KVM: s390: pv: avoid double free of sida pageClaudio Imbrenda
If kvm_s390_pv_destroy_cpu is called more than once, we risk calling free_page on a random page, since the sidad field is aliased with the gbea, which is not guaranteed to be zero. This can happen, for example, if userspace calls the KVM_PV_DISABLE IOCTL, and it fails, and then userspace calls the same IOCTL again. This scenario is only possible if KVM has some serious bug or if the hardware is broken. The solution is to simply return successfully immediately if the vCPU was already non secure. Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Fixes: 19e1227768863a1469797c13ef8fea1af7beac2c ("KVM: S390: protvirt: Introduce instruction data area bounce buffer") Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Message-Id: <20210920132502.36111-3-imbrenda@linux.ibm.com> Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2021-10-20KVM: s390: preserve deliverable_mask in __airqs_kick_single_vcpuHalil Pasic
Changing the deliverable mask in __airqs_kick_single_vcpu() is a bug. If one idle vcpu can't take the interrupts we want to deliver, we should look for another vcpu that can, instead of saying that we don't want to deliver these interrupts by clearing the bits from the deliverable_mask. Fixes: 9f30f6216378 ("KVM: s390: add gib_alert_irq_handler()") Signed-off-by: Halil Pasic <pasic@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Michael Mueller <mimu@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Link: https://lore.kernel.org/r/20211019175401.3757927-3-pasic@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>