linux.git - Linus' kernel tree

Age	Commit message	Author
2024-09-17	Merge tag 'kvm-x86-pat_vmx_msrs-6.12' of https://github.com/kvm-x86/linux into HEAD	Paolo Bonzini
	KVM VMX and x86 PAT MSR macro cleanup for 6.12: - Add common defines for the x86 architectural memory types, i.e. the types that are shared across PAT, MTRRs, VMCSes, and EPTPs. - Clean up the various VMX MSR macros to make the code self-documenting (inasmuch as possible), and to make it less painful to add new macros.
2024-09-17	Merge tag 'kvm-x86-mmu-6.12' of https://github.com/kvm-x86/linux into HEAD	Paolo Bonzini
	KVM x86 MMU changes for 6.12: - Overhaul the "unprotect and retry" logic to more precisely identify cases where retrying is actually helpful, and to harden all retry paths against putting the guest into an infinite retry loop. - Add support for yielding, e.g. to honor NEED_RESCHED, when zapping rmaps in the shadow MMU. - Refactor pieces of the shadow MMU related to aging SPTEs in preparation for adding MGLRU support in KVM. - Misc cleanups
2024-09-17	Merge tag 'kvm-x86-selftests-6.12' of https://github.com/kvm-x86/linux into HEAD	Paolo Bonzini
	KVM selftests changes for 6.12: - Fix a goof that caused some Hyper-V tests to be skipped when run on bare metal, i.e. NOT in a VM. - Add a regression test for KVM's handling of SHUTDOWN for an SEV-ES guest. - Explicitly include one-off assets in .gitignore. Past Sean was completely wrong about not being able to detect missing .gitignore entries. - Verify userspace single-stepping works when KVM happens to handle a VM-Exit in its fastpath. - Misc cleanups
2024-09-17	Merge tag 'kvm-x86-misc-6.12' of https://github.com/kvm-x86/linux into HEAD	Paolo Bonzini
	KVM x86 misc changes for 6.12 - Advertise AVX10.1 to userspace (effectively prep work for the "real" AVX10 functionality that is on the horizon). - Rework common MSR handling code to suppress errors on userspace accesses to unsupported-but-advertised MSRs. This will allow removing (almost?) all of KVM's exemptions for userspace access to MSRs that shouldn't exist based on the vCPU model (the actual cleanup is non-trivial future work). - Rework KVM's handling of x2APIC ICR, again, because AMD (x2AVIC) splits the 64-bit value into the legacy ICR and ICR2 storage, whereas Intel (APICv) stores the entire 64-bit value at the ICR offset. - Fix a bug where KVM would fail to exit to userspace if one was triggered by a fastpath exit handler. - Add fastpath handling of HLT VM-Exit to expedite re-entering the guest when there's already a pending wake event at the time of the exit. - Finally fix the RSM vs. nested VM-Enter WARN by forcing the vCPU out of guest mode prior to signalling SHUTDOWN (architecturally, the SHUTDOWN is supposed to hit L1, not L2).
2024-09-17	Merge tag 'kvm-x86-generic-6.12' of https://github.com/kvm-x86/linux into HEAD	Paolo Bonzini
	KVM generic changes for 6.12: - Fix a bug that results in KVM prematurely exiting to userspace for coalesced MMIO/PIO in many cases, clean up the related code, and add a testcase. - Fix a bug in kvm_clear_guest() where it would trigger a buffer overflow _if_ the gpa+len crosses a page boundary, which thankfully is guaranteed to not happen in the current code base. Add WARNs in more helpers that read/write guest memory to detect similar bugs.
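The kvm_clear_guest() item above concerns a gpa+len range that crosses a page boundary. A minimal sketch of the usual remedy, splitting the range so that no single access spans two pages, is shown below. It is an illustrative model and not the kernel's code; `split_guest_range` is a hypothetical helper name and the 4 KiB page size is an assumption.

```python
PAGE_SHIFT = 12                 # assumption: 4 KiB pages
PAGE_SIZE = 1 << PAGE_SHIFT


def split_guest_range(gpa, length):
    """Split [gpa, gpa + length) into per-page (gfn, offset, seg) chunks.

    Each chunk stays within one page, so a per-page helper never
    writes past the end of the page it was handed.
    """
    chunks = []
    gfn = gpa >> PAGE_SHIFT
    offset = gpa & (PAGE_SIZE - 1)
    while length > 0:
        # Never take more than what is left in the current page.
        seg = min(length, PAGE_SIZE - offset)
        chunks.append((gfn, offset, seg))
        length -= seg
        gfn += 1
        offset = 0              # later pages always start at offset 0
    return chunks


# A 0x20-byte range starting 0x10 bytes before the end of page 0.
chunks = split_guest_range(0xFF0, 0x20)
```

Every chunk satisfies `offset + seg <= PAGE_SIZE`, which is the invariant that a WARN in a per-page helper could check.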
2024-09-17	Merge branch 'kvm-redo-enable-virt' into HEAD	Paolo Bonzini
	Register KVM's cpuhp and syscore callbacks when enabling virtualization in hardware, as the sole purpose of said callbacks is to disable and re-enable virtualization as needed. The primary motivation for this series is to simplify dealing with enabling virtualization for Intel's TDX, which needs to enable virtualization when kvm-intel.ko is loaded, i.e. long before the first VM is created. That said, this is a nice cleanup on its own. By registering the callbacks on-demand, the callbacks themselves don't need to check kvm_usage_count, because their very existence implies a non-zero count. Patch 1 (re)adds a dedicated lock for kvm_usage_count. This avoids a lock ordering issue between cpus_read_lock() and kvm_lock. The lock ordering issue still exists in very rare cases, and will be fixed for good by switching vm_list to an (S)RCU-protected list. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-09-17	Merge branch 'kvm-memslot-zap-quirk' into HEAD	Paolo Bonzini
	Today whenever a memslot is moved or deleted, KVM invalidates the entire page tables and generates fresh ones based on the new memslot layout. This behavior traditionally was kept because of a bug which was never fully investigated and caused VM instability with assigned GeForce GPUs. It generally does not have a huge overhead, because the old MMU is able to reuse cached page tables and the new one is more scalable and can resolve EPT violations/nested page faults in parallel, but it has worse performance if the guest frequently deletes and adds small memslots, and it's entirely not viable for TDX. This is because TDX requires re-accepting of private pages after page dropping. For non-TDX VMs, this series therefore introduces the KVM_X86_QUIRK_SLOT_ZAP_ALL quirk, enabling users to control the behavior of memslot zapping when a memslot is moved/deleted. The quirk is turned on by default, leading to the zapping of all SPTEs when a memslot is moved/deleted; users however have the option to turn off the quirk, which limits the zapping only to those SPTEs that lie within the range of memslot being moved/deleted. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
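The difference between the two quirk settings can be pictured with a toy model. The sketch below is only an illustration of the behavior described above: the dictionary of gfn-to-pfn mappings stands in for SPTEs, and `zap_on_memslot_change` is a hypothetical name, not a KVM function.

```python
def zap_on_memslot_change(sptes, slot_start, slot_npages, zap_all_quirk=True):
    """Return the mappings (gfn -> pfn) that survive a memslot move/delete.

    zap_all_quirk=True models the default (quirk enabled): everything is
    zapped. zap_all_quirk=False models the quirk being disabled: only
    mappings inside the affected memslot's gfn range are zapped.
    """
    if zap_all_quirk:
        return {}
    slot_end = slot_start + slot_npages
    return {gfn: pfn for gfn, pfn in sptes.items()
            if not (slot_start <= gfn < slot_end)}


# Three mappings; the memslot being deleted covers gfns 0x100 and 0x101.
sptes = {0x100: 0xA, 0x101: 0xB, 0x200: 0xC}
after_default = zap_on_memslot_change(sptes, 0x100, 2)
after_quirk_off = zap_on_memslot_change(sptes, 0x100, 2, zap_all_quirk=False)
```

With the quirk disabled, the unrelated mapping at gfn 0x200 survives, so a guest that frequently adds and removes small memslots would not have to re-fault the rest of its memory.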
2024-09-17	Merge tag 'kvm-s390-next-6.12-1' of https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD	Paolo Bonzini
	* New ucontrol selftest * Inline assembly touchups
2024-09-16	s390: Enable KVM_S390_UCONTROL config in debug_defconfig	Christoph Schlameuss
	To simplify testing enable UCONTROL KVM by default in debug kernels. Signed-off-by: Christoph Schlameuss <schlameuss@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Link: https://lore.kernel.org/r/20240807154512.316936-11-schlameuss@linux.ibm.com Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Message-ID: <20240807154512.316936-11-schlameuss@linux.ibm.com>
2024-09-16	selftests: kvm: s390: Add VM run test case	Christoph Schlameuss
	Add test case running code interacting with registers within a ucontrol VM. * Add uc_gprs test case The test uses the same VM setup using the fixture and debug macros introduced in earlier patches in this series. Signed-off-by: Christoph Schlameuss <schlameuss@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Link: https://lore.kernel.org/r/20240807154512.316936-7-schlameuss@linux.ibm.com [frankja@linux.ibm.com: Removed leftover comment line] Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Message-ID: <20240807154512.316936-7-schlameuss@linux.ibm.com>
2024-09-15	Merge tag 'kvm-riscv-6.12-1' of https://github.com/kvm-riscv/linux into HEAD	Paolo Bonzini
	KVM/riscv changes for 6.12 - Fix sbiret init before forwarding to userspace - Don't zero-out PMU snapshot area before freeing data - Allow legacy PMU access from guest - Fix to allow hpmcounter31 from the guest
2024-09-15	Merge tag 'loongarch-kvm-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson into HEAD	Paolo Bonzini
	LoongArch KVM changes for v6.12 1. Revert qspinlock to test-and-set simple lock on VM. 2. Add Loongson Binary Translation extension support. 3. Add PMU support for guest. 4. Enable paravirt feature control from VMM. 5. Implement function kvm_para_has_feature().
2024-09-15	Merge tag 'kvmarm-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD	Paolo Bonzini
	KVM/arm64 updates for 6.12 * New features: - Add a Stage-2 page table dumper, reusing the main ptdump infrastructure, and allowing easier debugging of our page-table infrastructure - Add FP8 support to the KVM/arm64 floating point handling. - Add NV support for the AT family of instructions, which mostly results in adding a page table walker that deals with most of the complexity of the architecture. * Improvements, fixes and cleanups: - Add selftest checks for a bunch of timer emulation corner cases - Fix the multiple cases where KVM/arm64 doesn't correctly handle the guest trying to use a GICv3 that isn't advertised - Remove REG_HIDDEN_USER from the sysreg infrastructure, making things a little more simple - Prevent MTE tags being restored by userspace if we are actively logging writes, as that's a recipe for disaster - Correct the refcount on a page that is not considered for MTE tag copying (such as a device) - Relax the synchronisation when walking a page table to split block mappings, moving it to the end of the walk, as there is no need to perform it on every store. - Fix boundary check when transferring memory using FFA - Fix pKVM TLB invalidation, only affecting currently out of tree code but worth addressing for peace of mind
2024-09-12	LoongArch: KVM: Implement function kvm_para_has_feature()	Bibo Mao
	Implement function kvm_para_has_feature() to detect supported paravirt features. It can be used by device driver to detect and enable paravirt features, such as the EIOINTC irqchip driver is able to detect feature KVM_FEATURE_VIRT_EXTIOI and do some optimization. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-09-12	LoongArch: KVM: Enable paravirt feature control from VMM	Bibo Mao
	Export kernel paravirt features to user space, so that VMM can control each single paravirt feature. By default paravirt features will be the same with kvm supported features if VMM does not set it. Also a new feature KVM_FEATURE_VIRT_EXTIOI is added which can be set from user space. This feature indicates that the virt EIOINTC can route interrupts to 256 vCPUs, rather than 4 vCPUs like with real HW. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-09-12	LoongArch: KVM: Add PMU support for guest	Song Gao
	On LoongArch, the host and guest have their own PMU CSRs registers and they share PMU hardware resources. A set of PMU CSRs consists of a CTRL register and a CNTR register. We can set which PMU CSRs are used by the guest by writing to the GCFG register [24:26] bits. On KVM side: - Save the host PMU CSRs into structure kvm_context. - If the host supports the PMU feature. - When entering guest mode, save the host PMU CSRs and restore the guest PMU CSRs. - When exiting guest mode, save the guest PMU CSRs and restore the host PMU CSRs. Reviewed-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Song Gao <gaosong@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
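The message above says the guest's PMU CSRs are selected by writing bits [24:26] of the GCFG register. A minimal sketch of reading and updating such a 3-bit field follows. The constant and function names are hypothetical, and the meaning of individual field values is not modeled here since the message does not spell it out.

```python
GCFG_PMU_SHIFT = 24                      # low bit of the field, per the message
GCFG_PMU_MASK = 0b111 << GCFG_PMU_SHIFT  # bits 24..26 inclusive


def set_pmu_field(gcfg, value):
    """Return gcfg with bits [24:26] replaced by value (0..7)."""
    if not 0 <= value <= 0b111:
        raise ValueError("field is only 3 bits wide")
    # Clear the old field, then OR in the new one.
    return (gcfg & ~GCFG_PMU_MASK) | (value << GCFG_PMU_SHIFT)


def get_pmu_field(gcfg):
    """Extract bits [24:26] from gcfg."""
    return (gcfg & GCFG_PMU_MASK) >> GCFG_PMU_SHIFT


# Updating the field leaves every other bit of the register untouched.
updated = set_pmu_field(0xFFFFFFFF, 2)
```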
2024-09-12	Merge branch kvm-arm64/visibility-cleanups into kvmarm-master/next	Marc Zyngier
	* kvm-arm64/visibility-cleanups: : . : Remove REG_HIDDEN_USER from the sysreg infrastructure, making things : a little more simple. From the cover letter: : : "Since 4d4f52052ba8 ("KVM: arm64: nv: Drop EL12 register traps that are : redirected to VNCR") and the admission that KVM would never be supporting : the original FEAT_NV, REG_HIDDEN_USER only had a few users, all of which : could either be replaced by a more ad-hoc mechanism, or removed altogether." : . KVM: arm64: Get rid of REG_HIDDEN_USER visibility qualifier KVM: arm64: Simplify visibility handling of AArch32 SPSR_* KVM: arm64: Simplify handling of CNTKCTL_EL12 Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-09-12	Merge branch kvm-arm64/s2-ptdump into kvmarm-master/next	Marc Zyngier
	* kvm-arm64/s2-ptdump: : . : Stage-2 page table dumper, reusing the main ptdump infrastructure, : courtesy of Sebastian Ene. From the cover letter: : : "This series extends the ptdump support to allow dumping the guest : stage-2 pagetables. When CONFIG_PTDUMP_STAGE2_DEBUGFS is enabled, ptdump : registers the new following files under debugfs: : - /sys/debug/kvm/<guest_id>/stage2_page_tables : - /sys/debug/kvm/<guest_id>/stage2_levels : - /sys/debug/kvm/<guest_id>/ipa_range : : This allows userspace tools (eg. cat) to dump the stage-2 pagetables by : reading the 'stage2_page_tables' file. : [...]" : . KVM: arm64: Register ptdump with debugfs on guest creation arm64: ptdump: Don't override the level when operating on the stage-2 tables arm64: ptdump: Use the ptdump description from a local context arm64: ptdump: Expose the attribute parsing functionality KVM: arm64: Move pagetable definitions to common header Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-09-12	Merge branch kvm-arm64/nv-at-pan into kvmarm-master/next	Marc Zyngier
	* kvm-arm64/nv-at-pan: : . : Add NV support for the AT family of instructions, which mostly results : in adding a page table walker that deals with most of the complexity : of the architecture. : : From the cover letter: : : "Another task that a hypervisor supporting NV on arm64 has to deal with : is to emulate the AT instruction, because we multiplex all the S1 : translations on a single set of registers, and the guest S2 is never : truly resident on the CPU. : : So given that we lie about page tables, we also have to lie about : translation instructions, hence the emulation. Things are made : complicated by the fact that guest S1 page tables can be swapped out, : and that our shadow S2 is likely to be incomplete. So while using AT : to emulate AT is tempting (and useful), it is not going to always : work, and we thus need a fallback in the shape of a SW S1 walker." : . KVM: arm64: nv: Add support for FEAT_ATS1A KVM: arm64: nv: Plumb handling of AT S1* traps from EL2 KVM: arm64: nv: Make AT+PAN instructions aware of FEAT_PAN3 KVM: arm64: nv: Sanitise SCTLR_EL1.EPAN according to VM configuration KVM: arm64: nv: Add SW walker for AT S1 emulation KVM: arm64: nv: Make ps_to_output_size() generally available KVM: arm64: nv: Add emulation of AT S12E{0,1}{R,W} KVM: arm64: nv: Add basic emulation of AT S1E2{R,W} KVM: arm64: nv: Add basic emulation of AT S1E1{R,W}P KVM: arm64: nv: Add basic emulation of AT S1E{0,1}{R,W} KVM: arm64: nv: Honor absence of FEAT_PAN2 KVM: arm64: nv: Turn upper_attr for S2 walk into the full descriptor KVM: arm64: nv: Enforce S2 alignment when contiguous bit is set arm64: Add ESR_ELx_FSC_ADDRSZ_L() helper arm64: Add system register encoding for PSTATE.PAN arm64: Add PAR_EL1 field description arm64: Add missing APTable and TCR_ELx.HPD masks KVM: arm64: Make kvm_at() take an OP_AT_* Signed-off-by: Marc Zyngier <maz@kernel.org> # Conflicts: # arch/arm64/kvm/nested.c
2024-09-12	Merge branch kvm-arm64/selftests-6.12 into kvmarm-master/next	Marc Zyngier
	* kvm-arm64/selftests-6.12: : . : KVM/arm64 selftest updates for 6.12 : : - Check for a bunch of timer emulation corner cases (Colton Lewis) : . KVM: arm64: selftests: Add arch_timer_edge_cases selftest KVM: arm64: selftests: Ensure pending interrupts are handled in arch_timer test Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-09-12	Merge branch kvm-arm64/vgic-sre-traps into kvmarm-master/next	Marc Zyngier
	* kvm-arm64/vgic-sre-traps: : . : Fix the multiple cases where KVM/arm64 doesn't correctly : handle the guest trying to use a GICv3 that isn't advertised. : : From the cover letter: : : "It recently appeared that, when running on a GICv3-equipped platform : (which is what non-ancient arm64 HW has), not configuring a GICv3 : for the guest could result in less than desirable outcomes. : : We have multiple issues to fix: : : - for registers that always trap (the SGI registers) or that may : trap (the SRE register), we need to check whether a GICv3 has been : instantiated before acting upon the trap. : : - for registers that only conditionally trap, we must actively trap : them even in the absence of a GICv3 being instantiated, and handle : those traps accordingly. : : - finally, ID registers must reflect the absence of a GICv3, so that : we are consistent. : : This series goes through all these requirements. The main complexity : here is to apply a GICv3 configuration on the host in the absence of a : GICv3 in the guest. This is pretty hackish, but I don't have a much : better solution so far. : : As part of making wider use of the trap bits, we fully define the : trap routing as per the architecture, something that we eventually : need for NV anyway." : . KVM: arm64: selftests: Cope with lack of GICv3 in set_id_regs KVM: arm64: Add selftest checking how the absence of GICv3 is handled KVM: arm64: Unify UNDEF injection helpers KVM: arm64: Make most GICv3 accesses UNDEF if they trap KVM: arm64: Honor guest requested traps in GICv3 emulation KVM: arm64: Add trap routing information for ICH_HCR_EL2 KVM: arm64: Add ICH_HCR_EL2 to the vcpu state KVM: arm64: Zero ID_AA64PFR0_EL1.GIC when no GICv3 is presented to the guest KVM: arm64: Add helper for last ditch idreg adjustments KVM: arm64: Force GICv3 trap activation when no irqchip is configured on VHE KVM: arm64: Force SRE traps when SRE access is not enabled KVM: arm64: Move GICv3 trap configuration to kvm_calculate_traps() Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-09-12	Merge branch kvm-arm64/fpmr into kvmarm-master/next	Marc Zyngier
	* kvm-arm64/fpmr: : . : Add FP8 support to the KVM/arm64 floating point handling. : : This includes new ID registers (ID_AA64PFR2_EL1 ID_AA64FPFR0_EL1) : being made visible to guests, as well as a new control register : (FPMR) which gets context-switched. : . KVM: arm64: Expose ID_AA64PFR2_EL1 to userspace and guests KVM: arm64: Enable FP8 support when available and configured KVM: arm64: Expose ID_AA64FPFR0_EL1 as a writable ID reg KVM: arm64: Honor trap routing for FPMR KVM: arm64: Add save/restore support for FPMR KVM: arm64: Move FPMR into the sysreg array KVM: arm64: Add predicate for FPMR support in a VM KVM: arm64: Move SVCR into the sysreg array Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-09-12	Merge branch kvm-arm64/mmu-misc-6.12 into kvmarm-master/next	Marc Zyngier
	* kvm-arm64/mmu-misc-6.12: : . : Various minor MMU improvements and bug-fixes: : : - Prevent MTE tags being restored by userspace if we are actively : logging writes, as that's a recipe for disaster : : - Correct the refcount on a page that is not considered for MTE : tag copying (such as a device) : : - When walking a page table to split blocks, keep the DSB at the end : of the walk, as there is no need to perform it on every store. : : - Fix boundary check when transferring memory using FFA : . KVM: arm64: Add memory length checks and remove inline in do_ffa_mem_xfer KVM: arm64: Disallow copying MTE to guest memory while KVM is dirty logging KVM: arm64: Release pfn, i.e. put page, if copying MTE tags hits ZONE_DEVICE KVM: arm64: Move data barrier to end of split walk Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-09-11	KVM: arm64: Get rid of REG_HIDDEN_USER visibility qualifier	Marc Zyngier
	Now that REG_HIDDEN_USER has no direct user anymore, remove it entirely and update all users of sysreg_hidden_user() to call sysreg_hidden() instead. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240904082419.1982402-4-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-09-11	KVM: arm64: Simplify visibility handling of AArch32 SPSR_*	Marc Zyngier
	Since SPSR_* are not associated with any register in the sysreg array, nor do they have .get_user()/.set_user() helpers, they are invisible to userspace with that encoding. Therefore hidden_user_visibility() serves no purpose here, and can be safely removed. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240904082419.1982402-3-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-09-11	KVM: arm64: Simplify handling of CNTKCTL_EL12	Marc Zyngier
	We go through a great deal of effort to map CNTKCTL_EL12 to CNTKCTL_EL1 while hiding this mapping from userspace via a special visibility helper. However, it would be far simpler to just provide an accessor doing the mapping job, removing the need for a visibility helper. With that done, we can also remove the EL12_REG() macro which serves no purpose. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240904082419.1982402-2-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-09-11	LoongArch: KVM: Add vm migration support for LBT registers	Bibo Mao
	Every vcpu has separate LBT registers. And there are four scr registers, one flags and ftop register for LBT extension. When VM migrates, VMM needs to get LBT registers for every vcpu. Here macro KVM_REG_LOONGARCH_LBT is added for new vcpu lbt register type, the following macro is added to get/put LBT registers. KVM_REG_LOONGARCH_LBT_SCR0 KVM_REG_LOONGARCH_LBT_SCR1 KVM_REG_LOONGARCH_LBT_SCR2 KVM_REG_LOONGARCH_LBT_SCR3 KVM_REG_LOONGARCH_LBT_EFLAGS KVM_REG_LOONGARCH_LBT_FTOP Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-09-11	LoongArch: KVM: Add Binary Translation extension support	Bibo Mao
	Loongson Binary Translation (LBT) is used to accelerate binary translation, which contains 4 scratch registers (scr0 to scr3), x86/ARM eflags (eflags) and x87 fpu stack pointer (ftop). Like FPU extension, here a lazy enabling method is used for LBT. The LBT context is saved/restored on the vcpu context switch path. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-09-11	LoongArch: KVM: Add VM feature detection function	Bibo Mao
	Loongson SIMD Extension (LSX), Loongson Advanced SIMD Extension (LASX) and Loongson Binary Translation (LBT) features are defined in register CPUCFG2. Two kinds of LSX/LASX/LBT feature detection are added here, one is VCPU feature, and the other is VM feature. VCPU feature detection can only work with the VCPU thread itself, and requires that the VCPU thread is already created. So LSX/LASX/LBT feature detection for VM is added also, it can be done even if VM is not created, and also can be done by any threads besides VCPU threads. Here ioctl command KVM_HAS_DEVICE_ATTR is added for VM, and macro KVM_LOONGARCH_VM_FEAT_CTRL is added to check supported feature. Five sub-features related to LSX/LASX/LBT are added as follows: KVM_LOONGARCH_VM_FEAT_LSX KVM_LOONGARCH_VM_FEAT_LASX KVM_LOONGARCH_VM_FEAT_X86BT KVM_LOONGARCH_VM_FEAT_ARMBT KVM_LOONGARCH_VM_FEAT_MIPSBT Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-09-11	LoongArch: Revert qspinlock to test-and-set simple lock on VM	Bibo Mao
	Similar with x86, when VM is detected, revert to a simple test-and-set lock to avoid the horrors of queue preemption. Tested on 3C5000 Dual-way machine with 32 cores and 2 numa nodes, test case is kcbench on kernel mainline 6.10, the detailed command is "kcbench --src /root/src/linux" Performance on host machine kernel compile time performance impact Original 150.29 seconds With patch 150.19 seconds almost no impact Performance on virtual machine: 1. 1 VM with 32 vCPUs and 2 numa node, numa node pinned kernel compile time performance impact Original 170.87 seconds With patch 171.73 seconds almost no impact 2. 2 VMs, each VM with 32 vCPUs and 2 numa node, numa node pinned kernel compile time performance impact Original 2362.04 seconds With patch 354.73 seconds +565% Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
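The "performance impact" column above can be rechecked from the reported compile times. The short sketch below assumes the percentage is computed as original time divided by patched time, minus one; under that assumption the two-VM case comes out between 565% and 566%, consistent with the "+565%" quoted in the message. `speedup_percent` is a hypothetical helper name.

```python
def speedup_percent(original_s, patched_s):
    """Relative improvement of the patched run, in percent.

    Positive means the patched kernel compiled faster.
    """
    return (original_s / patched_s - 1.0) * 100.0


# (original seconds, patched seconds), copied from the commit message.
results = {
    "host": (150.29, 150.19),
    "1 VM, 32 vCPUs": (170.87, 171.73),
    "2 VMs, 32 vCPUs each": (2362.04, 354.73),
}

for name, (original_s, patched_s) in results.items():
    print(f"{name}: {speedup_percent(original_s, patched_s):+.2f}%")
```

The host and single-VM cases both land within about half a percent of zero, which matches the "almost no impact" wording; only the overcommitted two-VM case shows a large change.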
2024-09-10	KVM: arm64: Register ptdump with debugfs on guest creation	Sebastian Ene
	While arch/*/mem/ptdump handles the kernel pagetable dumping code, introduce KVM/ptdump to show the guest stage-2 pagetables. The separation is necessary because most of the definitions from the stage-2 pagetable reside in the KVM path and we will be invoking functionality specific to KVM. Introduce the PTDUMP_STAGE2_DEBUGFS config. When a guest is created, register a new file entry under the guest debugfs dir which allows userspace to show the contents of the guest stage-2 pagetables when accessed. [maz: moved function prototypes from kvm_host.h to kvm_mmu.h] Signed-off-by: Sebastian Ene <sebastianene@google.com> Reviewed-by: Vincent Donnefort <vdonnefort@google.com> Link: https://lore.kernel.org/r/20240909124721.1672199-6-sebastianene@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-09-10	arm64: ptdump: Don't override the level when operating on the stage-2 tables	Sebastian Ene
	Ptdump uses the init_mm structure directly to dump the kernel pagetables. When ptdump is called on the stage-2 pagetables, this mm argument is not used. Prevent the level from being overwritten by checking the argument against NULL. Signed-off-by: Sebastian Ene <sebastianene@google.com> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20240909124721.1672199-5-sebastianene@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-09-10	arm64: ptdump: Use the ptdump description from a local context	Sebastian Ene
	Rename the attributes description array to allow the parsing method to use the description from a local context. To be able to do this, store a pointer to the description array in the state structure. This will allow for the later introduced callers (stage_2 ptdump) to specify their own page table description format to the ptdump parser. Signed-off-by: Sebastian Ene <sebastianene@google.com> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20240909124721.1672199-4-sebastianene@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-09-10	arm64: ptdump: Expose the attribute parsing functionality	Sebastian Ene
	Reuse the descriptor parsing functionality to keep the same output format as the original ptdump code. In order for this to happen, move the state tracking objects into a common header. [maz: Fixed note_page() stub as suggested by Will] Signed-off-by: Sebastian Ene <sebastianene@google.com> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20240909124721.1672199-3-sebastianene@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-09-10	KVM: arm64: Add memory length checks and remove inline in do_ffa_mem_xfer	Snehal Koukuntla
	When we share memory through FF-A and the description of the buffers exceeds the size of the mapped buffer, the fragmentation API is used. The fragmentation API allows specifying chunks of descriptors in subsequent FF-A fragment calls and no upper limit has been established for this. The entire memory region transferred is identified by a handle which can be used to reclaim the transferred memory. To be able to reclaim the memory, the description of the buffers has to fit in the ffa_desc_buf. Add a bounds check on the FF-A sharing path to prevent the memory reclaim from failing. Also do_ffa_mem_xfer() does not need __always_inline, except for the BUILD_BUG_ON() aspect, which gets moved to a macro. [maz: fixed the BUILD_BUG_ON() breakage with LLVM, thanks to Wei-Lin Chang for the timely report] Fixes: 634d90cf0ac65 ("KVM: arm64: Handle FFA_MEM_LEND calls from the host") Cc: stable@vger.kernel.org Reviewed-by: Sebastian Ene <sebastianene@google.com> Signed-off-by: Snehal Koukuntla <snehalreddy@google.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240909180154.3267939-1-snehalreddy@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-09-10	KVM: arm64: Move pagetable definitions to common header	Sebastian Ene
	In preparation for using the stage-2 definitions in ptdump, move some of these macros in the common header. Signed-off-by: Sebastian Ene <sebastianene@google.com> Link: https://lore.kernel.org/r/20240909124721.1672199-2-sebastianene@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-09-09	KVM: x86/mmu: Use KVM_PAGES_PER_HPAGE() instead of an open coded equivalent	Sean Christopherson
	Use KVM_PAGES_PER_HPAGE() instead of open coding equivalent logic that is anything but obvious. No functional change intended, and verified by compiling with the below assertions: BUILD_BUG_ON((1UL << KVM_HPAGE_GFN_SHIFT(PG_LEVEL_4K)) != KVM_PAGES_PER_HPAGE(PG_LEVEL_4K)); BUILD_BUG_ON((1UL << KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M)) != KVM_PAGES_PER_HPAGE(PG_LEVEL_2M)); BUILD_BUG_ON((1UL << KVM_HPAGE_GFN_SHIFT(PG_LEVEL_1G)) != KVM_PAGES_PER_HPAGE(PG_LEVEL_1G)); Link: https://lore.kernel.org/r/20240809194335.1726916-19-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
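The equivalence asserted by the BUILD_BUG_ON() lines above can be reproduced outside the kernel. The sketch below restates the macros in Python under stated assumptions: 4 KiB base pages (PAGE_SHIFT of 12), 9 address bits per paging level, and PG_LEVEL_4K/2M/1G numbered 1, 2, 3. These definitions are paraphrased from memory of the x86 KVM headers and should be checked against the tree before relying on them.

```python
PAGE_SHIFT = 12                       # assumption: 4 KiB base pages
PAGE_SIZE = 1 << PAGE_SHIFT
PG_LEVEL_4K, PG_LEVEL_2M, PG_LEVEL_1G = 1, 2, 3   # assumed level numbering


def hpage_gfn_shift(level):
    """Assumed model of KVM_HPAGE_GFN_SHIFT(): 9 gfn bits per level."""
    return (level - 1) * 9


def hpage_size(level):
    """Assumed model of KVM_HPAGE_SIZE(): bytes covered by one page."""
    return 1 << (PAGE_SHIFT + hpage_gfn_shift(level))


def pages_per_hpage(level):
    """Assumed model of KVM_PAGES_PER_HPAGE(): 4 KiB pages per huge page."""
    return hpage_size(level) // PAGE_SIZE


# Mirror of the three compile-time assertions from the commit message.
for level in (PG_LEVEL_4K, PG_LEVEL_2M, PG_LEVEL_1G):
    assert (1 << hpage_gfn_shift(level)) == pages_per_hpage(level)
```

Under these assumptions the open-coded shift and the named macro always agree, which is why the change can be described as having no functional effect.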
2024-09-09	KVM: x86/mmu: Add KVM_RMAP_MANY to replace open coded '1' and '1ul' literals	Sean Christopherson
	Replace all of the open coded '1' literals used to mark a PTE list as having many/multiple entries with a proper define. It's hard enough to read the code with one magic bit, and a future patch to support "locking" a single rmap will add another. No functional change intended. Link: https://lore.kernel.org/r/20240809194335.1726916-17-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
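The "magic bit" mentioned above is a low-bit tag on a pointer-sized value: because the stored addresses are aligned, bit 0 is free to record whether the value refers to a single entry or to a list of many. A toy model is sketched below. The value of KVM_RMAP_MANY is taken from the '1' literals the message says it replaces; the helper names are hypothetical and plain integers stand in for kernel pointers.

```python
KVM_RMAP_MANY = 1 << 0   # assumed: the bit the open coded '1' literals set


def rmap_encode_single(spte_addr):
    """Store the address of a single SPTE directly (tag bit clear)."""
    if spte_addr & KVM_RMAP_MANY:
        raise ValueError("address must be aligned so bit 0 is free")
    return spte_addr


def rmap_encode_many(desc_addr):
    """Store the address of a descriptor list, tagged with the MANY bit."""
    if desc_addr & KVM_RMAP_MANY:
        raise ValueError("address must be aligned so bit 0 is free")
    return desc_addr | KVM_RMAP_MANY


def rmap_is_many(rmap_val):
    """True if the value refers to a list rather than a single entry."""
    return bool(rmap_val & KVM_RMAP_MANY)


def rmap_pointer(rmap_val):
    """Strip the tag bit to recover the original address."""
    return rmap_val & ~KVM_RMAP_MANY


single = rmap_encode_single(0x1000)
many = rmap_encode_many(0x2000)
```

Naming the bit makes checks such as `rmap_is_many()` self-describing, and leaves room for a second tag bit without another unexplained literal.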
2024-09-09	KVM: x86/mmu: Fold mmu_spte_age() into kvm_rmap_age_gfn_range()	Sean Christopherson
	Fold mmu_spte_age() into its sole caller now that aging and testing for young SPTEs is handled in a common location, i.e. doesn't require more helpers. Opportunistically remove the use of mmu_spte_get_lockless(), as mmu_lock is held (for write!), and marking SPTEs for access tracking outside of mmu_lock is unsafe (at least, as written). I.e. using the lockless accessor is quite misleading. No functional change intended. Link: https://lore.kernel.org/r/20240809194335.1726916-16-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-09-09	KVM: x86/mmu: Morph kvm_handle_gfn_range() into an aging specific helper	Sean Christopherson
	Rework kvm_handle_gfn_range() into an aging-specific helper, kvm_rmap_age_gfn_range(). In addition to purging a bunch of unnecessary boilerplate code, this sets the stage for aging rmap SPTEs outside of mmu_lock. Note, there's a small functional change, as kvm_test_age_gfn() will now return immediately if a young SPTE is found, whereas previously KVM would continue iterating over other levels. Link: https://lore.kernel.org/r/20240809194335.1726916-15-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-09-09	KVM: x86/mmu: Honor NEED_RESCHED when zapping rmaps and blocking is allowed	Sean Christopherson
	Convert kvm_unmap_gfn_range(), which is the helper that zaps rmap SPTEs in response to an mmu_notifier invalidation, to use __kvm_rmap_zap_gfn_range() and feed in range->may_block. In other words, honor NEED_RESCHED by way of cond_resched() when zapping rmaps. This fixes a long-standing issue where KVM could process an absurd number of rmap entries without ever yielding, e.g. if an mmu_notifier fired on a PUD (or larger) range. Opportunistically rename __kvm_zap_rmap() to kvm_zap_rmap(), and drop the old kvm_zap_rmap(). Ideally, the shuffling would be done in a different patch, but that just makes the compiler unhappy, e.g. arch/x86/kvm/mmu/mmu.c:1462:13: error: ‘kvm_zap_rmap’ defined but not used Reported-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/20240809194335.1726916-14-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-09-09	KVM: x86/mmu: Add a helper to walk and zap rmaps for a memslot	Sean Christopherson
	Add a dedicated helper to walk and zap rmaps for a given memslot so that the code can be shared between KVM-initiated zaps and mmu_notifier invalidations. No functional change intended. Link: https://lore.kernel.org/r/20240809194335.1726916-13-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-09-09	KVM: x86/mmu: Plumb a @can_yield parameter into __walk_slot_rmaps()	Sean Christopherson
	Add a @can_yield param to __walk_slot_rmaps() to control whether or not dropping mmu_lock and conditionally rescheduling is allowed. This will allow using __walk_slot_rmaps() and thus cond_resched() to handle mmu_notifier invalidations, which usually allow blocking/yielding, but not when invoked by the OOM killer. Link: https://lore.kernel.org/r/20240809194335.1726916-12-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
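The role of a can_yield flag in a long-running walker can be sketched in a few lines. The model below is only an illustration of the control flow described above: `walk_rmaps`, `need_resched` and `yield_fn` are hypothetical stand-ins, and the real helper additionally has to deal with mmu_lock, which is not modeled here.

```python
def walk_rmaps(entries, handler, can_yield, need_resched, yield_fn):
    """Apply handler to every entry, yielding only when the caller allows it.

    Returns True if any handler call reported that it changed something.
    """
    changed = False
    for entry in entries:
        changed |= handler(entry)
        # Callers that must not block (e.g. an invalidation that does not
        # allow blocking) pass can_yield=False and the walk runs to the end.
        if can_yield and need_resched():
            yield_fn()
    return changed


def count_yields(can_yield):
    """Run a six-entry walk under constant reschedule pressure."""
    yields = []
    walk_rmaps(range(6),
               handler=lambda entry: True,
               can_yield=can_yield,
               need_resched=lambda: True,
               yield_fn=lambda: yields.append(1))
    return len(yields)


yields_when_allowed = count_yields(True)
yields_when_forbidden = count_yields(False)
```

Passing the flag down, rather than deciding inside the walker, lets the same loop serve both kinds of caller.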
2024-09-09	KVM: x86/mmu: Move walk_slot_rmaps() up near for_each_slot_rmap_range()	Sean Christopherson
	Move walk_slot_rmaps() and friends up near for_each_slot_rmap_range() so that the walkers can be used to handle mmu_notifier invalidations, and so that similar functionality has some amount of locality in code. No functional change intended. Link: https://lore.kernel.org/r/20240809194335.1726916-11-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-09-09	KVM: x86/mmu: WARN on MMIO cache hit when emulating write-protected gfn	Sean Christopherson
	WARN if KVM gets an MMIO cache hit on a RET_PF_WRITE_PROTECTED fault, as KVM should return RET_PF_WRITE_PROTECTED if and only if there is a memslot, and creating a memslot is supposed to invalidate the MMIO cache by virtue of changing the memslot generation. Keep the code around mainly to provide a convenient location to document why emulated MMIO should be impossible. Suggested-by: Yuan Yao <yuan.yao@linux.intel.com> Link: https://lore.kernel.org/r/20240831001538.336683-23-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-09-09	KVM: x86/mmu: Detect if unprotect will do anything based on invalid_list	Sean Christopherson
	Explicitly query the list of to-be-zapped shadow pages when checking to see if unprotecting a gfn for retry has succeeded, i.e. if KVM should retry the faulting instruction. Add a comment to explain why the list needs to be checked before zapping, which is the primary motivation for this change. No functional change intended. Reviewed-by: Yuan Yao <yuan.yao@intel.com> Link: https://lore.kernel.org/r/20240831001538.336683-22-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-09-09	KVM: x86/mmu: Subsume kvm_mmu_unprotect_page() into the and_retry() version	Sean Christopherson
	Fold kvm_mmu_unprotect_page() into kvm_mmu_unprotect_gfn_and_retry() now that all other direct usage is gone. No functional change intended. Link: https://lore.kernel.org/r/20240831001538.336683-21-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-09-09	KVM: x86: Rename reexecute_instruction()=>kvm_unprotect_and_retry_on_failure()	Sean Christopherson
	Rename reexecute_instruction() to kvm_unprotect_and_retry_on_failure() to make the intent and purpose of the helper much more obvious. No functional change intended. Reviewed-by: Yuan Yao <yuan.yao@intel.com> Link: https://lore.kernel.org/r/20240831001538.336683-20-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-09-09	KVM: x86: Update retry protection fields when forcing retry on emulation failure	Sean Christopherson
	When retrying the faulting instruction after emulation failure, refresh the infinite loop protection fields even if no shadow pages were zapped, i.e. avoid hitting an infinite loop even when retrying the instruction as a last-ditch effort to avoid terminating the guest. Link: https://lore.kernel.org/r/20240831001538.336683-19-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-09-09	KVM: x86: Apply retry protection to "unprotect on failure" path	Sean Christopherson
	Use kvm_mmu_unprotect_gfn_and_retry() in reexecute_instruction() to pick up protection against infinite loops, e.g. if KVM somehow manages to encounter an unsupported instruction and unprotecting the gfn doesn't allow the vCPU to make forward progress. Other than that, the retry-on-failure logic is a functionally equivalent, open coded version of kvm_mmu_unprotect_gfn_and_retry(). Note, the emulation failure path still isn't fully protected, as KVM won't update the retry protection fields if no shadow pages are zapped (but this change is still a step forward). That flaw will be addressed in a future patch. Reviewed-by: Yuan Yao <yuan.yao@intel.com> Link: https://lore.kernel.org/r/20240831001538.336683-18-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>