2024-10-30KVM: x86/mmu: Set Dirty bit for new SPTEs, even if _hardware_ A/D bits are disabledSean Christopherson
When making a SPTE, set the Dirty bit in the SPTE as appropriate, even if hardware A/D bits are disabled. Only EPT allows A/D bits to be disabled, and for EPT, the bits are software-available (ignored by hardware) when A/D bits are disabled, i.e. it is perfectly legal for KVM to use the Dirty bit to track dirty pages in software. Link: https://lore.kernel.org/r/20241011021051.1557902-17-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-10-30KVM: x86/mmu: Dedup logic for detecting TLB flushes on leaf SPTE changesSean Christopherson
Now that the shadow MMU and TDP MMU have identical logic for detecting required TLB flushes when updating SPTEs, move said logic to a helper so that the TDP MMU code can benefit from the comments that are currently exclusive to the shadow MMU. No functional change intended. Link: https://lore.kernel.org/r/20241011021051.1557902-16-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-10-30KVM: x86/mmu: Stop processing TDP MMU roots for test_age if young SPTE foundSean Christopherson
Return immediately if a young SPTE is found when testing, but not updating, SPTEs. The return value is a boolean, i.e. whether there is one young SPTE or fifty is irrelevant (ignoring the fact that it's impossible for there to be fifty SPTEs, as KVM has a hard limit on the number of valid TDP MMU roots). Link: https://lore.kernel.org/r/20241011021051.1557902-15-seanjc@google.com [sean: use guard(rcu)(), as suggested by Paolo] Signed-off-by: Sean Christopherson <seanjc@google.com>
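The early exit described above can be sketched in plain userspace C. This is only an illustration: `struct toy_root`, `root_has_young_spte()` and `test_age_roots()` are hypothetical names invented for the sketch, not KVM's types or functions, and the per-root walk is reduced to a stored boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a TDP MMU root; not a KVM structure. */
struct toy_root {
	bool valid;      /* invalid roots hold stale information */
	bool has_young;  /* pretend result of walking the root's SPTEs */
	int  visits;     /* counts how often the root was walked */
};

/* Pretend to walk one root, reporting whether any SPTE is young. */
static bool root_has_young_spte(struct toy_root *root)
{
	root->visits++;
	return root->has_young;
}

/*
 * Test-only aging: the result is a boolean, so stop at the first young
 * SPTE instead of visiting the remaining roots. Invalid roots are skipped.
 */
static bool test_age_roots(struct toy_root *roots, size_t nr_roots)
{
	assert(roots != NULL || nr_roots == 0);

	for (size_t i = 0; i < nr_roots; i++) {
		if (!roots[i].valid)
			continue;
		if (root_has_young_spte(&roots[i]))
			return true;
	}
	return false;
}
```

Because only the boolean matters, roots after the first hit are never walked; the `visits` counter exists purely so that this can be observed.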
2024-10-30KVM: x86/mmu: Process only valid TDP MMU roots when aging a gfn rangeSean Christopherson
Skip invalid TDP MMU roots when aging a gfn range. There is zero reason to process invalid roots, as they by definition hold stale information. E.g. if a root is invalid because it's from a previous memslot generation, in the unlikely event the root has a SPTE for the gfn, then odds are good that the gfn=>hva mapping is different, i.e. doesn't map to the hva that is being aged by the primary MMU. Link: https://lore.kernel.org/r/20241011021051.1557902-14-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-10-30KVM: x86/mmu: Use Accessed bit even when _hardware_ A/D bits are disabledSean Christopherson
Use the Accessed bit in SPTEs even when A/D bits are disabled in hardware, i.e. propagate accessed information to SPTE.Accessed even when KVM is doing manual tracking by making SPTEs not-present. In addition to eliminating a small amount of code in is_accessed_spte(), this also paves the way for preserving Accessed information when a SPTE is zapped in response to a mmu_notifier PROTECTION event, e.g. if a SPTE is zapped because NUMA balancing kicks in. Note, EPT is the only flavor of paging in which A/D bits are conditionally enabled, and the Accessed (and Dirty) bit is software-available when A/D bits are disabled. Note #2, there are currently no concrete plans to preserve Accessed information. Explorations on that front were the initial catalyst, but the cleanup is the motivation for the actual commit. Link: https://lore.kernel.org/r/20241011021051.1557902-13-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-10-30KVM: x86/mmu: Set shadow_dirty_mask for EPT even if A/D bits disabledSean Christopherson
Set shadow_dirty_mask to the architectural EPT Dirty bit value even if A/D bits are disabled at the module level, i.e. even if KVM will never enable A/D bits in hardware. Doing so provides consistent behavior for Accessed and Dirty bits, i.e. doesn't leave KVM in a state where it sets shadow_accessed_mask but not shadow_dirty_mask. Functionally, this should be one big nop, as consumption of shadow_dirty_mask is always guarded by a check that hardware A/D bits are enabled. Link: https://lore.kernel.org/r/20241011021051.1557902-12-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-10-30KVM: x86/mmu: Set shadow_accessed_mask for EPT even if A/D bits disabledSean Christopherson
Now that KVM doesn't use shadow_accessed_mask to detect if hardware A/D bits are enabled, set shadow_accessed_mask for EPT even when A/D bits are disabled in hardware. This will allow using shadow_accessed_mask for software purposes, e.g. to preserve accessed status in a non-present SPTE across NUMA balancing, if something like that is ever desirable. Link: https://lore.kernel.org/r/20241011021051.1557902-11-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-10-30KVM: x86/mmu: Add a dedicated flag to track if A/D bits are globally enabledSean Christopherson
Add a dedicated flag to track if KVM has enabled A/D bits at the module level, instead of inferring the state based on whether or not the MMU's shadow_accessed_mask is non-zero. This will allow defining and using shadow_accessed_mask even when A/D bits aren't used by hardware. Link: https://lore.kernel.org/r/20241011021051.1557902-10-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-10-30KVM: x86/mmu: WARN and flush if resolving a TDP MMU fault clears MMU-writableSean Christopherson
Do a remote TLB flush if installing a leaf SPTE overwrites an existing leaf SPTE (with the same target pfn, which is enforced by a BUG() in handle_changed_spte()) and clears the MMU-Writable bit. Since the TDP MMU passes ACC_ALL to make_spte(), i.e. always requests a Writable SPTE, the only scenario in which make_spte() should create a !MMU-Writable SPTE is if the gfn is write-tracked or if KVM is prefetching a SPTE. When write-protecting for write-tracking, KVM must hold mmu_lock for write, i.e. can't race with a vCPU faulting in the SPTE. And when prefetching a SPTE, the TDP MMU takes care to avoid clobbering a shadow-present SPTE, i.e. it should be impossible to replace a MMU-writable SPTE with a !MMU-writable SPTE when handling a TDP MMU fault. Cc: David Matlack <dmatlack@google.com> Cc: Yan Zhao <yan.y.zhao@intel.com> Link: https://lore.kernel.org/r/20241011021051.1557902-9-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-10-30KVM: x86/mmu: Fold mmu_spte_update_no_track() into mmu_spte_update()Sean Christopherson
Fold the guts of mmu_spte_update_no_track() into mmu_spte_update() now that the latter doesn't flush when clearing A/D bits, i.e. now that there is no need to explicitly avoid TLB flushes when aging SPTEs. Opportunistically WARN if mmu_spte_update() requests a TLB flush when aging SPTEs, as aging should never modify a SPTE in such a way that KVM thinks a TLB flush is needed. Link: https://lore.kernel.org/r/20241011021051.1557902-8-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-10-30KVM: x86/mmu: Drop ignored return value from kvm_tdp_mmu_clear_dirty_slot()Sean Christopherson
Drop the return value from kvm_tdp_mmu_clear_dirty_slot() as its sole caller ignores the result (KVM flushes after clearing dirty logs based on the logs themselves, not based on SPTEs). Cc: David Matlack <dmatlack@google.com> Link: https://lore.kernel.org/r/20241011021051.1557902-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-10-30KVM: x86/mmu: Don't flush TLBs when clearing Dirty bit in shadow MMUSean Christopherson
Don't force a TLB flush when an SPTE update in the shadow MMU happens to clear the Dirty bit, as KVM unconditionally flushes TLBs when enabling dirty logging, and when clearing dirty logs, KVM flushes based on its software structures, not the SPTEs. I.e. the flows that care about accurate Dirty bit information already ensure there are no stale TLB entries. Opportunistically drop is_dirty_spte() as mmu_spte_update() was the sole caller. Link: https://lore.kernel.org/r/20241011021051.1557902-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-10-30KVM: x86/mmu: Don't force flush if SPTE update clears Accessed bitSean Christopherson
Don't force a TLB flush if mmu_spte_update() clears the Accessed bit, as access tracking tolerates false negatives, as evidenced by the mmu_notifier hooks that explicitly test and age SPTEs without doing a TLB flush. In practice, this is very nearly a nop. spte_write_protect() and spte_clear_dirty() never clear the Accessed bit. make_spte() always sets the Accessed bit for !prefetch scenarios. FNAME(sync_spte) only sets SPTE if the protection bits are changing, i.e. if a flush will be needed regardless of the Accessed bits. And FNAME(pte_prefetch) sets SPTE if and only if the old SPTE is !PRESENT. That leaves kvm_arch_async_page_ready() as the one path that will generate a !ACCESSED SPTE *and* overwrite a PRESENT SPTE. And that's very arguably a bug, as clobbering a valid SPTE in that case is nonsensical. Tested-by: Alex Bennée <alex.bennee@linaro.org> Link: https://lore.kernel.org/r/20241011021051.1557902-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-10-30KVM: x86/mmu: Fold all of make_spte()'s writable handling into one if-elseSean Christopherson
Now that make_spte() no longer uses a funky goto to bail out for a special case of its unsync handling, combine all of the unsync vs. writable logic into a single if-else statement. No functional change intended. Link: https://lore.kernel.org/r/20241011021051.1557902-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-10-30KVM: x86/mmu: Always set SPTE's dirty bit if it's created as writableSean Christopherson
When creating a SPTE, always set the Dirty bit if the Writable bit is set, i.e. if KVM is creating a writable mapping. If two (or more) vCPUs are racing to install a writable SPTE on a !PRESENT fault, only the "winning" vCPU will create a SPTE with W=1 and D=1, all "losers" will generate a SPTE with W=1 && D=0. As a result, tdp_mmu_map_handle_target_level() will fail to detect that the losing faults are effectively spurious, and will overwrite the D=1 SPTE with a D=0 SPTE. For normal VMs, overwriting a present SPTE is a small performance blip; KVM blasts a remote TLB flush, but otherwise life goes on. For upcoming TDX VMs, overwriting a present SPTE is much more costly, and can even lead to the VM being terminated if KVM isn't careful, e.g. if KVM attempts TDH.MEM.PAGE.AUG because the TDX code doesn't detect that the new SPTE is actually the same as the old SPTE (which would be a bug in its own right). Suggested-by: Sagi Shahar <sagis@google.com> Cc: Yan Zhao <yan.y.zhao@intel.com> Link: https://lore.kernel.org/r/20241011021051.1557902-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
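The race described above can be modelled with a toy SPTE builder. This is a sketch under stated assumptions: the bit layout, `toy_make_spte()` and `toy_fault_is_spurious()` are hypothetical, and the `lost_race` parameter simply encodes the message's observation that losing vCPUs used to produce W=1 && D=0; it does not model why.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit layout, assumed for this sketch only. */
#define TOY_PRESENT  (UINT64_C(1) << 0)
#define TOY_WRITABLE (UINT64_C(1) << 1)
#define TOY_DIRTY    (UINT64_C(1) << 9)

/*
 * Build a toy SPTE. When dirty_follows_writable is set, D is set whenever
 * W is set. Otherwise a vCPU that lost the race produces W=1 with D=0.
 */
static uint64_t toy_make_spte(bool writable, bool dirty_follows_writable,
			      bool lost_race)
{
	uint64_t spte = TOY_PRESENT;

	if (writable) {
		spte |= TOY_WRITABLE;
		if (dirty_follows_writable || !lost_race)
			spte |= TOY_DIRTY;
	}
	return spte;
}

/* A fault is spurious if the SPTE to install equals the one already there. */
static bool toy_fault_is_spurious(uint64_t old_spte, uint64_t new_spte)
{
	assert(new_spte & TOY_PRESENT);
	return old_spte == new_spte;
}
```

Once D always follows W, the loser's SPTE is bit-for-bit identical to the winner's, so the equality check recognizes the fault as spurious and nothing is overwritten.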
2024-10-30KVM: x86/mmu: Flush remote TLBs iff MMU-writable flag is cleared from RO SPTESean Christopherson
Don't force a remote TLB flush if KVM happens to effectively "refresh" a read-only SPTE that is still MMU-Writable, as KVM allows MMU-Writable SPTEs to have Writable TLB entries, even if the SPTE is !Writable. Remote TLBs need to be flushed only when creating a read-only SPTE for write-tracking, i.e. when installing a !MMU-Writable SPTE. In practice, especially now that KVM doesn't overwrite existing SPTEs when prefetching, KVM will rarely "refresh" a read-only, MMU-Writable SPTE, i.e. this is unlikely to eliminate many, if any, TLB flushes. But, more precisely flushing makes it easier to understand exactly when KVM does and doesn't need to flush. Note, x86 architecturally requires relevant TLB entries to be invalidated on a page fault, i.e. there is no risk of putting a vCPU into an infinite loop of read-only page faults. Cc: Yan Zhao <yan.y.zhao@intel.com> Link: https://lore.kernel.org/r/20241011021051.1557902-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
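The flush rule stated above fits in a single predicate. The following is a hedged sketch, not KVM's code: the bit positions and the names `toy_is_mmu_writable()` and `toy_update_needs_remote_flush()` are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit layout, assumed for this sketch only. */
#define TOY_PRESENT      (UINT64_C(1) << 0)
#define TOY_WRITABLE     (UINT64_C(1) << 1)
#define TOY_MMU_WRITABLE (UINT64_C(1) << 58)

static bool toy_is_mmu_writable(uint64_t spte)
{
	return (spte & TOY_MMU_WRITABLE) != 0;
}

/*
 * Flush remote TLBs only if the update clears the MMU-Writable flag.
 * Refreshing a read-only SPTE that stays MMU-Writable needs no flush.
 */
static bool toy_update_needs_remote_flush(uint64_t old_spte, uint64_t new_spte)
{
	assert((old_spte & TOY_PRESENT) && (new_spte & TOY_PRESENT));
	return toy_is_mmu_writable(old_spte) && !toy_is_mmu_writable(new_spte);
}
```

Note that the hardware Writable bit does not appear in the predicate at all; only the software MMU-Writable flag decides whether stale writable TLB entries are a problem.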
2024-10-25Merge branch 'kvm-no-struct-page' into HEADPaolo Bonzini
TL;DR: Eliminate KVM's long-standing (and heinous) behavior of essentially guessing which pfns are refcounted pages (see kvm_pfn_to_refcounted_page()). Getting there requires "fixing" arch code that isn't obviously broken. Specifically, to get rid of kvm_pfn_to_refcounted_page(), KVM needs to stop marking pages/folios dirty/accessed based solely on the pfn that's stored in KVM's stage-2 page tables. Instead of tracking which SPTEs correspond to refcounted pages, simply remove all of the code that operates on "struct page" based on the pfn in stage-2 PTEs. This is the back ~40-50% of the series. For x86 in particular, which sets accessed/dirty status when that info would be "lost", e.g. when SPTEs are zapped or KVM clears the dirty flag in a SPTE, foregoing the updates provides very measurable performance improvements for related operations. E.g. when clearing dirty bits as part of dirty logging, and zapping SPTEs to reconstitute huge pages when disabling dirty logging. The front ~40% of the series is cleanups and prep work, and most of it is x86 focused (purely because x86 added the most special cases, *sigh*). E.g. several of the inputs to hva_to_pfn() (and its myriad wrappers), can be removed by cleaning up and deduplicating x86 code. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-25KVM: Don't grab reference on VM_MIXEDMAP pfns that have a "struct page"Sean Christopherson
Now that KVM no longer relies on an ugly heuristic to find its struct page references, i.e. now that KVM can't get false positives on VM_MIXEDMAP pfns, remove KVM's hack to elevate the refcount for pfns that happen to have a valid struct page. In addition to removing a long-standing wart in KVM, this allows KVM to map non-refcounted struct page memory into the guest, e.g. for exposing GPU TTM buffers to KVM guests. Tested-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20241010182427.1434605-86-seanjc@google.com>
2024-10-25KVM: Drop APIs that manipulate "struct page" via pfnsSean Christopherson
Remove all kvm_{release,set}_pfn_*() APIs now that all users are gone. No functional change intended. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Tested-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20241010182427.1434605-85-seanjc@google.com>
2024-10-25KVM: arm64: Don't mark "struct page" accessed when making SPTE youngSean Christopherson
Don't mark pages/folios as accessed in the primary MMU when making a SPTE young in KVM's secondary MMU, as doing so relies on kvm_pfn_to_refcounted_page(), and generally speaking is unnecessary and wasteful. KVM participates in page aging via mmu_notifiers, so there's no need to push "accessed" updates to the primary MMU. Dropping use of kvm_set_pfn_accessed() also paves the way for removing kvm_pfn_to_refcounted_page() and all its users. Tested-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20241010182427.1434605-84-seanjc@google.com>
2024-10-25KVM: x86/mmu: Don't mark "struct page" accessed when zapping SPTEsSean Christopherson
Don't mark pages/folios as accessed in the primary MMU when zapping SPTEs, as doing so relies on kvm_pfn_to_refcounted_page(), and generally speaking is unnecessary and wasteful. KVM participates in page aging via mmu_notifiers, so there's no need to push "accessed" updates to the primary MMU. And if KVM zaps a SPTE in response to an mmu_notifier, marking it accessed _after_ the primary MMU has decided to zap the page is likely to go unnoticed, i.e. odds are good that, if the page is being zapped for reclaim, the page will be swapped out regardless of whether or not KVM marks the page accessed. Dropping x86's use of kvm_set_pfn_accessed() also paves the way for removing kvm_pfn_to_refcounted_page() and all its users. Tested-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20241010182427.1434605-83-seanjc@google.com>
2024-10-25KVM: Make kvm_follow_pfn.refcounted_page a required fieldSean Christopherson
Now that the legacy gfn_to_pfn() APIs are gone, and all callers of hva_to_pfn() pass in a refcounted_page pointer, make it a required field to ensure all future usage in KVM plays nice. Tested-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20241010182427.1434605-82-seanjc@google.com>
2024-10-25KVM: s390: Use kvm_release_page_dirty() to unpin "struct page" memorySean Christopherson
Use kvm_release_page_dirty() when unpinning guest pages, as the pfn was retrieved via pin_guest_page(), i.e. is guaranteed to be backed by struct page memory. This will allow dropping kvm_release_pfn_dirty() and friends. Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20241010182427.1434605-81-seanjc@google.com>
2024-10-25KVM: Drop gfn_to_pfn() APIs now that all users are goneSean Christopherson
Drop gfn_to_pfn() and all its variants now that all users are gone. No functional change intended. Tested-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20241010182427.1434605-80-seanjc@google.com>
2024-10-25KVM: PPC: Explicitly require struct page memory for Ultravisor sharingSean Christopherson
Explicitly require "struct page" memory when sharing memory between guest and host via an Ultravisor. Given the number of pfn_to_page() calls in the code, it's safe to assume that KVM already requires that the pfn returned by gfn_to_pfn() is backed by struct page, i.e. this is likely a bug fix, not a reduction in KVM capabilities. Switching to gfn_to_page() will eventually allow removing gfn_to_pfn() and kvm_pfn_to_refcounted_page(). Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20241010182427.1434605-79-seanjc@google.com>
2024-10-25KVM: arm64: Use __gfn_to_page() when copying MTE tags to/from userspaceSean Christopherson
Use __gfn_to_page() instead when copying MTE tags between guest and userspace. This will eventually allow removing gfn_to_pfn_prot(), gfn_to_pfn(), kvm_pfn_to_refcounted_page(), and related APIs. Tested-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20241010182427.1434605-78-seanjc@google.com>
2024-10-25KVM: Add support for read-only usage of gfn_to_page()Sean Christopherson
Rework gfn_to_page() to support read-only accesses so that it can be used by arm64 to get MTE tags out of guest memory. Opportunistically rewrite the comment to be even more stern about using gfn_to_page(), as there are very few scenarios where requiring a struct page is actually the right thing to do (though there are such scenarios). Add a FIXME to call out that KVM probably should be pinning pages, not just getting pages. Tested-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20241010182427.1434605-77-seanjc@google.com>
2024-10-25KVM: Convert gfn_to_page() to use kvm_follow_pfn()Sean Christopherson
Convert gfn_to_page() to the new kvm_follow_pfn() internal API, which will eventually allow removing gfn_to_pfn() and kvm_pfn_to_refcounted_page(). Tested-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20241010182427.1434605-76-seanjc@google.com>
2024-10-25KVM: PPC: Use kvm_vcpu_map() to map guest memory to patch dcbz instructionsSean Christopherson
Use kvm_vcpu_map() when patching dcbz in guest memory, as a regular GUP isn't technically sufficient when writing to data in the target pages. As per Documentation/core-api/pin_user_pages.rst: Correct (uses FOLL_PIN calls): pin_user_pages() write to the data within the pages unpin_user_pages() INCORRECT (uses FOLL_GET calls): get_user_pages() write to the data within the pages put_page() As a happy bonus, using kvm_vcpu_{,un}map() takes care of creating a mapping and marking the page dirty. Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20241010182427.1434605-75-seanjc@google.com>
2024-10-25KVM: PPC: Remove extra get_page() to fix page refcount leakSean Christopherson
Don't manually do get_page() when patching dcbz, as gfn_to_page() gifts the caller a reference. I.e. doing get_page() will leak the page due to not putting all references. Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20241010182427.1434605-74-seanjc@google.com>
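The leak is easy to see with a bare reference counter. This is a toy model only: `struct toy_page` and every `toy_*` function are hypothetical, and `toy_lookup_page()` merely mimics the stated behavior of a lookup that gifts the caller a reference.

```c
#include <assert.h>

/* Hypothetical page with a bare reference count; not the kernel's struct page. */
struct toy_page {
	int refcount;
};

static void toy_get_page(struct toy_page *page)
{
	page->refcount++;
}

static void toy_put_page(struct toy_page *page)
{
	assert(page->refcount > 0);
	page->refcount--;
}

/* Lookup that hands the caller a reference, as described for gfn_to_page(). */
static struct toy_page *toy_lookup_page(struct toy_page *page)
{
	toy_get_page(page);
	return page;
}

/* Buggy pattern: an extra get with only one put leaks a reference. */
static void toy_patch_leaky(struct toy_page *backing)
{
	struct toy_page *page = toy_lookup_page(backing);

	toy_get_page(page);   /* redundant: the lookup already took a reference */
	/* ... patch the instruction ... */
	toy_put_page(page);
}

/* Fixed pattern: one reference from the lookup, one put. */
static void toy_patch_fixed(struct toy_page *backing)
{
	struct toy_page *page = toy_lookup_page(backing);

	/* ... patch the instruction ... */
	toy_put_page(page);
}
```

After the leaky variant the count stays one above its starting value, so the page could never be freed; the fixed variant returns the count to where it started.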
2024-10-25KVM: MIPS: Use kvm_faultin_pfn() to map pfns into the guestSean Christopherson
Convert MIPS to kvm_faultin_pfn()+kvm_release_faultin_page(), which are new APIs to consolidate arch code and provide consistent behavior across all KVM architectures. Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20241010182427.1434605-73-seanjc@google.com>
2024-10-25KVM: MIPS: Mark "struct page" pfns accessed prior to dropping mmu_lockSean Christopherson
Mark pages accessed before dropping mmu_lock when faulting in guest memory so that MIPS can convert to kvm_release_faultin_page() without tripping its lockdep assertion on mmu_lock being held. Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20241010182427.1434605-72-seanjc@google.com>
2024-10-25KVM: MIPS: Mark "struct page" pfns accessed only in "slow" page fault pathSean Christopherson
Mark pages accessed only in the slow page fault path in order to remove an unnecessary user of kvm_pfn_to_refcounted_page(). Marking pages accessed in the primary MMU during KVM page fault handling isn't harmful, but it's largely pointless and likely a waste of cycles since the primary MMU will call into KVM via mmu_notifiers when aging pages. I.e. KVM participates in a "pull" model, so there's no need to also "push" updates. Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20241010182427.1434605-71-seanjc@google.com>
2024-10-25KVM: MIPS: Mark "struct page" pfns dirty only in "slow" page fault pathSean Christopherson
Mark pages/folios dirty only in the slow page fault path, i.e. only when mmu_lock is held and the operation is mmu_notifier-protected, as marking a page/folio dirty after it has been written back can make some filesystems unhappy (backing KVM guests with such filesystem files is uncommon, and the race is minuscule, hence the lack of complaints). See the link below for details. Link: https://lore.kernel.org/all/cover.1683044162.git.lstoakes@gmail.com Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20241010182427.1434605-70-seanjc@google.com>
2024-10-25KVM: LoongArch: Use kvm_faultin_pfn() to map pfns into the guestSean Christopherson
Convert LoongArch to kvm_faultin_pfn()+kvm_release_faultin_page(), which are new APIs to consolidate arch code and provide consistent behavior across all KVM architectures. Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20241010182427.1434605-69-seanjc@google.com>
2024-10-25KVM: LoongArch: Mark "struct page" pfn accessed before dropping mmu_lockSean Christopherson
Mark pages accessed before dropping mmu_lock when faulting in guest memory so that LoongArch can convert to kvm_release_faultin_page() without tripping its lockdep assertion on mmu_lock being held. Reviewed-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20241010182427.1434605-68-seanjc@google.com>
2024-10-25KVM: LoongArch: Mark "struct page" pfns accessed only in "slow" page fault pathSean Christopherson
Mark pages accessed only in the slow path, before dropping mmu_lock when faulting in guest memory so that LoongArch can convert to kvm_release_faultin_page() without tripping its lockdep assertion on mmu_lock being held. Reviewed-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20241010182427.1434605-67-seanjc@google.com>
2024-10-25KVM: LoongArch: Mark "struct page" pfns dirty only in "slow" page fault pathSean Christopherson
Mark pages/folios dirty only in the slow page fault path, i.e. only when mmu_lock is held and the operation is mmu_notifier-protected, as marking a page/folio dirty after it has been written back can make some filesystems unhappy (backing KVM guests with such filesystem files is uncommon, and the race is minuscule, hence the lack of complaints). See the link below for details. Link: https://lore.kernel.org/all/cover.1683044162.git.lstoakes@gmail.com Reviewed-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20241010182427.1434605-66-seanjc@google.com>
2024-10-25KVM: PPC: Use kvm_faultin_pfn() to handle page faults on Book3s PRSean Christopherson
Convert Book3S PR to __kvm_faultin_pfn()+kvm_release_faultin_page(), which are new APIs to consolidate arch code and provide consistent behavior across all KVM architectures. Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20241010182427.1434605-65-seanjc@google.com>
2024-10-25KVM: PPC: Book3S: Mark "struct page" pfns dirty/accessed after installing PTESean Christopherson
Mark pages/folios dirty/accessed after installing a PTE, and more specifically after acquiring mmu_lock and checking for an mmu_notifier invalidation. Marking a page/folio dirty after it has been written back can make some filesystems unhappy (backing KVM guests with such filesystem files is uncommon, and the race is minuscule, hence the lack of complaints). See the link below for details. This will also allow converting Book3S to kvm_release_faultin_page(), which requires that mmu_lock be held (for the aforementioned reason). Link: https://lore.kernel.org/all/cover.1683044162.git.lstoakes@gmail.com Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20241010182427.1434605-64-seanjc@google.com>
2024-10-25KVM: PPC: Drop unused @kvm_ro param from kvmppc_book3s_instantiate_page()Sean Christopherson
Drop @kvm_ro from kvmppc_book3s_instantiate_page() as it is now only written, and never read. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20241010182427.1434605-63-seanjc@google.com>
2024-10-25KVM: PPC: Use __kvm_faultin_pfn() to handle page faults on Book3s RadixSean Christopherson
Replace Book3s Radix's homebrewed (read: copy+pasted) fault-in logic with __kvm_faultin_pfn(), which functionally does pretty much the exact same thing. Note, when the code was written, KVM indeed didn't do fast GUP without "!atomic && !async", but that has long since changed (KVM tries fast GUP for all writable mappings). Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20241010182427.1434605-62-seanjc@google.com>
2024-10-25KVM: PPC: Use __kvm_faultin_pfn() to handle page faults on Book3s HVSean Christopherson
Replace Book3s HV's homebrewed fault-in logic with __kvm_faultin_pfn(), which functionally does pretty much the exact same thing. Note, when the code was written, KVM indeed didn't do fast GUP without "!atomic && !async", but that has long since changed (KVM tries fast GUP for all writable mappings). Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20241010182427.1434605-61-seanjc@google.com>
2024-10-25KVM: RISC-V: Use kvm_faultin_pfn() when mapping pfns into the guestSean Christopherson
Convert RISC-V to __kvm_faultin_pfn()+kvm_release_faultin_page(), which are new APIs to consolidate arch code and provide consistent behavior across all KVM architectures. Opportunistically fix a s/priort/prior typo in the related comment. Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Acked-by: Anup Patel <anup@brainfault.org> Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20241010182427.1434605-60-seanjc@google.com>
2024-10-25KVM: RISC-V: Mark "struct page" pfns accessed before dropping mmu_lockSean Christopherson
Mark pages accessed before dropping mmu_lock when faulting in guest memory so that RISC-V can convert to kvm_release_faultin_page() without tripping its lockdep assertion on mmu_lock being held. Marking pages accessed outside of mmu_lock is ok (not great, but safe), but marking pages _dirty_ outside of mmu_lock can make filesystems unhappy (see the link below). Do both under mmu_lock to minimize the chances of doing the wrong thing in the future. Link: https://lore.kernel.org/all/cover.1683044162.git.lstoakes@gmail.com Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Acked-by: Anup Patel <anup@brainfault.org> Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20241010182427.1434605-59-seanjc@google.com>
2024-10-25KVM: RISC-V: Mark "struct page" pfns dirty iff a stage-2 PTE is installedSean Christopherson
Don't mark pages dirty if KVM bails from the page fault handler without installing a stage-2 mapping, i.e. if the page is guaranteed to not be written by the guest. In addition to being a (very) minor fix, this paves the way for converting RISC-V to use kvm_release_faultin_page(). Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Acked-by: Anup Patel <anup@brainfault.org> Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20241010182427.1434605-58-seanjc@google.com>
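A rough userspace sketch of the rule above: a page is marked dirty only when a mapping was actually installed. The model is an assumption throughout; `struct toy_page`, `toy_release_faultin_page()` and its (page, unused, dirty) parameters, and `toy_handle_fault()` are hypothetical and are not the signatures of the KVM helpers named in the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page state; not the kernel's struct page. */
struct toy_page {
	bool accessed;
	bool dirty;
	int  refcount;
};

/*
 * Toy release helper: drop the fault-in reference, marking the page
 * accessed only if it was used and dirty only if the caller says so.
 */
static void toy_release_faultin_page(struct toy_page *page, bool unused, bool dirty)
{
	if (!page)
		return;
	assert(page->refcount > 0);
	if (!unused)
		page->accessed = true;
	if (dirty)
		page->dirty = true;
	page->refcount--;
}

/* Toy fault handler: returns true if a mapping was installed. */
static bool toy_handle_fault(struct toy_page *page, bool writable, bool install_fails)
{
	bool installed = !install_fails;

	page->refcount++;   /* reference taken when faulting in the page */
	toy_release_faultin_page(page, !installed, installed && writable);
	return installed;
}
```

When the handler bails without installing a mapping, the guest cannot have written the page, so the release path leaves both flags untouched and only drops the reference.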
2024-10-25KVM: arm64: Use __kvm_faultin_pfn() to handle memory abortsSean Christopherson
Convert arm64 to use __kvm_faultin_pfn()+kvm_release_faultin_page(). Three down, six to go. Tested-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20241010182427.1434605-57-seanjc@google.com>
2024-10-25KVM: arm64: Mark "struct page" pfns accessed/dirty before dropping mmu_lockSean Christopherson
Mark pages/folios accessed+dirty prior to dropping mmu_lock, as marking a page/folio dirty after it has been written back can make some filesystems unhappy (backing KVM guests with such filesystem files is uncommon, and the race is minuscule, hence the lack of complaints). While scary sounding, practically speaking the worst case scenario is that KVM would trigger this WARN in filemap_unaccount_folio(): /* * At this point folio must be either written or cleaned by * truncate. Dirty folio here signals a bug and loss of * unwritten data - on ordinary filesystems. * * But it's harmless on in-memory filesystems like tmpfs; and can * occur when a driver which did get_user_pages() sets page dirty * before putting it, while the inode is being finally evicted. * * Below fixes dirty accounting after removing the folio entirely * but leaves the dirty flag set: it has no effect for truncated * folio and anyway will be cleared before returning folio to * buddy allocator. */ if (WARN_ON_ONCE(folio_test_dirty(folio) && mapping_can_writeback(mapping))) folio_account_cleaned(folio, inode_to_wb(mapping->host)); KVM won't actually write memory because the stage-2 mappings are protected by the mmu_notifier, i.e. there is no risk of loss of data, even if the VM were backed by memory that needs writeback. See the link below for additional details. This will also allow converting arm64 to kvm_release_faultin_page(), which requires that mmu_lock be held (for the aforementioned reason). Link: https://lore.kernel.org/all/cover.1683044162.git.lstoakes@gmail.com Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20241010182427.1434605-56-seanjc@google.com>
2024-10-25KVM: PPC: e500: Use __kvm_faultin_pfn() to handle page faultsSean Christopherson
Convert PPC e500 to use __kvm_faultin_pfn()+kvm_release_faultin_page(), and continue the inexorable march towards the demise of kvm_pfn_to_refcounted_page(). Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20241010182427.1434605-55-seanjc@google.com>
2024-10-25KVM: PPC: e500: Mark "struct page" pfn accessed before dropping mmu_lockSean Christopherson
Mark pages accessed before dropping mmu_lock when faulting in guest memory so that shadow_map() can convert to kvm_release_faultin_page() without tripping its lockdep assertion on mmu_lock being held. Marking pages accessed outside of mmu_lock is ok (not great, but safe), but marking pages _dirty_ outside of mmu_lock can make filesystems unhappy. Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20241010182427.1434605-54-seanjc@google.com>