diff options
Diffstat (limited to 'Documentation/virtual/kvm/locking.txt')
| -rw-r--r-- | Documentation/virtual/kvm/locking.txt | 153 |
1 files changed, 0 insertions, 153 deletions
diff --git a/Documentation/virtual/kvm/locking.txt b/Documentation/virtual/kvm/locking.txt deleted file mode 100644 index 41b7ac9884b5..000000000000 --- a/Documentation/virtual/kvm/locking.txt +++ /dev/null @@ -1,153 +0,0 @@ -KVM Lock Overview -================= - -1. Acquisition Orders ---------------------- - -(to be written) - -2: Exception ------------- - -Fast page fault: - -Fast page fault is the fast path which fixes the guest page fault out of -the mmu-lock on x86. Currently, the page fault can be fast only if the -shadow page table is present and it is caused by write-protect, that means -we just need change the W bit of the spte. - -What we use to avoid all the race is the SPTE_HOST_WRITEABLE bit and -SPTE_MMU_WRITEABLE bit on the spte: -- SPTE_HOST_WRITEABLE means the gfn is writable on host. -- SPTE_MMU_WRITEABLE means the gfn is writable on mmu. The bit is set when - the gfn is writable on guest mmu and it is not write-protected by shadow - page write-protection. - -On fast page fault path, we will use cmpxchg to atomically set the spte W -bit if spte.SPTE_HOST_WRITEABLE = 1 and spte.SPTE_WRITE_PROTECT = 1, this -is safe because whenever changing these bits can be detected by cmpxchg. - -But we need carefully check these cases: -1): The mapping from gfn to pfn -The mapping from gfn to pfn may be changed since we can only ensure the pfn -is not changed during cmpxchg. This is a ABA problem, for example, below case -will happen: - -At the beginning: -gpte = gfn1 -gfn1 is mapped to pfn1 on host -spte is the shadow page table entry corresponding with gpte and -spte = pfn1 - - VCPU 0 VCPU0 -on fast page fault path: - - old_spte = *spte; - pfn1 is swapped out: - spte = 0; - - pfn1 is re-alloced for gfn2. - - gpte is changed to point to - gfn2 by the guest: - spte = pfn1; - - if (cmpxchg(spte, old_spte, old_spte+W) - mark_page_dirty(vcpu->kvm, gfn1) - OOPS!!! - -We dirty-log for gfn1, that means gfn2 is lost in dirty-bitmap. - -For direct sp, we can easily avoid it since the spte of direct sp is fixed -to gfn. For indirect sp, before we do cmpxchg, we call gfn_to_pfn_atomic() -to pin gfn to pfn, because after gfn_to_pfn_atomic(): -- We have held the refcount of pfn that means the pfn can not be freed and - be reused for another gfn. -- The pfn is writable that means it can not be shared between different gfns - by KSM. - -Then, we can ensure the dirty bitmaps is correctly set for a gfn. - -Currently, to simplify the whole things, we disable fast page fault for -indirect shadow page. - -2): Dirty bit tracking -In the origin code, the spte can be fast updated (non-atomically) if the -spte is read-only and the Accessed bit has already been set since the -Accessed bit and Dirty bit can not be lost. - -But it is not true after fast page fault since the spte can be marked -writable between reading spte and updating spte. Like below case: - -At the beginning: -spte.W = 0 -spte.Accessed = 1 - - VCPU 0 VCPU0 -In mmu_spte_clear_track_bits(): - - old_spte = *spte; - - /* 'if' condition is satisfied. */ - if (old_spte.Accssed == 1 && - old_spte.W == 0) - spte = 0ull; - on fast page fault path: - spte.W = 1 - memory write on the spte: - spte.Dirty = 1 - - - else - old_spte = xchg(spte, 0ull) - - - if (old_spte.Accssed == 1) - kvm_set_pfn_accessed(spte.pfn); - if (old_spte.Dirty == 1) - kvm_set_pfn_dirty(spte.pfn); - OOPS!!! - -The Dirty bit is lost in this case. - -In order to avoid this kind of issue, we always treat the spte as "volatile" -if it can be updated out of mmu-lock, see spte_has_volatile_bits(), it means, -the spte is always atomicly updated in this case. - -3): flush tlbs due to spte updated -If the spte is updated from writable to readonly, we should flush all TLBs, -otherwise rmap_write_protect will find a read-only spte, even though the -writable spte might be cached on a CPU's TLB. - -As mentioned before, the spte can be updated to writable out of mmu-lock on -fast page fault path, in order to easily audit the path, we see if TLBs need -be flushed caused by this reason in mmu_spte_update() since this is a common -function to update spte (present -> present). - -Since the spte is "volatile" if it can be updated out of mmu-lock, we always -atomicly update the spte, the race caused by fast page fault can be avoided, -See the comments in spte_has_volatile_bits() and mmu_spte_update(). - -3. Reference ------------- - -Name: kvm_lock -Type: raw_spinlock -Arch: any -Protects: - vm_list - - hardware virtualization enable/disable -Comment: 'raw' because hardware enabling/disabling must be atomic /wrt - migration. - -Name: kvm_arch::tsc_write_lock -Type: raw_spinlock -Arch: x86 -Protects: - kvm_arch::{last_tsc_write,last_tsc_nsec,last_tsc_offset} - - tsc offset in vmcb -Comment: 'raw' because updating the tsc offsets must not be preempted. - -Name: kvm->mmu_lock -Type: spinlock_t -Arch: any -Protects: -shadow page/shadow tlb entry -Comment: it is a spinlock since it is used in mmu notifier. |
