path: root/arch/x86/kvm/paging_tmpl.h
Age  Commit message  Author
2010-08-02  KVM: MMU: add missing reserved bits check in speculative path  (Xiao Guangrong)
In the speculative path, we should check guest pte's reserved bits just as the real processor does.
Reported-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

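The kind of guard this implies on the speculative update/prefetch path might look like the sketch below; the helper name is hypothetical, and only is_present_gpte() and is_rsvd_bits_set() are assumed from mmu.c of this era:

    /* hypothetical sketch: refuse to speculatively map a gpte whose
     * reserved bits are set, mirroring what the hardware walker does */
    static bool FNAME(speculative_gpte_invalid)(struct kvm_vcpu *vcpu,
                                                pt_element_t gpte)
    {
            if (!is_present_gpte(gpte))
                    return true;
            if (is_rsvd_bits_set(vcpu, gpte, PT_PAGE_TABLE_LEVEL))
                    return true;    /* fault instead of mapping */
            return false;
    }
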
2010-08-02  KVM: MMU: Eliminate redundant temporaries in FNAME(fetch)  (Avi Kivity)
'level' and 'sptep' are aliases for 'iterator.level' and 'iterator.sptep'; no need for them.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

2010-08-02  KVM: MMU: Validate all gptes during fetch, not just those used for new pages  (Avi Kivity)
Currently, when we fetch an spte, we only verify that gptes match those that the walker saw if we build new shadow pages for them. However, this misses the following race:

    vcpu1               vcpu2

    walk
                        change gpte
                        walk
                        instantiate sp

    fetch existing sp

Fix by validating every gpte, regardless of whether it is used for building a new sp or not.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

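The shape of the fixed fetch path, as a hedged sketch (gpte_valid() anticipates the helper entry below; the loop details here are illustrative, not the patch itself):

    /* re-check every gpte the walker saw, even when the shadow page
     * at this level already exists and is merely being reused */
    for_each_shadow_entry(vcpu, addr, iterator) {
            if (iterator.level > hlevel) {
                    if (!FNAME(gpte_valid)(vcpu, gw, iterator.level))
                            goto out_gpte_changed;
                    /* ... find or create the shadow page ... */
            }
    }
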
2010-08-02  KVM: MMU: Simplify spte fetch() function  (Avi Kivity)
Partition the function into three sections:
 - fetching indirect shadow pages (host_level > guest_level)
 - fetching direct shadow pages (page_level < host_level <= guest_level)
 - the final spte (page_level == host_level)
instead of the current spaghetti.

A slight change from the original code is that we call validate_direct_spte() more often: previously we called it only for gw->level, now we also call it for lower levels. The change should have no effect.

[xiao: fix regression caused by validate_direct_spte() called too late]
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

2010-08-02  KVM: MMU: Add gpte_valid() helper  (Avi Kivity)
Move the code to check whether a gpte has changed since we fetched it into a helper.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

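A sketch of what such a helper might look like, assuming the walker saves each level's pte_gpa and gpte and that kvm_read_guest_atomic() is used to re-read it (the exact name and return sense may differ):

    static bool FNAME(gpte_valid)(struct kvm_vcpu *vcpu,
                                  struct guest_walker *gw, int level)
    {
            pt_element_t curr;

            /* re-read the gpte; treat a failed read as a change */
            if (kvm_read_guest_atomic(vcpu->kvm, gw->pte_gpa[level - 1],
                                      &curr, sizeof(curr)))
                    return false;
            return curr == gw->ptes[level - 1];
    }
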
2010-08-02  KVM: MMU: Add validate_direct_spte() helper  (Avi Kivity)
Add a helper to verify that a direct shadow page is valid wrt the required access permissions; drop the page if it is not valid.
Reviewed-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

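A hedged sketch of such a helper, using spte and shadow-page primitives assumed from mmu.c of this era:

    static void validate_direct_spte(struct kvm_vcpu *vcpu, u64 *sptep,
                                     unsigned direct_access)
    {
            if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep)) {
                    struct kvm_mmu_page *child;

                    /* keep the child only if its access still matches */
                    child = page_header(*sptep & PT64_BASE_ADDR_MASK);
                    if (child->role.access == direct_access)
                            return;

                    /* otherwise drop it and let it be rebuilt */
                    mmu_page_remove_parent_pte(child, sptep);
                    __set_spte(sptep, shadow_trap_nonpresent_pte);
                    kvm_flush_remote_tlbs(vcpu->kvm);
            }
    }
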
2010-08-02  KVM: MMU: Add drop_large_spte() helper  (Avi Kivity)
To clarify spte fetching code, move large spte handling into a helper.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

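Roughly, as a sketch (drop_spte() is the helper introduced further down this log):

    static void drop_large_spte(struct kvm_vcpu *vcpu, u64 *sptep)
    {
            if (is_large_pte(*sptep)) {
                    /* a large spte is in the way: remove it and flush */
                    drop_spte(vcpu->kvm, sptep, shadow_trap_nonpresent_pte);
                    kvm_flush_remote_tlbs(vcpu->kvm);
            }
    }
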
2010-08-02  KVM: MMU: Add link_shadow_page() helper  (Avi Kivity)
To simplify the process of fetching an spte, add a helper that links a shadow page to an spte.
Reviewed-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

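A plausible sketch of the helper (the mask names are assumptions):

    static void link_shadow_page(u64 *sptep, struct kvm_mmu_page *sp)
    {
            u64 spte;

            /* point the parent spte at the child page table with
             * maximal rights; leaf sptes enforce the real permissions */
            spte = __pa(sp->spt)
                    | PT_PRESENT_MASK | PT_ACCESSED_MASK
                    | PT_WRITABLE_MASK | PT_USER_MASK;
            __set_spte(sptep, spte);
    }
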
2010-08-02  KVM: MMU: Keep going on permission error  (Avi Kivity)
Real hardware disregards permission errors when computing page fault error code bit 0 (page present). Do the same.
Reviewed-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

2010-08-02  KVM: MMU: Only indicate a fetch fault in page fault error code if nx is enabled  (Avi Kivity)
Bit 4 of the page fault error code is set only if EFER.NX is set.
Reviewed-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

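Taken together with the previous entry ("Keep going on permission error"), the walker's error code computation then behaves roughly like this sketch (the flag variables and mask names are assumed from the walker of this era):

    walker->error_code = 0;
    if (present)                    /* set even on permission errors */
            walker->error_code |= PFERR_PRESENT_MASK;
    if (write_fault)
            walker->error_code |= PFERR_WRITE_MASK;
    if (user_fault)
            walker->error_code |= PFERR_USER_MASK;
    if (fetch_fault && is_nx(vcpu)) /* bit 4 only if EFER.NX is set */
            walker->error_code |= PFERR_FETCH_MASK;
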
2010-08-02  KVM: MMU: Introduce drop_spte()  (Avi Kivity)
When we call rmap_remove(), we (almost) always immediately follow it by an __set_spte() to a nonpresent pte. Since we need to perform the two operations atomically, to avoid losing the dirty and accessed bits, introduce a helper drop_spte() and convert all call sites. The operation is still nonatomic at this point.
Signed-off-by: Avi Kivity <avi@redhat.com>

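A minimal sketch of the helper as described (still two separate steps at this stage):

    static void drop_spte(struct kvm *kvm, u64 *sptep, u64 new_spte)
    {
            /* these two steps must eventually be made atomic so the
             * accessed/dirty bits are not lost; not yet done here */
            rmap_remove(kvm, sptep);
            __set_spte(sptep, new_spte);
    }
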
2010-08-01  KVM: MMU: cleanup FNAME(fetch)() functions  (Xiao Guangrong)
Clean up this function, since we already have the direct sp's access at hand.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

2010-08-01  KVM: MMU: fix direct sp's access corrupted  (Xiao Guangrong)
If the mapping is writable but the dirty flag is not set, we will find the read-only direct sp and set up the mapping there. Then, when a write #PF occurs, we will mark this mapping writable inside the read-only direct sp; from that point on, other genuinely read-only mappings can happily write through it without a #PF, which may break the guest's COW.

Fix by re-installing the mapping when the write #PF occurs.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

2010-08-01  KVM: MMU: fix conflict access permissions in direct sp  (Xiao Guangrong)
In non-direct mapping, we mark an sp as 'direct' when we map the guest's larger page, but its access is encoded from the upper page-structure entries only, not including the last mapping, and this causes access conflicts. For example, take this mapping:

          [W]
        / PDE1 -> |---|
    P[W]          |   | LPA
        \ PDE2 -> |---|
          [R]

P has two children, PDE1 and PDE2, and both map the same large page (LPA). P's access is WR, PDE1's access is WR, and PDE2's access is RO (considering only read-write permissions here).

When the guest accesses through PDE1, we create a direct sp for LPA; the sp's access is taken from P, i.e. W, and we mark the ptes in this sp as W. Then, when the guest accesses through PDE2, we find LPA's existing shadow page (the same one PDE1 uses) and mark its ptes RO. So if the guest accesses through PDE1 again, an incorrect #PF occurs.

Fix by encoding the last mapping's access into the direct shadow page.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

2010-08-01  KVM: Remove memory alias support  (Avi Kivity)
As advertised in feature-removal-schedule.txt. Equivalent support is provided by overlapping memory regions.
Signed-off-by: Avi Kivity <avi@redhat.com>

2010-08-01  KVM: MMU: don't mark pte notrap if it's just sync transient  (Xiao Guangrong)
If the sp is only being synced transiently, don't mark its ptes notrap.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

2010-08-01  KVM: MMU: cleanup for dirty page judgment  (Xiao Guangrong)
Use a wrapper function to clean up the page dirty check.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

2010-08-01  KVM: MMU: rename 'page' and 'shadow_page' to 'sp'  (Xiao Guangrong)
Rename 'page' and 'shadow_page' to 'sp' to better fit the context.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

2010-08-01  KVM: Fix unused but set warnings  (Andi Kleen)
No real bugs in this one.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

2010-08-01  KVM: MMU: calculate correct gfn for small host pages backing large guest pages  (Lai Jiangshan)
In Documentation/kvm/mmu.txt:

    gfn: Either the guest page table containing the translations
    shadowed by this page, or the base page frame for linear
    translations. See role.direct.

But in FNAME(fetch)(), sp->gfn is incorrect when one of the following situations occurs:

1) The guest uses 32-bit paging and a guest PDE maps a 4-MByte page (backed by 4k host pages): FNAME(fetch)() misses handling the quadrant, and if the guest uses PSE-36, "table_gfn = gpte_to_gfn(gw->ptes[level - delta]);" is incorrect.

2) The guest uses long mode paging and a guest PDPTE maps a 1-GByte page (backed by 4k or 2M host pages).

So we fix it to match the documentation and the code that requires sp->gfn to be correct when sp->role.direct=1. We use the goal mapping gfn (gw->gfn) to calculate the base page frame for linear translations; it is simple and easy to understand.
Reported-by: Marcelo Tosatti <mtosatti@redhat.com>
Reported-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

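The essence of the fix, sketched (the macro name is an assumption): derive a direct sp's base frame from the final translation's gfn, aligned down to the range the shadow page covers.

    /* base page frame for the linear range this direct sp maps */
    direct_gfn = gw->gfn & ~(KVM_PAGES_PER_HPAGE(level) - 1);
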
2010-08-01  KVM: MMU: Don't allocate gfns page for direct mmu pages  (Lai Jiangshan)
When sp->role.direct is set, sp->gfns does not contain any essential information: leaf sptes reachable from this sp cover a contiguous guest physical range (a linear range), so sp->gfns[i] (if it were set) would equal sp->gfn + i (at PT_PAGE_TABLE_LEVEL). Since we can calculate it when needed, we don't need sp->gfns when sp->role.direct=1, and we save one page per kvm_mmu_page.

Note: access to sp->gfns must be wrapped by kvm_mmu_page_get_gfn() or kvm_mmu_page_set_gfn(). It is only exposed in FNAME(sync_page).
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

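A sketch of the getter's likely shape under these assumptions (the shift arithmetic is illustrative):

    static gfn_t kvm_mmu_page_get_gfn(struct kvm_mmu_page *sp, int index)
    {
            if (!sp->role.direct)
                    return sp->gfns[index];

            /* direct sp: leaf sptes map a linear range, so the gfn
             * follows from the base gfn and the spte's index */
            return sp->gfn + (index << ((sp->role.level - 1) * PT64_LEVEL_BITS));
    }
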
2010-08-01  KVM: Update Red Hat copyrights  (Avi Kivity)
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01  KVM: MMU: only update unsync page in invlpg path  (Xiao Guangrong)
Only unsync pages need to be updated at invlpg time, since other shadow pages are write-protected.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

2010-08-01  KVM: MMU: unalias gfn before sp->gfns[] comparison in sync_page  (Xiao Guangrong)
sp->gfns[] contains unaliased gfns, but a gpte might point into an aliased region.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

2010-08-01  KVM: MMU: Fix debug output error in walk_addr()  (Gui Jianfeng)
Fix a debug output error in walk_addr.
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

2010-08-01  KVM: MMU: mark page table dirty when a pte is actually modified  (Gui Jianfeng)
Sometimes cmpxchg_gpte() doesn't modify the gpte; in that case, don't mark the page table page as dirty.
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

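In walk_addr terms, the fix amounts to something like this sketch (identifiers assumed from the walker of this era):

    if (FNAME(cmpxchg_gpte)(vcpu->kvm, table_gfn, index,
                            pte, pte | PT_ACCESSED_MASK))
            goto walk;      /* gpte changed under us: restart the walk */
    /* only now did we actually modify the guest page table */
    mark_page_dirty(vcpu->kvm, table_gfn);
    pte |= PT_ACCESSED_MASK;
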
2010-08-01  KVM: Avoid killing userspace through guest SRAO MCE on unmapped pages  (Huang Ying)
In common cases, a guest SRAO MCE causes the corresponding poisoned page to be unmapped and a SIGBUS to be sent to QEMU-KVM, which then relays the MCE to the guest OS. But it has been reported that if the poisoned page is accessed in the guest after unmapping and before the MCE is relayed to the guest OS, userspace will be killed.

The reason is as follows. Because the poisoned page has been unmapped, the guest access causes a guest exit and kvm_mmu_page_fault is called. kvm_mmu_page_fault cannot get the poisoned page for the fault address, so kernel and user space MMIO processing are tried in turn. In user MMIO processing, the poisoned page is accessed again, and userspace is killed by force_sig_info.

To fix the bug, kvm_mmu_page_fault sends a HWPOISON signal to QEMU-KVM and does not try kernel and user space MMIO processing for the poisoned page.

[xiao: fix warning introduced by avi]
Reported-by: Max Asbock <masbock@linux.vnet.ibm.com>
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

2010-07-23  KVM: MMU: fix conflict access permissions in direct sp  (Xiao Guangrong)
In non-direct mapping, we mark an sp as 'direct' when we map the guest's larger page, but its access is encoded from the upper page-structure entries only, not including the last mapping, and this causes access conflicts. For example, take this mapping:

          [W]
        / PDE1 -> |---|
    P[W]          |   | LPA
        \ PDE2 -> |---|
          [R]

P has two children, PDE1 and PDE2, and both map the same large page (LPA). P's access is WR, PDE1's access is WR, and PDE2's access is RO (considering only read-write permissions here).

When the guest accesses through PDE1, we create a direct sp for LPA; the sp's access is taken from P, i.e. W, and we mark the ptes in this sp as W. Then, when the guest accesses through PDE2, we find LPA's existing shadow page (the same one PDE1 uses) and mark its ptes RO. So if the guest accesses through PDE1 again, an incorrect #PF occurs.

Fix by encoding the last mapping's access into the direct shadow page.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

2010-05-19  KVM: MMU: cleanup invlpg code  (Xiao Guangrong)
Use is_last_spte() to clean up the invlpg code.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

2010-05-19  KVM: MMU: fix for calculating gpa in invlpg code  (Xiao Guangrong)
If the guest is 32-bit, we should use 'quadrant' to adjust the gpa offset.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

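The adjustment looks roughly like this sketch (the shift arithmetic is an assumption; for 32-bit guests a shadow page covers only part of the guest page table, and role.quadrant selects which part):

    shift = PAGE_SHIFT - (PT_LEVEL_BITS - PT64_LEVEL_BITS) * level;
    offset = sp->role.quadrant << shift;

    pte_gpa = (sp->gfn << PAGE_SHIFT) + offset;
    pte_gpa += (sptep - sp->spt) * sizeof(pt_element_t);
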
2010-05-17  KVM: MMU: Make use of is_large_pte() in walker  (Gui Jianfeng)
Make use of is_large_pte() instead of checking the PT_PAGE_SIZE_MASK bit directly.
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

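The helper in question is essentially:

    static int is_large_pte(u64 pte)
    {
            return pte & PT_PAGE_SIZE_MASK;
    }
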
2010-05-17  KVM: MMU: Move sync_page() first pte address calculation out of loop  (Gui Jianfeng)
Move the first pte address calculation out of the loop to save some cycles.
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

2010-05-17  KVM: MMU: remove unnecessary NX check in walk_addr  (Xiao Guangrong)
After the is_rsvd_bits_set() checks, EFER.NXE must be enabled if the NX bit is set.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

2010-05-17  KVM: MMU: Reinstate pte prefetch on invlpg  (Avi Kivity)
Commit fb341f57 removed the pte prefetch on guest invlpg, citing guest races. However, the SDM is adamant that prefetch is allowed:

    "The processor may create entries in paging-structure caches for
    translations required for prefetches and for accesses that are a
    result of speculative execution that would never actually occur
    in the executed code path."

And, in fact, there was a race in the prefetch code: we picked up the pte without the mmu lock held, so an older invlpg could install the pte over a newer invlpg.

Reinstate the prefetch logic, but this time note whether another invlpg has executed using a counter. If a race occurred, do not install the pte.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

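An illustrative sketch of the counter technique (the field name invlpg_counter and its atomic type are assumptions, not taken from the patch):

    /* invlpg handler: bump the generation counter */
    atomic_inc(&vcpu->kvm->arch.invlpg_counter);

    /* prefetch path: snapshot the counter, read the gpte without
     * mmu_lock, then install only if no invlpg ran in between */
    invlpg_counter = atomic_read(&vcpu->kvm->arch.invlpg_counter);
    /* ... read the gpte, then take mmu_lock ... */
    if (atomic_read(&vcpu->kvm->arch.invlpg_counter) != invlpg_counter)
            return;         /* lost the race: do not install the pte */
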
2010-05-17  KVM: MMU: Do not instantiate nontrapping spte on unsync page  (Avi Kivity)
The update_pte() path currently uses a nontrapping spte when a nonpresent (or nonaccessed) gpte is written. This is fine since at present it is only used on sync pages. However, on an unsync page this will cause an endless fault loop, as the guest is under no obligation to invlpg a gpte that transitions from nonpresent to present.

Needed for the next patch, which reinstates update_pte() on invlpg.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

2010-03-01  KVM: x86 emulator: fix memory access during x86 emulation  (Gleb Natapov)
Currently, when the x86 emulator needs to access memory, the page walk is done with the broadest permissions possible, so if the emulated instruction was executed by a userspace process it can still access kernel memory. Fix that by providing the correct memory access to the page walker during emulation.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>

2010-03-01  KVM: rename is_writeble_pte() to is_writable_pte()  (Takuya Yoshikawa)
There are two spellings of "writable" in arch/x86/kvm/mmu.c and paging_tmpl.h. This patch renames is_writeble_pte() to is_writable_pte() and makes grepping easy. The new name is consistent with the function's own definition:

    return pte & PT_WRITABLE_MASK;

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>

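After the rename, the helper quoted above reads:

    static int is_writable_pte(unsigned long pte)
    {
            return pte & PT_WRITABLE_MASK;
    }
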
2010-01-25  KVM: MMU: bail out pagewalk on kvm_read_guest error  (Marcelo Tosatti)
Exit the guest pagetable walk loop if reading a gpte failed. Otherwise it's possible to enter an endless loop processing the previous present pte.
Cc: stable@kernel.org
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

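The fix is essentially a one-liner of this shape (a sketch; the surrounding identifiers are assumed from the walker):

    if (kvm_read_guest(vcpu->kvm, pte_gpa, &pte, sizeof(pte)))
            goto error;     /* don't keep walking on a stale pte */
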
2009-12-27  KVM: MMU: remove prefault from invlpg handler  (Marcelo Tosatti)
The invlpg prefault optimization breaks Windows 2008 R2 occasionally. The visible effect is that the invlpg handler instantiates a pte which is, microseconds later, written with a different gfn by another vcpu.

The OS could have other mechanisms to prevent a present translation from being used, which the hypervisor is unaware of. While the documentation states that the cpu is at liberty to prefetch tlb entries, it looks like this is not heeded, so remove tlb prefetch from invlpg.
Cc: stable@kernel.org
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

2009-12-03  KVM: MMU: update invlpg handler comment  (Marcelo Tosatti)
Large page translations are always synchronized (either in level 3 or level 2), so it's not necessary to deal with them specially in the invlpg handler.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

2009-10-04  KVM: MMU: add SPTE_HOST_WRITEABLE flag to the shadow ptes  (Izik Eidus)
This flag notes that the host physical page the spte points to is write-protected, and therefore we can't make it writable unless we run get_user_pages(write = 1). (This is needed for change_pte support in KVM.)
Signed-off-by: Izik Eidus <ieidus@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

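Its definition is a single software-available spte bit, along the lines of:

    /* stored in a software-available bit of the spte */
    #define SPTE_HOST_WRITEABLE (1ULL << PT_FIRST_AVAIL_BITS_SHIFT)
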
2009-09-10  KVM: MMU: shadow support for 1gb pages  (Joerg Roedel)
This patch adds support for shadow paging to the 1gb page table code in KVM. With this code the guest can use 1gb pages even if the host does not support them.

[ Marcelo: fix shadow page collision on pmd level if a guest 1gb page is mapped with 4kb ptes on host level ]
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

2009-09-10  KVM: MMU: make page walker aware of mapping levels  (Joerg Roedel)
The page walker may be used with nested paging too when accessing mmio areas. Make it support the additional page level too.

[ Marcelo: fix reserved bit check for 1gb pte ]
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

2009-09-10  KVM: MMU: make direct mapping paths aware of mapping levels  (Joerg Roedel)
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

2009-09-10  KVM: MMU: rename is_largepage_backed to mapping_level  (Joerg Roedel)
With the new name and the corresponding backend changes, this function can now support multiple hugepage sizes.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

2009-09-10  KVM: MMU: Trace guest pagetable walker  (Avi Kivity)
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10  KVM: Prepare memslot data structures for multiple hugepage sizes  (Joerg Roedel)
[avi: fix build on non-x86]
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

2009-09-10  KVM: MMU: s/shadow_pte/spte/  (Avi Kivity)
We use shadow_pte and spte inconsistently; switch to the shorter spelling. Rename set_shadow_pte() to __set_spte() to avoid a conflict with the existing set_spte(), and to indicate its lowlevelness.
Signed-off-by: Avi Kivity <avi@redhat.com>

2009-09-10  KVM: MMU: Adjust pte accessors to explicitly indicate guest or shadow pte  (Avi Kivity)
Since the guest and host ptes can have wildly different formats, adjust the pte accessor names to indicate which type of pte they operate on. No functional changes.
Signed-off-by: Avi Kivity <avi@redhat.com>

2009-09-10  KVM: Cache pdptrs  (Avi Kivity)
Instead of reloading the pdptrs on every entry and exit (vmcs writes on vmx, guest memory access on svm), extract them on demand.
Signed-off-by: Avi Kivity <avi@redhat.com>