summaryrefslogtreecommitdiffstats
path: root/arch/x86/kvm/paging_tmpl.h
AgeCommit message (Collapse)Author
2012-07-11KVM: MMU: fix kvm_mmu_pagetable_walk tracepointXiao Guangrong
The P bit of page fault error code is missed in this tracepoint, fix it by passing the full error code Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-04-18KVM: MMU: use page table level macroDavidlohr Bueso
Its much cleaner to use PT_PAGE_TABLE_LEVEL than its numeric value. Signed-off-by: Davidlohr Bueso <dave@gnu.org> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-03-20x86: remove the second argument of k[un]map_atomic()Cong Wang
Acked-by: Avi Kivity <avi@redhat.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Cong Wang <amwang@redhat.com>
2011-12-27KVM: MMU: audit: replace mmu audit tracepoint with jump-labelXiao Guangrong
The tracepoint is only used to audit mmu code, it should not be exposed to user, let us replace it with jump-label. Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-12-27KVM: MMU: improve write flooding detectedXiao Guangrong
Detecting write-flooding does not work well, when we handle page written, if the last speculative spte is not accessed, we treat the page is write-flooding, however, we can speculative spte on many path, such as pte prefetch, page synced, that means the last speculative spte may be not point to the written page and the written page can be accessed via other sptes, so depends on the Accessed bit of the last speculative spte is not enough Instead of detected page accessed, we can detect whether the spte is accessed after it is written, if the spte is not accessed but it is written frequently, we treat is not a page table or it not used for a long time Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-12-27KVM: MMU: fast prefetch spte on invlpg pathXiao Guangrong
Fast prefetch spte for the unsync shadow page on invlpg path Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-12-27KVM: MMU: cleanup FNAME(invlpg)Xiao Guangrong
Directly Use mmu_page_zap_pte to zap spte in FNAME(invlpg), also remove the same code between FNAME(invlpg) and FNAME(sync_page) Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-09-25KVM: MMU: Fix SMEP failure during fetchYang, Wei Y
This patch fix kvm-unit-tests hanging and incorrect PT_ACCESSED_MASK bit set in the case of SMEP fault. The code updated 'eperm' after the variable was checked. Signed-off-by: Yang, Wei <wei.y.yang@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-09-25KVM: MMU: Do not unconditionally read PDPTE from guest memoryAvi Kivity
Architecturally, PDPTEs are cached in the PDPTRs when CR3 is reloaded. On SVM, it is not possible to implement this, but on VMX this is possible and was indeed implemented until nested SVM changed this to unconditionally read PDPTEs dynamically. This has noticable impact when running PAE guests. Fix by changing the MMU to read PDPTRs from the cache, falling back to reading from memory for the nested MMU. Signed-off-by: Avi Kivity <avi@redhat.com> Tested-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-24KVM: MMU: mmio page fault supportXiao Guangrong
The idea is from Avi: | We could cache the result of a miss in an spte by using a reserved bit, and | checking the page fault error code (or seeing if we get an ept violation or | ept misconfiguration), so if we get repeated mmio on a page, we don't need to | search the slot list/tree. | (https://lkml.org/lkml/2011/2/22/221) When the page fault is caused by mmio, we cache the info in the shadow page table, and also set the reserved bits in the shadow page table, so if the mmio is caused again, we can quickly identify it and emulate it directly Searching mmio gfn in memslots is heavy since we need to walk all memeslots, it can be reduced by this feature, and also avoid walking guest page table for soft mmu. [jan: fix operator precedence issue] Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-24KVM: MMU: abstract some functions to handle fault pfnXiao Guangrong
Introduce handle_abnormal_pfn to handle fault pfn on page fault path, introduce mmu_invalid_pfn to handle fault pfn on prefetch path It is the preparing work for mmio page fault support Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-24KVM: MMU: remove bypass_guest_pfXiao Guangrong
The idea is from Avi: | Maybe it's time to kill off bypass_guest_pf=1. It's not as effective as | it used to be, since unsync pages always use shadow_trap_nonpresent_pte, | and since we convert between the two nonpresent_ptes during sync and unsync. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-24KVM: MMU: rename 'pt_write' to 'emulate'Xiao Guangrong
If 'pt_write' is true, we need to emulate the fault. And in later patch, we need to emulate the fault even though it is not a pt_write event, so rename it to better fit the meaning Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-24KVM: MMU: cleanup for FNAME(fetch)Xiao Guangrong
gw->pte_access is the final access permission, since it is unified with gw->pt_access when we walked guest page table: FNAME(walk_addr_generic): pte_access = pt_access & FNAME(gpte_access)(vcpu, pte, true); Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-24KVM: MMU: optimize to handle dirty bitXiao Guangrong
If dirty bit is not set, we can make the pte access read-only to avoid handing dirty bit everywhere Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-24KVM: MMU: cache mmio info on page fault pathXiao Guangrong
If the page fault is caused by mmio, we can cache the mmio info, later, we do not need to walk guest page table and quickly know it is a mmio fault while we emulate the mmio instruction Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-12KVM: MMU: Introduce is_last_gpte() to clean up walk_addr_generic()Takuya Yoshikawa
Suggested by Ingo and Avi. Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12KVM: MMU: Rename the walk label in walk_addr_generic()Takuya Yoshikawa
The current name does not explain the meaning well. So give it a better name "retry_walk" to show that we are trying the walk again. This was suggested by Ingo Molnar. Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12KVM: MMU: Clean up the error handling of walk_addr_generic()Takuya Yoshikawa
Avoid two step jump to the error handling part. This eliminates the use of the variables present and rsvd_fault. We also use the const type qualifier to show that write/user/fetch_fault do not change in the function. Both of these were suggested by Ingo Molnar. Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12KVM: Add instruction fetch checking when walking guest page tableYang, Wei Y
This patch adds instruction fetch checking when walking guest page table, to implement SMEP when emulating instead of executing natively. Signed-off-by: Yang, Wei <wei.y.yang@intel.com> Signed-off-by: Shan, Haitao <haitao.shan@intel.com> Signed-off-by: Li, Xin <xin.li@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-06-19KVM: MMU: Fix build warnings in walk_addr_generic()Borislav Petkov
On 3.0-rc1 I get In file included from arch/x86/kvm/mmu.c:2856: arch/x86/kvm/paging_tmpl.h: In function ‘paging32_walk_addr_generic’: arch/x86/kvm/paging_tmpl.h:124: warning: ‘ptep_user’ may be used uninitialized in this function In file included from arch/x86/kvm/mmu.c:2852: arch/x86/kvm/paging_tmpl.h: In function ‘paging64_walk_addr_generic’: arch/x86/kvm/paging_tmpl.h:124: warning: ‘ptep_user’ may be used uninitialized in this function caused by 6e2ca7d1802bf8ed9908435e34daa116662e7790. According to Takuya Yoshikawa, ptep_user won't be used uninitialized so shut up gcc. Cc: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Link: http://lkml.kernel.org/r/20110530094604.GC21833@liondog.tnic Signed-off-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-05-22KVM: MMU: Use ptep_user for cmpxchg_gpte()Takuya Yoshikawa
The address of the gpte was already calculated and stored in ptep_user before entering cmpxchg_gpte(). This patch makes cmpxchg_gpte() to use that to make it clear that we are using the same address during walk_addr_generic(). Note that the unlikely annotations are used to show that the conditions are something unusual rather than for performance. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-05-22KVM: Validate userspace_addr of memslot when registeredTakuya Yoshikawa
This way, we can avoid checking the user space address many times when we read the guest memory. Although we can do the same for write if we check which slots are writable, we do not care write now: reading the guest memory happens more often than writing. [avi: change VERIFY_READ to VERIFY_WRITE] Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-05-22KVM: MMU: Clean up gpte reading with copy_from_user()Takuya Yoshikawa
When we optimized walk_addr_generic() by not using the generic guest memory reader, we replaced copy_from_user() with get_user(): commit e30d2a170506830d5eef5e9d7990c5aedf1b0a51 KVM: MMU: Optimize guest page table walk commit 15e2ac9a43d4d7d08088e404fddf2533a8e7d52e KVM: MMU: Fix 64-bit paging breakage on x86_32 But as Andi pointed out later, copy_from_user() does the same as get_user() as long as we give a constant size to it. So we use copy_from_user() to clean up the code. The only, noticeable, regression introduced by this is 64-bit gpte reading on x86_32 hosts needed for PAE guests. But this can be mitigated by implementing 8-byte get_user() for x86_32, if needed. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-05-22KVM: MMU: Fix 64-bit paging breakage on x86_32Takuya Yoshikawa
Fix regression introduced by commit e30d2a170506830d5eef5e9d7990c5aedf1b0a51 KVM: MMU: Optimize guest page table walk On x86_32, get_user() does not support 64-bit values and we fail to build KVM at the point of 64-bit paging. This patch fixes this by using get_user() twice for that condition. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Reported-by: Jan Kiszka <jan.kiszka@web.de> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-05-22KVM: MMU: Add unlikely() annotations to walk_addr_generic()Avi Kivity
walk_addr_generic() is a hot path and is also hard for the cpu to predict - some of the parameters (fetch_fault in particular) vary wildly from invocation to invocation. Add unlikely() annotations where appropriate; all walk failures are considered unlikely, as are cases where we have to mark the accessed or dirty bit, as they are slow paths both in kvm and on real processors. Signed-off-by: Avi Kivity <avi@redhat.com>
2011-05-22KVM: MMU: Optimize guest page table walkTakuya Yoshikawa
This patch optimizes the guest page table walk by using get_user() instead of copy_from_user(). With this patch applied, paging64_walk_addr_generic() has become about 0.5us to 1.0us faster on my Phenom II machine with NPT on. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-05-22KVM: MMU: Make cmpxchg_gpte aware of nesting tooRoedel, Joerg
This patch makes the cmpxchg_gpte() function aware of the difference between l1-gfns and l2-gfns when nested virtualization is in use. This fixes a potential data-corruption problem in the l1-guest and makes the code work correct (at least as correct as the hardware which is emulated in this code) again. Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-05-11KVM: MMU: remove mmu_seq verification on pte update pathXiao Guangrong
The mmu_seq verification can be removed since we get the pfn in the protection of mmu_lock. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-03-18Merge branch 'x86-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Flush TLB if PGD entry is changed in i386 PAE mode x86, dumpstack: Correct stack dump info when frame pointer is available x86: Clean up csum-copy_64.S a bit x86: Fix common misspellings x86: Fix misspelling and align params x86: Use PentiumPro-optimized partial_csum() on VIA C7
2011-03-18x86: Fix common misspellingsLucas De Marchi
They were generated by 'codespell' and then manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi> Cc: trivial@kernel.org LKML-Reference: <1300389856-1099-3-git-send-email-lucas.demarchi@profusion.mobi> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-03-17KVM: MMU: cleanup pte write pathXiao Guangrong
This patch does: - call vcpu->arch.mmu.update_pte directly - use gfn_to_pfn_atomic in update_pte path The suggestion is from Avi. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-03-17KVM: MMU: remove unused macrosXiao Guangrong
These macros are not used, so removed Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-03-17KVM: MMU: do not record gfn in kvm_mmu_pte_writeXiao Guangrong
No need to record the gfn to verifier the pte has the same mode as current vcpu, it's because we only speculatively update the pte only if the pte and vcpu have the same mode Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-13thp: kvm mmu transparent hugepage supportAndrea Arcangeli
This should work for both hugetlbfs and transparent hugepages. [akpm@linux-foundation.org: bring forward PageTransCompound() addition for bisectability] Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Avi Kivity <avi@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-12KVM: MMU: handle 'map_writable' in set_spte() functionXiao Guangrong
Move the operation of 'writable' to set_spte() to clean up code [avi: remove unneeded booleanification] Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12KVM: MMU: Fix incorrect direct page write protection due to ro host pageAvi Kivity
If KVM sees a read-only host page, it will map it as read-only to prevent breaking a COW. However, if the page was part of a large guest page, KVM incorrectly extends the write protection to the entire large page frame instead of limiting it to the normal host page. This results in the instantiation of a new shadow page with read-only access. If this happens for a MOVS instruction that moves memory between two normal pages, within a single large page frame, and mapped within the guest as a large page, and if, in addition, the source operand is not writeable in the host (perhaps due to KSM), then KVM will instantiate a read-only direct shadow page, instantiate an spte for the source operand, then instantiate a new read/write direct shadow page and instantiate an spte for the destination operand. Since these two sptes are in different shadow pages, MOVS will never see them at the same time and the guest will not make progress. Fix by mapping the direct shadow page read/write, and only marking the host page read-only. Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12KVM: MMU: retry #PF for softmmuXiao Guangrong
Retry #PF for softmmu only when the current vcpu has the same cr3 as the time when #PF occurs Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12KVM: MMU: rename 'no_apf' to 'prefault'Xiao Guangrong
It's the speculative path if 'no_apf = 1' and we will specially handle this speculative path in the later patch, so 'prefault' is better to fit the sense. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12KVM: Pull extra page fault information into struct x86_exceptionAvi Kivity
Currently page fault cr2 and nesting infomation are carried outside the fault data structure. Instead they are placed in the vcpu struct, which results in confusion as global variables are manipulated instead of passing parameters. Fix this issue by adding address and nested fields to struct x86_exception, so this struct can carry all information associated with a fault. Signed-off-by: Avi Kivity <avi@redhat.com> Tested-by: Joerg Roedel <joerg.roedel@amd.com> Tested-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12KVM: Push struct x86_exception into walk_addr()Avi Kivity
Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12KVM: Push struct x86_exception info the various gva_to_gpa variantsAvi Kivity
Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12KVM: MMU: delay flush all tlbs on sync_page pathXiao Guangrong
Quote from Avi: | I don't think we need to flush immediately; set a "tlb dirty" bit somewhere | that is cleareded when we flush the tlb. kvm_mmu_notifier_invalidate_page() | can consult the bit and force a flush if set. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12KVM: MMU: abstract invalid guest pte mappingXiao Guangrong
Introduce a common function to map invalid gpte Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12KVM: MMU: remove 'clear_unsync' parameterXiao Guangrong
Remove it since we can judge it by using sp->unsync Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12KVM: MMU: rename 'reset_host_protection' to 'host_writable'Lai Jiangshan
Rename it to fit its sense better Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12KVM: MMU: fix forgot flush tlbs on sync_page pathXiao Guangrong
We should flush all tlbs after drop spte on sync_page path since Quote from Avi: | sync_page | drop_spte | kvm_mmu_notifier_invalidate_page | kvm_unmap_rmapp | spte doesn't exist -> no flush | page is freed | guest can write into freed page? KVM-Stable-Tag. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12KVM: MMU: don't mark spte notrap if reserved bit setXiao Guangrong
If reserved bit is set, we need inject the #PF with PFEC.RSVD=1, but shadow_notrap_nonpresent_pte injects #PF with PFEC.RSVD=0 only Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12KVM: propagate fault r/w information to gup(), allow read-only memoryMarcelo Tosatti
As suggested by Andrea, pass r/w error code to gup(), upgrading read fault to writable if host pte allows it. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12KVM: Retry fault before vmentryGleb Natapov
When page is swapped in it is mapped into guest memory only after guest tries to access it again and generate another fault. To save this fault we can map it immediately since we know that guest is going to access the page. Do it only when tdp is enabled for now. Shadow paging case is more complicated. CR[034] and EFER registers should be switched before doing mapping and then switched back. Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>