summaryrefslogtreecommitdiff
path: root/arch/powerpc/kvm
AgeCommit message (Collapse)Author
2013-08-30Merge branch 'kvm-ppc-next' of git://github.com/agraf/linux-2.6 into queueGleb Natapov
* 'kvm-ppc-next' of git://github.com/agraf/linux-2.6: KVM: PPC: Book3S PR: Rework kvmppc_mmu_book3s_64_xlate() KVM: PPC: Book3S PR: Make instruction fetch fallback work for system calls KVM: PPC: Book3S PR: Don't corrupt guest state when kernel uses VMX KVM: PPC: Book3S: Fix compile error in XICS emulation KVM: PPC: Book3S PR: return appropriate error when allocation fails arch: powerpc: kvm: add signed type cast for comparation powerpc/kvm: Copy the pvr value after memset KVM: PPC: Book3S PR: Load up SPRG3 register with guest value on guest entry kvm/ppc/booke: Don't call kvm_guest_enter twice kvm/ppc: Call trace_hardirqs_on before entry KVM: PPC: Book3S HV: Allow negative offsets to real-mode hcall handlers KVM: PPC: Book3S HV: Correct tlbie usage powerpc/kvm: Use 256K chunk to track both RMA and hash page table allocation. powerpc/kvm: Contiguous memory allocator based RMA allocation powerpc/kvm: Contiguous memory allocator based hash page table allocation KVM: PPC: Book3S: Ignore DABR register mm/cma: Move dma contiguous changes into a seperate config
2013-08-29Merge remote-tracking branch 'origin/next' into kvm-ppc-nextAlexander Graf
Conflicts: mm/Kconfig CMA DMA split and ZSWAP introduction were conflicting, fix up manually.
2013-08-29KVM: PPC: Book3S PR: Rework kvmppc_mmu_book3s_64_xlate()Paul Mackerras
This reworks kvmppc_mmu_book3s_64_xlate() to make it check the large page bit in the hashed page table entries (HPTEs) it looks at, and to simplify and streamline the code. The checking of the first dword of each HPTE is now done with a single mask and compare operation, and all the code dealing with the matching HPTE, if we find one, is consolidated in one place in the main line of the function flow. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-08-28KVM: PPC: Book3S PR: Make instruction fetch fallback work for system callsPaul Mackerras
It turns out that if we exit the guest due to a hcall instruction (sc 1), and the loading of the instruction in the guest exit path fails for any reason, the call to kvmppc_ld() in kvmppc_get_last_inst() fetches the instruction after the hcall instruction rather than the hcall itself. This in turn means that the instruction doesn't get recognized as an hcall in kvmppc_handle_exit_pr() but gets passed to the guest kernel as a sc instruction. That usually results in the guest kernel getting a return code of 38 (ENOSYS) from an hcall, which often triggers a BUG_ON() or other failure. This fixes the problem by adding a new variant of kvmppc_get_last_inst() called kvmppc_get_last_sc(), which fetches the instruction if necessary from pc - 4 rather than pc. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-08-28KVM: PPC: Book3S PR: Don't corrupt guest state when kernel uses VMXPaul Mackerras
Currently the code assumes that once we load up guest FP/VSX or VMX state into the CPU, it stays valid in the CPU registers until we explicitly flush it to the thread_struct. However, on POWER7, copy_page() and memcpy() can use VMX. These functions do flush the VMX state to the thread_struct before using VMX instructions, but if this happens while we have guest state in the VMX registers, and we then re-enter the guest, we don't reload the VMX state from the thread_struct, leading to guest corruption. This has been observed to cause guest processes to segfault. To fix this, we check before re-entering the guest that all of the bits corresponding to facilities owned by the guest, as expressed in vcpu->arch.guest_owned_ext, are set in current->thread.regs->msr. Any bits that have been cleared correspond to facilities that have been used by kernel code and thus flushed to the thread_struct, so for them we reload the state from the thread_struct. We also need to check current->thread.regs->msr before calling giveup_fpu() or giveup_altivec(), since if the relevant bit is clear, the state has already been flushed to the thread_struct and to flush it again would corrupt it. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-08-28KVM: PPC: Book3S: Fix compile error in XICS emulationPaul Mackerras
Commit 8e44ddc3f3 ("powerpc/kvm/book3s: Add support for H_IPOLL and H_XIRR_X in XICS emulation") added a call to get_tb() but didn't include the header that defines it, and on some configs this means book3s_xics.c fails to compile: arch/powerpc/kvm/book3s_xics.c: In function ‘kvmppc_xics_hcall’: arch/powerpc/kvm/book3s_xics.c:812:3: error: implicit declaration of function ‘get_tb’ [-Werror=implicit-function-declaration] Cc: stable@vger.kernel.org [v3.10, v3.11] Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-08-28KVM: PPC: Book3S PR: return appropriate error when allocation failsThadeu Lima de Souza Cascardo
err was overwritten by a previous function call, and checked to be 0. If the following page allocation fails, 0 is going to be returned instead of -ENOMEM. Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-08-28arch: powerpc: kvm: add signed type cast for comparationChen Gang
'rmls' is 'unsigned long', lpcr_rmls() will return negative number when failure occurs, so it need a type cast for comparing. 'lpid' is 'unsigned long', kvmppc_alloc_lpid() return negative number when failure occurs, so it need a type cast for comparing. Signed-off-by: Chen Gang <gang.chen@asianux.com> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-08-26ppc: kvm: use anon_inode_getfd() with O_CLOEXEC flagYann Droneaud
KVM uses anon_inode_get() to allocate file descriptors as part of some of its ioctls. But those ioctls are lacking a flag argument allowing userspace to choose options for the newly opened file descriptor. In such case it's advised to use O_CLOEXEC by default so that userspace is allowed to choose, without race, if the file descriptor is going to be inherited across exec(). This patch set O_CLOEXEC flag on all file descriptors created with anon_inode_getfd() to not leak file descriptors across exec(). Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Link: http://lkml.kernel.org/r/cover.1377372576.git.ydroneaud@opteya.com Reviewed-by: Alexander Graf <agraf@suse.de> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-08-22powerpc/kvm: Copy the pvr value after memsetAneesh Kumar K.V
Otherwise we would clear the pvr value Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-07-25KVM: PPC: Book3S PR: Load up SPRG3 register with guest value on guest entryPaul Mackerras
Unlike the other general-purpose SPRs, SPRG3 can be read by usermode code, and is used in recent kernels to store the CPU and NUMA node numbers so that they can be read by VDSO functions. Thus we need to load the guest's SPRG3 value into the real SPRG3 register when entering the guest, and restore the host's value when exiting the guest. We don't need to save the guest SPRG3 value when exiting the guest as usermode code can't modify SPRG3. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-07-18KVM: Introduce kvm_arch_memslots_updated()Takuya Yoshikawa
This is called right after the memslots is updated, i.e. when the result of update_memslots() gets installed in install_new_memslots(). Since the memslots needs to be updated twice when we delete or move a memslot, kvm_arch_commit_memory_region() does not correspond to this exactly. In the following patch, x86 will use this new API to check if the mmio generation has reached its maximum value, in which case mmio sptes need to be flushed out. Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Acked-by: Alexander Graf <agraf@suse.de> Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-07-11kvm/ppc/booke: Don't call kvm_guest_enter twiceScott Wood
kvm_guest_enter() was already called by kvmppc_prepare_to_enter(). Don't call it again. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-07-11kvm/ppc: Call trace_hardirqs_on before entryScott Wood
Currently this is only being done on 64-bit. Rather than just move it out of the 64-bit ifdef, move it to kvm_lazy_ee_enable() so that it is consistent with lazy ee state, and so that we don't track more host code as interrupts-enabled than necessary. Rename kvm_lazy_ee_enable() to kvm_fix_ee_before_entry() to reflect that this function now has a role on 32-bit as well. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-07-10KVM: PPC: Book3S HV: Allow negative offsets to real-mode hcall handlersPaul Mackerras
The table of offsets to real-mode hcall handlers in book3s_hv_rmhandlers.S can contain negative values, if some of the handlers end up before the table in the vmlinux binary. Thus we need to use a sign-extending load to read the values in the table rather than a zero-extending load. Without this, the host crashes when the guest does one of the hcalls with negative offsets, due to jumping to a bogus address. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-07-10KVM: PPC: Book3S HV: Correct tlbie usagePaul Mackerras
This corrects the usage of the tlbie (TLB invalidate entry) instruction in HV KVM. The tlbie instruction changed between PPC970 and POWER7. On the PPC970, the bit to select large vs. small page is in the instruction, not in the RB register value. This changes the code to use the correct form on PPC970. On POWER7 we were calculating the AVAL (Abbreviated Virtual Address, Lower) field of the RB value incorrectly for 64k pages. This fixes it. Since we now have several cases to handle for the tlbie instruction, this factors out the code to do a sequence of tlbies into a new function, do_tlbies(), and calls that from the various places where the code was doing tlbie instructions inline. It also makes kvmppc_h_bulk_remove() use the same global_invalidates() function for determining whether to do local or global TLB invalidations as is used in other places, for consistency, and also to make sure that kvm->arch.need_tlb_flush gets updated properly. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-07-08powerpc/kvm: Use 256K chunk to track both RMA and hash page table allocation.Aneesh Kumar K.V
Both RMA and hash page table request will be a multiple of 256K. We can use a chunk size of 256K to track the free/used 256K chunk in the bitmap. This should help to reduce the bitmap size. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-07-08powerpc/kvm: Contiguous memory allocator based RMA allocationAneesh Kumar K.V
Older version of power architecture use Real Mode Offset register and Real Mode Limit Selector for mapping guest Real Mode Area. The guest RMA should be physically contigous since we use the range when address translation is not enabled. This patch switch RMA allocation code to use contigous memory allocator. The patch also remove the the linear allocator which not used any more Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-07-08powerpc/kvm: Contiguous memory allocator based hash page table allocationAneesh Kumar K.V
Powerpc architecture uses a hash based page table mechanism for mapping virtual addresses to physical address. The architecture require this hash page table to be physically contiguous. With KVM on Powerpc currently we use early reservation mechanism for allocating guest hash page table. This implies that we need to reserve a big memory region to ensure we can create large number of guest simultaneously with KVM on Power. Another disadvantage is that the reserved memory is not available to rest of the subsystems and and that implies we limit the total available memory in the host. This patch series switch the guest hash page table allocation to use contiguous memory allocator. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-07-08KVM: PPC: Book3S: Ignore DABR registerAlexander Graf
We don't emulate breakpoints yet, so just ignore reads and writes to / from DABR. This fixes booting of more recent Linux guest kernels for me. Reported-by: Nello Martuscielli <ppc.addon@gmail.com> Tested-by: Nello Martuscielli <ppc.addon@gmail.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-07-04Merge branch 'next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc Pull powerpc updates from Ben Herrenschmidt: "This is the powerpc changes for the 3.11 merge window. In addition to the usual bug fixes and small updates, the main highlights are: - Support for transparent huge pages by Aneesh Kumar for 64-bit server processors. This allows the use of 16M pages as transparent huge pages on kernels compiled with a 64K base page size. - Base VFIO support for KVM on power by Alexey Kardashevskiy - Wiring up of our nvram to the pstore infrastructure, including putting compressed oopses in there by Aruna Balakrishnaiah - Move, rework and improve our "EEH" (basically PCI error handling and recovery) infrastructure. It is no longer specific to pseries but is now usable by the new "powernv" platform as well (no hypervisor) by Gavin Shan. - I fixed some bugs in our math-emu instruction decoding and made it usable to emulate some optional FP instructions on processors with hard FP that lack them (such as fsqrt on Freescale embedded processors). - Support for Power8 "Event Based Branch" facility by Michael Ellerman. This facility allows what is basically "userspace interrupts" for performance monitor events. - A bunch of Transactional Memory vs. Signals bug fixes and HW breakpoint/watchpoint fixes by Michael Neuling. And more ... I appologize in advance if I've failed to highlight something that somebody deemed worth it." * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (156 commits) pstore: Add hsize argument in write_buf call of pstore_ftrace_call powerpc/fsl: add MPIC timer wakeup support powerpc/mpic: create mpic subsystem object powerpc/mpic: add global timer support powerpc/mpic: add irq_set_wake support powerpc/85xx: enable coreint for all the 64bit boards powerpc/8xx: Erroneous double irq_eoi() on CPM IRQ in MPC8xx powerpc/fsl: Enable CONFIG_E1000E in mpc85xx_smp_defconfig powerpc/mpic: Add get_version API both for internal and external use powerpc: Handle both new style and old style reserve maps powerpc/hw_brk: Fix off by one error when validating DAWR region end powerpc/pseries: Support compression of oops text via pstore powerpc/pseries: Re-organise the oops compression code pstore: Pass header size in the pstore write callback powerpc/powernv: Fix iommu initialization again powerpc/pseries: Inform the hypervisor we are using EBB regs powerpc/perf: Add power8 EBB support powerpc/perf: Core EBB support for 64-bit book3s powerpc/perf: Drop MMCRA from thread_struct powerpc/perf: Don't enable if we have zero events ...
2013-07-03Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM fixes from Paolo Bonzini: "On the x86 side, there are some optimizations and documentation updates. The big ARM/KVM change for 3.11, support for AArch64, will come through Catalin Marinas's tree. s390 and PPC have misc cleanups and bugfixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (87 commits) KVM: PPC: Ignore PIR writes KVM: PPC: Book3S PR: Invalidate SLB entries properly KVM: PPC: Book3S PR: Allow guest to use 1TB segments KVM: PPC: Book3S PR: Don't keep scanning HPTEG after we find a match KVM: PPC: Book3S PR: Fix invalidation of SLB entry 0 on guest entry KVM: PPC: Book3S PR: Fix proto-VSID calculations KVM: PPC: Guard doorbell exception with CONFIG_PPC_DOORBELL KVM: Fix RTC interrupt coalescing tracking kvm: Add a tracepoint write_tsc_offset KVM: MMU: Inform users of mmio generation wraparound KVM: MMU: document fast invalidate all mmio sptes KVM: MMU: document fast invalidate all pages KVM: MMU: document fast page fault KVM: MMU: document mmio page fault KVM: MMU: document write_flooding_count KVM: MMU: document clear_spte_count KVM: MMU: drop kvm_mmu_zap_mmio_sptes KVM: MMU: init kvm generation close to mmio wrap-around value KVM: MMU: add tracepoint for check_mmio_spte KVM: MMU: fast invalidate all mmio sptes ...
2013-07-02Merge branch 'core-rcu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: "The major changes: - Simplify RCU's grace-period and callback processing based on the new numbering for callbacks. - Removal of TINY_PREEMPT_RCU in favor of TREE_PREEMPT_RCU for single-CPU low-latency systems. - SRCU-related changes and fixes. - Miscellaneous fixes, including converting a few remaining printk() calls to pr_*(). - Documentation updates" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits) rcu: Shrink TINY_RCU by reworking CPU-stall ifdefs rcu: Shrink TINY_RCU by moving exit_rcu() rcu: Remove TINY_PREEMPT_RCU tracing documentation rcu: Consolidate rcutiny_plugin.h ifdefs rcu: Remove rcu_preempt_note_context_switch() rcu: Remove the CONFIG_TINY_RCU ifdefs in rcutiny.h rcu: Remove check_cpu_stall_preempt() rcu: Simplify RCU_TINY RCU callback invocation rcu: Remove rcu_preempt_process_callbacks() rcu: Remove rcu_preempt_remove_callbacks() rcu: Remove rcu_preempt_check_callbacks() rcu: Remove show_tiny_preempt_stats() rcu: Remove TINY_PREEMPT_RCU powerpc,kvm: fix imbalance srcu_read_[un]lock() rcu: Remove srcu_read_lock_raw() and srcu_read_unlock_raw(). rcu: Apply Dave Jones's NOCB Kconfig help feedback rcu: Merge adjacent identical ifdefs rcu: Drive quiescent-state-forcing delay from HZ rcu: Remove "Experimental" flags kthread: Add kworker kthreads to OS-jitter documentation ...
2013-07-01Merge tag 'v3.10' into nextBenjamin Herrenschmidt
Merge 3.10 in order to get some of the last minute powerpc changes, resolve conflicts and add additional fixes on top of them.
2013-06-30KVM: PPC: Ignore PIR writesAlexander Graf
While technically it's legal to write to PIR and have the identifier changed, we don't implement logic to do so because we simply expose vcpu_id to the guest. So instead, let's ignore writes to PIR. This ensures that we don't inject faults into the guest for something the guest is allowed to do. While at it, we cross our fingers hoping that it also doesn't mind that we broke its PIR read values. Signed-off-by: Alexander Graf <agraf@suse.de>
2013-06-30KVM: PPC: Book3S PR: Invalidate SLB entries properlyPaul Mackerras
At present, if the guest creates a valid SLB (segment lookaside buffer) entry with the slbmte instruction, then invalidates it with the slbie instruction, then reads the entry with the slbmfee/slbmfev instructions, the result of the slbmfee will have the valid bit set, even though the entry is not actually considered valid by the host. This is confusing, if not worse. This fixes it by zeroing out the orige and origv fields of the SLB entry structure when the entry is invalidated. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-06-30KVM: PPC: Book3S PR: Allow guest to use 1TB segmentsPaul Mackerras
With this, the guest can use 1TB segments as well as 256MB segments. Since we now have the situation where a single emulated guest segment could correspond to multiple shadow segments (as the shadow segments are still 256MB segments), this adds a new kvmppc_mmu_flush_segment() to scan for all shadow segments that need to be removed. This restructures the guest HPT (hashed page table) lookup code to use the correct hashing and matching functions for HPTEs within a 1TB segment. We use the standard hpt_hash() function instead of open-coding the hash calculation, and we use HPTE_V_COMPARE() with an AVPN value that has the B (segment size) field included. The calculation of avpn is done a little earlier since it doesn't change in the loop starting at the do_second label. The computation in kvmppc_mmu_book3s_64_esid_to_vsid() changes so that it returns a 256MB VSID even if the guest SLB entry is a 1TB entry. This is because the users of this function are creating 256MB SLB entries. We set a new VSID_1T flag so that entries created from 1T segments don't collide with entries from 256MB segments. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-06-30KVM: PPC: Book3S PR: Don't keep scanning HPTEG after we find a matchPaul Mackerras
The loop in kvmppc_mmu_book3s_64_xlate() that looks up a translation in the guest hashed page table (HPT) keeps going if it finds an HPTE that matches but doesn't allow access. This is incorrect; it is different from what the hardware does, and there should never be more than one matching HPTE anyway. This fixes it to stop when any matching HPTE is found. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-06-30KVM: PPC: Book3S PR: Fix invalidation of SLB entry 0 on guest entryPaul Mackerras
On entering a PR KVM guest, we invalidate the whole SLB before loading up the guest entries. We do this using an slbia instruction, which invalidates all entries except entry 0, followed by an slbie to invalidate entry 0. However, the slbie turns out to be ineffective in some circumstances (specifically when the host linear mapping uses 64k pages) because of errors in computing the parameter to the slbie. The result is that the guest kernel hangs very early in boot because it takes a DSI the first time it tries to access kernel data using a linear mapping address in real mode. Currently we construct bits 36 - 43 (big-endian numbering) of the slbie parameter by taking bits 56 - 63 of the SLB VSID doubleword. These bits for the tlbie are C (class, 1 bit), B (segment size, 2 bits) and 5 reserved bits. For the SLB VSID doubleword these are C (class, 1 bit), reserved (1 bit), LP (large page size, 2 bits), and 4 reserved bits. Thus we are not setting the B field correctly, and when LP = 01 as it is for 64k pages, we are setting a reserved bit. Rather than add more instructions to calculate the slbie parameter correctly, this takes a simpler approach, which is to set entry 0 to zeroes explicitly. Normally slbmte should not be used to invalidate an entry, since it doesn't invalidate the ERATs, but it is OK to use it to invalidate an entry if it is immediately followed by slbia, which does invalidate the ERATs. (This has been confirmed with the Power architects.) This approach takes fewer instructions and will work whatever the contents of entry 0. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-06-30KVM: PPC: Book3S PR: Fix proto-VSID calculationsPaul Mackerras
This makes sure the calculation of the proto-VSIDs used by PR KVM is done with 64-bit arithmetic. Since vcpu3s->context_id[] is int, when we do vcpu3s->context_id[0] << ESID_BITS the shift will be done with 32-bit instructions, possibly leading to significant bits getting lost, as the context id can be up to 524283 and ESID_BITS is 18. To fix this we cast the context id to u64 before shifting. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-06-30KVM: PPC: Guard doorbell exception with CONFIG_PPC_DOORBELLTiejun Chen
Availablity of the doorbell_exception function is guarded by CONFIG_PPC_DOORBELL. Use the same define to guard our caller of it. Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com> [agraf: improve patch description] Signed-off-by: Alexander Graf <agraf@suse.de>
2013-06-21powerpc/kvm: Handle transparent hugepage in KVMAneesh Kumar K.V
We can find pte that are splitting while walking page tables. Return None pte in that case. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-21powerpc: Replace find_linux_pte with find_linux_pte_or_hugepteAneesh Kumar K.V
Replace find_linux_pte with find_linux_pte_or_hugepte and explicitly document why we don't need to handle transparent hugepages at callsites. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-21powerpc/mm: handle hugepage size correctly when invalidating hpte entriesAneesh Kumar K.V
If a hash bucket gets full, we "evict" a more/less random entry from it. When we do that we don't invalidate the TLB (hpte_remove) because we assume the old translation is still technically "valid". This implies that when we are invalidating or updating pte, even if HPTE entry is not valid we should do a tlb invalidate. With hugepages, we need to pass the correct actual page size value for tlb invalidation. This change update the patch 0608d692463598c1d6e826d9dd7283381b4f246c "powerpc/mm: Always invalidate tlb on hpte invalidate and update" to handle transparent hugepages correctly. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-19Merge branch 'rcu/next' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Pull RCU changes from Paul E. McKenney: "The major changes for this series are: 1. Simplify RCU's grace-period and callback processing based on the new numbering for callbacks. These were posted to LKML at https://lkml.org/lkml/2013/5/20/330. 2. Documentation updates. These were posted to LKML at https://lkml.org/lkml/2013/5/20/348. 3. Miscellaneous fixes, including converting a few remaining printk() calls to pr_*(). These were posted to LKML at https://lkml.org/lkml/2013/5/20/324. 4. SRCU-related changes and fixes. These were posted to LKML at https://lkml.org/lkml/2013/5/20/425. 5. Removal of TINY_PREEMPT_RCU in favor of TREE_PREEMPT_RCU for single-CPU low-latency systems. These were posted to LKML at https://lkml.org/lkml/2013/5/20/427." Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19kvm/ppc/booke: Delay kvmppc_lazy_ee_enableScott Wood
kwmppc_lazy_ee_enable() should be called as late as possible, or else we get things like WARN_ON(preemptible()) in enable_kernel_fp() in configurations where preemptible() works. Note that book3s_pr already waits until just before __kvmppc_vcpu_run to call kvmppc_lazy_ee_enable(). Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-06-11kvm/ppc/booke64: Fix lazy ee handling in kvmppc_handle_exit()Scott Wood
EE is hard-disabled on entry to kvmppc_handle_exit(), so call hard_irq_disable() so that PACA_IRQ_HARD_DIS is set, and soft_enabled is unset. Without this, we get warnings such as arch/powerpc/kernel/time.c:300, and sometimes host kernel hangs. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-06-11kvm/ppc/booke: Hold srcu lock when calling gfn functionsScott Wood
KVM core expects arch code to acquire the srcu lock when calling gfn_to_memslot and similar functions. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-06-11kvm/ppc/booke64: Disable e6500 supportScott Wood
The previous patch made 64-bit booke KVM build again, but Altivec support is still not complete, and we can't prevent the guest from turning on Altivec (which can corrupt host state until state save/restore is implemented). Disable e6500 on KVM until this is fixed. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-06-10powerpc,kvm: fix imbalance srcu_read_[un]lock()Lai Jiangshan
At the point of up_out label in kvmppc_hv_setup_htab_rma(), srcu read lock is still held. We have to release it before return. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Alexander Graf <agraf@suse.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: kvm@vger.kernel.org Cc: kvm-ppc@vger.kernel.org Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2013-06-01powerpc/kvm/book3s: Add support for H_IPOLL and H_XIRR_X in XICS emulationPaul Mackerras
This adds the remaining two hypercalls defined by PAPR for manipulating the XICS interrupt controller, H_IPOLL and H_XIRR_X. H_IPOLL returns information about the priority and pending interrupts for a virtual cpu, without changing any state. H_XIRR_X is like H_XIRR in that it reads and acknowledges the highest-priority pending interrupt, but it also returns the timestamp (timebase register value) from when the interrupt was first received by the hypervisor. Currently we just return the current time, since we don't do any software queueing of virtual interrupts inside the XICS emulation code. These hcalls are not currently used by Linux guests, but may be in future. Signed-off-by: Paul Mackerras <paulus@samba.org> Acked-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-19KVM: get rid of $(addprefix ../../../virt/kvm/, ...) in MakefilesMarc Zyngier
As requested by the KVM maintainers, remove the addprefix used to refer to the main KVM code from the arch code, and replace it with a KVM variable that does the same thing. Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Christoffer Dall <cdall@cs.columbia.edu> Acked-by: Xiantao Zhang <xiantao.zhang@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Alexander Graf <agraf@suse.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-05-05Merge tag 'kvm-3.10-1' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Gleb Natapov: "Highlights of the updates are: general: - new emulated device API - legacy device assignment is now optional - irqfd interface is more generic and can be shared between arches x86: - VMCS shadow support and other nested VMX improvements - APIC virtualization and Posted Interrupt hardware support - Optimize mmio spte zapping ppc: - BookE: in-kernel MPIC emulation with irqfd support - Book3S: in-kernel XICS emulation (incomplete) - Book3S: HV: migration fixes - BookE: more debug support preparation - BookE: e6500 support ARM: - reworking of Hyp idmaps s390: - ioeventfd for virtio-ccw And many other bug fixes, cleanups and improvements" * tag 'kvm-3.10-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (204 commits) kvm: Add compat_ioctl for device control API KVM: x86: Account for failing enable_irq_window for NMI window request KVM: PPC: Book3S: Add API for in-kernel XICS emulation kvm/ppc/mpic: fix missing unlock in set_base_addr() kvm/ppc: Hold srcu lock when calling kvm_io_bus_read/write kvm/ppc/mpic: remove users kvm/ppc/mpic: fix mmio region lists when multiple guests used kvm/ppc/mpic: remove default routes from documentation kvm: KVM_CAP_IOMMU only available with device assignment ARM: KVM: iterate over all CPUs for CPU compatibility check KVM: ARM: Fix spelling in error message ARM: KVM: define KVM_ARM_MAX_VCPUS unconditionally KVM: ARM: Fix API documentation for ONE_REG encoding ARM: KVM: promote vfp_host pointer to generic host cpu context ARM: KVM: add architecture specific hook for capabilities ARM: KVM: perform HYP initilization for hotplugged CPUs ARM: KVM: switch to a dual-step HYP init code ARM: KVM: rework HYP page table freeing ARM: KVM: enforce maximum size for identity mapped code ARM: KVM: move to a KVM provided HYP idmap ...
2013-05-02Merge branch 'next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc Pull powerpc update from Benjamin Herrenschmidt: "The main highlights this time around are: - A pile of addition POWER8 bits and nits, such as updated performance counter support (Michael Ellerman), new branch history buffer support (Anshuman Khandual), base support for the new PCI host bridge when not using the hypervisor (Gavin Shan) and other random related bits and fixes from various contributors. - Some rework of our page table format by Aneesh Kumar which fixes a thing or two and paves the way for THP support. THP itself will not make it this time around however. - More Freescale updates, including Altivec support on the new e6500 cores, new PCI controller support, and a pile of new boards support and updates. - The usual batch of trivial cleanups & fixes" * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (156 commits) powerpc: Fix build error for book3e powerpc: Context switch the new EBB SPRs powerpc: Turn on the EBB H/FSCR bits powerpc: Replace CPU_FTR_BCTAR with CPU_FTR_ARCH_207S powerpc: Setup BHRB instructions facility in HFSCR for POWER8 powerpc: Fix interrupt range check on debug exception powerpc: Update tlbie/tlbiel as per ISA doc powerpc: Print page size info during boot powerpc: print both base and actual page size on hash failure powerpc: Fix hpte_decode to use the correct decoding for page sizes powerpc: Decode the pte-lp-encoding bits correctly. powerpc: Use encode avpn where we need only avpn values powerpc: Reduce PTE table memory wastage powerpc: Move the pte free routines from common header powerpc: Reduce the PTE_INDEX_SIZE powerpc: Switch 16GB and 16MB explicit hugepages to a different page table format powerpc: New hugepage directory format powerpc: Don't truncate pgd_index wrongly powerpc: Don't hard code the size of pte page powerpc: Save DAR and DSISR in pt_regs on MCE ...
2013-05-02KVM: PPC: Book3S: Add API for in-kernel XICS emulationPaul Mackerras
This adds the API for userspace to instantiate an XICS device in a VM and connect VCPUs to it. The API consists of a new device type for the KVM_CREATE_DEVICE ioctl, a new capability KVM_CAP_IRQ_XICS, which functions similarly to KVM_CAP_IRQ_MPIC, and the KVM_IRQ_LINE ioctl, which is used to assert and deassert interrupt inputs of the XICS. The XICS device has one attribute group, KVM_DEV_XICS_GRP_SOURCES. Each attribute within this group corresponds to the state of one interrupt source. The attribute number is the same as the interrupt source number. This does not support irq routing or irqfd yet. Signed-off-by: Paul Mackerras <paulus@samba.org> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-05-02kvm/ppc/mpic: fix missing unlock in set_base_addr()Wei Yongjun
Add the missing unlock before return from function set_base_addr() when disables the mapping. Introduced by commit 5df554ad5b7522ea62b0ff9d5be35183494efc21 (kvm/ppc/mpic: in-kernel MPIC emulation) Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-05-02kvm/ppc: Hold srcu lock when calling kvm_io_bus_read/writeScott Wood
These functions do an srcu_dereference without acquiring the srcu lock themselves. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-05-02kvm/ppc/mpic: remove usersScott Wood
This is an unused (no pun intended) leftover from when this code did reference counting. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-05-02kvm/ppc/mpic: fix mmio region lists when multiple guests usedScott Wood
Keeping a linked list of statically defined objects doesn't work very well when we have multiple guests. :-P Switch to an array of constant objects. This fixes a hang when multiple guests are used. Signed-off-by: Scott Wood <scottwood@freescale.com> [agraf: remove struct list_head from mem_reg] Signed-off-by: Alexander Graf <agraf@suse.de>
2013-05-01Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull VFS updates from Al Viro, Misc cleanups all over the place, mainly wrt /proc interfaces (switch create_proc_entry to proc_create(), get rid of the deprecated create_proc_read_entry() in favor of using proc_create_data() and seq_file etc). 7kloc removed. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (204 commits) don't bother with deferred freeing of fdtables proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h proc: Make the PROC_I() and PDE() macros internal to procfs proc: Supply a function to remove a proc entry by PDE take cgroup_open() and cpuset_open() to fs/proc/base.c ppc: Clean up scanlog ppc: Clean up rtas_flash driver somewhat hostap: proc: Use remove_proc_subtree() drm: proc: Use remove_proc_subtree() drm: proc: Use minor->index to label things, not PDE->name drm: Constify drm_proc_list[] zoran: Don't print proc_dir_entry data in debug reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show() proc: Supply an accessor for getting the data from a PDE's parent airo: Use remove_proc_subtree() rtl8192u: Don't need to save device proc dir PDE rtl8187se: Use a dir under /proc/net/r8180/ proc: Add proc_mkdir_data() proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h} proc: Move PDE_NET() to fs/proc/proc_net.c ...