summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2017-02-02powerpc/64: Move HAVE_CONTEXT_TRACKING from pseries to common KconfigAnton Blanchard
We added support for HAVE_CONTEXT_TRACKING, but placed the option inside PPC_PSERIES. This has the undesirable effect that NO_HZ_FULL can be enabled on a kernel with both powernv and pseries support, but cannot on a kernel with powernv only support. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-02powerpc/sparse: Constify the address pointer in __get_user_nosleep()Daniel Axtens
In __get_user_nosleep, we create an intermediate pointer for the user address we're about to fetch. We currently don't tag this pointer as const. Make it const, as we are simply dereferencing it, and it's scope is limited to the __get_user_nosleep macro. Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-02powerpc/sparse: Constify the address pointer in __get_user_nocheck()Daniel Axtens
In __get_user_nocheck, we create an intermediate pointer for the user address we're about to fetch. We currently don't tag this pointer as const. Make it const, as we are simply dereferencing it, and it's scope is limited to the __get_user_nocheck macro. Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-02powerpc/sparse: Constify the address pointer in __get_user_check()Daniel Axtens
In __get_user_check, we create an intermediate pointer for the user address we're about to fetch. We currently don't tag this pointer as const. Make it const, as we are simply dereferencing it, and it's scope is limited to the __get_user_check macro. Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-02powerpc/powernv: Fix section mismatch from opal_lpc_init()Michael Ellerman
opal_lpc_init() is called from an __init routine, and calls other __init routines, so should also be __init, init? Fixes: 023b13a50183 ("powerpc/powernv: Add support for direct mapped LPC on POWER9") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-31KVM: PPC: Book3S HV: Enable radix guest supportPaul Mackerras
This adds a few last pieces of the support for radix guests: * Implement the backends for the KVM_PPC_CONFIGURE_V3_MMU and KVM_PPC_GET_RMMU_INFO ioctls for radix guests * On POWER9, allow secondary threads to be on/off-lined while guests are running. * Set up LPCR and the partition table entry for radix guests. * Don't allocate the rmap array in the kvm_memory_slot structure on radix. * Don't try to initialize the HPT for radix guests, since they don't have an HPT. * Take out the code that prevents the HV KVM module from initializing on radix hosts. At this stage, we only support radix guests if the host is running in radix mode, and only support HPT guests if the host is running in HPT mode. Thus a guest cannot switch from one mode to the other, which enables some simplifications. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-31KVM: PPC: Book3S HV: Invalidate ERAT on guest entry/exit for POWER9 DD1Paul Mackerras
On POWER9 DD1, we need to invalidate the ERAT (effective to real address translation cache) when changing the PIDR register, which we do as part of guest entry and exit. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-31KVM: PPC: Book3S HV: Allow guest exit path to have MMU onPaul Mackerras
If we allow LPCR[AIL] to be set for radix guests, then interrupts from the guest to the host can be delivered by the hardware with relocation on, and thus the code path starting at kvmppc_interrupt_hv can be executed in virtual mode (MMU on) for radix guests (previously it was only ever executed in real mode). Most of the code is indifferent to whether the MMU is on or off, but the calls to OPAL that use the real-mode OPAL entry code need to be switched to use the virtual-mode code instead. The affected calls are the calls to the OPAL XICS emulation functions in kvmppc_read_one_intr() and related functions. We test the MSR[IR] bit to detect whether we are in real or virtual mode, and call the opal_rm_* or opal_* function as appropriate. The other place that depends on the MMU being off is the optimization where the guest exit code jumps to the external interrupt vector or hypervisor doorbell interrupt vector, or returns to its caller (which is __kvmppc_vcore_entry). If the MMU is on and we are returning to the caller, then we don't need to use an rfid instruction since the MMU is already on; a simple blr suffices. If there is an external or hypervisor doorbell interrupt to handle, we branch to the relocation-on version of the interrupt vector. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-31KVM: PPC: Book3S HV: Invalidate TLB on radix guest vcpu movementPaul Mackerras
With radix, the guest can do TLB invalidations itself using the tlbie (global) and tlbiel (local) TLB invalidation instructions. Linux guests use local TLB invalidations for translations that have only ever been accessed on one vcpu. However, that doesn't mean that the translations have only been accessed on one physical cpu (pcpu) since vcpus can move around from one pcpu to another. Thus a tlbiel might leave behind stale TLB entries on a pcpu where the vcpu previously ran, and if that task then moves back to that previous pcpu, it could see those stale TLB entries and thus access memory incorrectly. The usual symptom of this is random segfaults in userspace programs in the guest. To cope with this, we detect when a vcpu is about to start executing on a thread in a core that is a different core from the last time it executed. If that is the case, then we mark the core as needing a TLB flush and then send an interrupt to any thread in the core that is currently running a vcpu from the same guest. This will get those vcpus out of the guest, and the first one to re-enter the guest will do the TLB flush. The reason for interrupting the vcpus executing on the old core is to cope with the following scenario: CPU 0 CPU 1 CPU 4 (core 0) (core 0) (core 1) VCPU 0 runs task X VCPU 1 runs core 0 TLB gets entries from task X VCPU 0 moves to CPU 4 VCPU 0 runs task X Unmap pages of task X tlbiel (still VCPU 1) task X moves to VCPU 1 task X runs task X sees stale TLB entries That is, as soon as the VCPU starts executing on the new core, it could unmap and tlbiel some page table entries, and then the task could migrate to one of the VCPUs running on the old core and potentially see stale TLB entries. Since the TLB is shared between all the threads in a core, we only use the bit of kvm->arch.need_tlb_flush corresponding to the first thread in the core. To ensure that we don't have a window where we can miss a flush, this moves the clearing of the bit from before the actual flush to after it. This way, two threads might both do the flush, but we prevent the situation where one thread can enter the guest before the flush is finished. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-31KVM: PPC: Book3S HV: Make HPT-specific hypercalls return error in radix modePaul Mackerras
If the guest is in radix mode, then it doesn't have a hashed page table (HPT), so all of the hypercalls that manipulate the HPT can't work and should return an error. This adds checks to make them return H_FUNCTION ("function not supported"). Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-31KVM: PPC: Book3S HV: Implement dirty page logging for radix guestsPaul Mackerras
This adds code to keep track of dirty pages when requested (that is, when memslot->dirty_bitmap is non-NULL) for radix guests. We use the dirty bits in the PTEs in the second-level (partition-scoped) page tables, together with a bitmap of pages that were dirty when their PTE was invalidated (e.g., when the page was paged out). This bitmap is stored in the first half of the memslot->dirty_bitmap area, and kvm_vm_ioctl_get_dirty_log_hv() now uses the second half for the bitmap that gets returned to userspace. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-31KVM: PPC: Book3S HV: MMU notifier callbacks for radix guestsPaul Mackerras
This adapts our implementations of the MMU notifier callbacks (unmap_hva, unmap_hva_range, age_hva, test_age_hva, set_spte_hva) to call radix functions when the guest is using radix. These implementations are much simpler than for HPT guests because we have only one PTE to deal with, so we don't need to traverse rmap chains. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-31KVM: PPC: Book3S HV: Page table construction and page faults for radix guestsPaul Mackerras
This adds the code to construct the second-level ("partition-scoped" in architecturese) page tables for guests using the radix MMU. Apart from the PGD level, which is allocated when the guest is created, the rest of the tree is all constructed in response to hypervisor page faults. As well as hypervisor page faults for missing pages, we also get faults for reference/change (RC) bits needing to be set, as well as various other error conditions. For now, we only set the R or C bit in the guest page table if the same bit is set in the host PTE for the backing page. This code can take advantage of the guest being backed with either transparent or ordinary 2MB huge pages, and insert 2MB page entries into the guest page tables. There is no support for 1GB huge pages yet. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-31KVM: PPC: Book3S HV: Modify guest entry/exit paths to handle radix guestsPaul Mackerras
This adds code to branch around the parts that radix guests don't need - clearing and loading the SLB with the guest SLB contents, saving the guest SLB contents on exit, and restoring the host SLB contents. Since the host is now using radix, we need to save and restore the host value for the PID register. On hypervisor data/instruction storage interrupts, we don't do the guest HPT lookup on radix, but just save the guest physical address for the fault (from the ASDR register) in the vcpu struct. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-31KVM: PPC: Book3S HV: Add basic infrastructure for radix guestsPaul Mackerras
This adds a field in struct kvm_arch and an inline helper to indicate whether a guest is a radix guest or not, plus a new file to contain the radix MMU code, which currently contains just a translate function which knows how to traverse the guest page tables to translate an address. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-31KVM: PPC: Book3S HV: Use ASDR for HPT guests on POWER9Paul Mackerras
POWER9 adds a register called ASDR (Access Segment Descriptor Register), which is set by hypervisor data/instruction storage interrupts to contain the segment descriptor for the address being accessed, assuming the guest is using HPT translation. (For radix guests, it contains the guest real address of the access.) Thus, for HPT guests on POWER9, we can use this register rather than looking up the SLB with the slbfee. instruction. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-31KVM: PPC: Book3S HV: Set process table for HPT guests on POWER9Paul Mackerras
This adds the implementation of the KVM_PPC_CONFIGURE_V3_MMU ioctl for HPT guests on POWER9. With this, we can return 1 for the KVM_CAP_PPC_MMU_HASH_V3 capability. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-31KVM: PPC: Book3S HV: Add userspace interfaces for POWER9 MMUPaul Mackerras
This adds two capabilities and two ioctls to allow userspace to find out about and configure the POWER9 MMU in a guest. The two capabilities tell userspace whether KVM can support a guest using the radix MMU, or using the hashed page table (HPT) MMU with a process table and segment tables. (Note that the MMUs in the POWER9 processor cores do not use the process and segment tables when in HPT mode, but the nest MMU does). The KVM_PPC_CONFIGURE_V3_MMU ioctl allows userspace to specify whether a guest will use the radix MMU or the HPT MMU, and to specify the size and location (in guest space) of the process table. The KVM_PPC_GET_RMMU_INFO ioctl gives userspace information about the radix MMU. It returns a list of supported radix tree geometries (base page size and number of bits indexed at each level of the radix tree) and the encoding used to specify the various page sizes for the TLB invalidate entry instruction. Initially, both capabilities return 0 and the ioctls return -EINVAL, until the necessary infrastructure for them to operate correctly is added. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-31powerpc/64: Allow for relocation-on interrupts from guest to hostPaul Mackerras
With host and guest both using radix translation, it is feasible for the host to take interrupts that come from the guest with relocation on, and that is in fact what the POWER9 hardware will do when LPCR[AIL] = 3. All such interrupts use HSRR0/1 not SRR0/1 except for system call with LEV=1 (hcall). Therefore this adds the KVM tests to the _HV variants of the relocation-on interrupt handlers, and adds the KVM test to the relocation-on system call entry point. We also instantiate the relocation-on versions of the hypervisor data storage and instruction interrupt handlers, since these can occur with relocation on in radix guests. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-31powerpc/64: Make type of partition table flush depend on partition typePaul Mackerras
When changing a partition table entry on POWER9, we do a particular form of the tlbie instruction which flushes all TLBs and caches of the partition table for a given logical partition ID (LPID). This instruction has a field in the instruction word, labelled R (radix), which should be 1 if the partition was previously a radix partition and 0 if it was a HPT partition. This implements that logic. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-31powerpc/64: Export pgtable_cache and pgtable_cache_add for KVMPaul Mackerras
This exports the pgtable_cache array and the pgtable_cache_add function so that HV KVM can use them for allocating radix page tables for guests. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-31powerpc/64: More definitions for POWER9Paul Mackerras
This adds definitions for bits in the DSISR register which are used by POWER9 for various translation-related exception conditions, and for some more bits in the partition table entry that will be needed by KVM. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-31powerpc/64: Enable use of radix MMU under hypervisor on POWER9Paul Mackerras
To use radix as a guest, we first need to tell the hypervisor via the ibm,client-architecture call first that we support POWER9 and architecture v3.00, and that we can do either radix or hash and that we would like to choose later using an hcall (the H_REGISTER_PROC_TBL hcall). Then we need to check whether the hypervisor agreed to us using radix. We need to do this very early on in the kernel boot process before any of the MMU initialization is done. If the hypervisor doesn't agree, we can't use radix and therefore clear the radix MMU feature bit. Later, when we have set up our process table, which points to the radix tree for each process, we need to install that using the H_REGISTER_PROC_TBL hcall. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-31powerpc/pseries: Fixes for the "ibm,architecture-vec-5" optionsPaul Mackerras
This fixes the byte index values for some of the option bits in the "ibm,architectur-vec-5" property. The "platform facilities options" bits are in byte 17 not byte 14, so the upper 8 bits of their definitions need to be 0x11 not 0x0E. The "sub processor support" option is in byte 21 not byte 15. Note none of these options are actually looked up in "ibm,architecture-vec-5" at this time, so there is no bug. When checking whether option bits are set, we should check that the offset of the byte being checked is less than the vector length that we got from the hypervisor. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-31powerpc/64: Don't try to use radix MMU under a hypervisorPaul Mackerras
Currently, if the kernel is running on a POWER9 processor under a hypervisor, it will try to use the radix MMU even though it doesn't have the necessary code to use radix under a hypervisor (it doesn't negotiate use of radix, and it doesn't do the H_REGISTER_PROC_TBL hcall). The result is that the guest kernel will crash when it tries to turn on the MMU. This fixes it by looking for the /chosen/ibm,architecture-vec-5 property, and if it exists, clears the radix MMU feature bit, before we decide whether to initialize for radix or HPT. This property is created by the hypervisor as a result of the guest calling the ibm,client-architecture-support method to indicate its capabilities, so it will indicate whether the hypervisor agreed to us using radix. Systems without a hypervisor may have this property also (for example, skiboot creates it), so we check the HV bit in the MSR to see whether we are running as a guest or not. If we are in hypervisor mode, then we can do whatever we like including using the radix MMU. The reason for using this property is that in future, when we have support for using radix under a hypervisor, we will need to check this property to see whether the hypervisor agreed to us using radix. Fixes: 2bfd65e45e87 ("powerpc/mm/radix: Add radix callbacks for early init routines") Cc: stable@vger.kernel.org # v4.7+ Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-31KVM: PPC: Book3S: 64-bit CONFIG_RELOCATABLE support for interruptsNicholas Piggin
64-bit Book3S exception handlers must find the dynamic kernel base to add to the target address when branching beyond __end_interrupts, in order to support kernel running at non-0 physical address. Support this in KVM by branching with CTR, similarly to regular interrupt handlers. The guest CTR saved in HSTATE_SCRATCH1 and restored after the branch. Without this, the host kernel hangs and crashes randomly when it is running at a non-0 address and a KVM guest is started. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-31powerpc/mm: unstub radix__vmemmap_remove_mapping()Reza Arbab
Use remove_pagetable() and friends for radix vmemmap removal. We do not require the special-case handling of vmemmap done in the x86 versions of these functions. This is because vmemmap_free() has already freed the mapped pages, and calls us with an aligned address range. So, add a few failsafe WARNs, but otherwise the code to remove physical mappings is already sufficient for vmemmap. Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com> Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-31powerpc/mm: add radix__remove_section_mapping()Reza Arbab
Tear down and free the four-level page tables of physical mappings during memory hotremove. Borrow the basic structure of remove_pagetable() and friends from the identically-named x86 functions. Reduce the frequency of tlb flushes and page_table_lock spinlocks by only doing them in the outermost function. There was some question as to whether the locking is needed at all. Leave it for now, but we could consider dropping it. Memory must be offline to be removed, thus not in use. So there shouldn't be the sort of concurrent page walking activity here that might prompt us to use RCU. Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-31powerpc/mm: add radix__create_section_mapping()Reza Arbab
Wire up memory hotplug page mapping for radix. Share the mapping function already used by radix_init_pgtable(). Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com> Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-31powerpc/mm: refactor radix physical page mappingReza Arbab
Move the page mapping code in radix_init_pgtable() into a separate function that will also be used for memory hotplug. The current goto loop progressively decreases its mapping size as it covers the tail of a range whose end is unaligned. Change this to a for loop which can do the same for both ends of the range. Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-31powerpc/powernv: Add support for direct mapped LPC on POWER9Benjamin Herrenschmidt
Use the new non-PCI ISA bridge support to expose the POWER9 LPC bus as direct mapped via the ISA IO port range. This enables direct access via drivers such as 8250 Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-31powerpc: Add support for non-PCI ISA bridgesBenjamin Herrenschmidt
The POWER9 chip supports an LPC bus that isn't hanging off a PCI bus, so let's add support for that, mapping it to the reserved space at ISA_IO_BASE Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-31powerpc: Move isa bridge definitions to separate includeBenjamin Herrenschmidt
We'll be adding non-PCI isa bridge support so let's not have all the definition in pci-bridge.h Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-31Documentation:powerpc: Add device-tree bindings for power-mgtGautham R. Shenoy
Document the device-tree bindings defining the the properties under the @power-mgt node in the device tree that describe the idle states for Linux running on baremetal POWER servers. These bindings are documented separately instead of using the the common idle state bindings since the idle-states on POWER servers are exposed as property arrays where as the common idle state bindings expect idle-states to be described as nodes. Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-31powernv: Pass PSSCR value and mask to power9_idle_stopGautham R. Shenoy
The power9_idle_stop method currently takes only the requested stop level as a parameter and picks up the rest of the PSSCR bits from a hand-coded macro. This is not a very flexible design, especially when the firmware has the capability to communicate the psscr value and the mask associated with a particular stop state via device tree. This patch modifies the power9_idle_stop API to take as parameters the PSSCR value and the PSSCR mask corresponding to the stop state that needs to be set. These PSSCR value and mask are respectively obtained by parsing the "ibm,cpu-idle-state-psscr" and "ibm,cpu-idle-state-psscr-mask" fields from the device tree. In addition to this, the patch adds support for handling stop states for which ESL and EC bits in the PSSCR are zero. As per the architecture, a wakeup from these stop states resumes execution from the subsequent instruction as opposed to waking up at the System Vector. The older firmware sets only the Requested Level (RL) field in the psscr and psscr-mask exposed in the device tree. For older firmware where psscr-mask=0xf, this patch will set the default sane values that the set for for remaining PSSCR fields (i.e PSLL, MTL, ESL, EC, and TR). For the new firmware, the patch will validate that the invariants required by the ISA for the psscr values are maintained by the firmware. This skiboot patch that exports fully populated PSSCR values and the mask for all the stop states can be found here: https://lists.ozlabs.org/pipermail/skiboot/2016-September/004869.html [Optimize the number of instructions before entering STOP with ESL=EC=0, validate the PSSCR values provided by the firimware maintains the invariants required as per the ISA suggested by Balbir Singh] Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-31cpuidle:powernv: Add helper function to populate powernv idle states.Gautham R. Shenoy
In the current code for powernv_add_idle_states, there is a lot of code duplication while initializing an idle state in powernv_states table. Add an inline helper function to populate the powernv_states[] table for a given idle state. Invoke this for populating the "Nap", "Fastsleep" and the stop states in powernv_add_idle_states. Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Acked-by: Balbir Singh <bsingharora@gmail.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-31powernv:stop: Rename pnv_arch300_idle_init to pnv_power9_idle_initGautham R. Shenoy
Balbir pointed out that the name of the function pnv_arch300_idle_init was inconsistent with the names of the variables and functions pertaining to POWER9 features in book3s_idle.S. This patch renames pnv_arch300_idle_init to pnv_power9_idle_init. This patch does not change any behaviour. Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-31powernv:idle: Add IDLE_STATE_ENTER_SEQ_NORET macroGautham R. Shenoy
Currently all the low-power idle states are expected to wake up at reset vector 0x100. Which is why the macro IDLE_STATE_ENTER_SEQ that puts the CPU to an idle state and never returns. On ISA v3.0, when the ESL and EC bits in the PSSCR are zero, the CPU is expected to wake up at the next instruction of the idle instruction. This patch adds a new macro named IDLE_STATE_ENTER_SEQ_NORET for the no-return variant and reuses the name IDLE_STATE_ENTER_SEQ for a variant that allows resuming operation at the instruction next to the idle-instruction. Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-30powerpc/powernv: Use OPAL call for TCE kill on NVLink2Alistair Popple
Add detection of NPU2 PHBs. NPU2/NVLink2 has a different register layout for the TCE kill register therefore TCE invalidation should be done via the OPAL call rather than using the register directly as it is for PHB3 and NVLink1. This changes TCE invalidation to use the OPAL call in the case of a NPU2 PHB model. Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-30powerpc/powernv: Initialise nest mmuAlistair Popple
POWER9 contains an off core mmu called the nest mmu (NMMU). This is used by other hardware units on the chip to translate virtual addresses into real addresses. The unit attempting an address translation provides the majority of the context required for the translation request except for the base address of the partition table (ie. the PTCR) which needs to be programmed into the NMMU. This patch adds a call to OPAL to set the PTCR for the nest mmu in opal_init(). Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-30powerpc/mm: Allow memory hotplug into an offline nodeReza Arbab
Relax the check preventing us from hotplugging into an offline node. This limitation was added in commit 482ec7c403d2 ("[PATCH] powerpc numa: Support sparse online node map") to prevent adding resources to an uninitialized node. These days, there is no harm in doing so. The addition will actually cause the node to be initialized and onlined; add_memory_resource() calls hotadd_new_pgdat() (if necessary) and node_set_online(). Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-30powerpc/mm: Simplify loop control in parse_numa_properties()Reza Arbab
The flow of the main loop in parse_numa_properties() is overly complicated. Simplify it to be less confusing and easier to read. No functional change. The end of the main loop in parse_numa_properties() looks like this: for_each_node_by_type(...) { ... if (!condition) { if (--ranges) goto new_range; else continue; } statement(); if (--ranges) goto new_range; /* else * continue; <- implicit, this is the end of the loop */ } The only effect of !condition is to skip execution of statement(). This can be rewritten in a simpler way: for_each_node_by_type(...) { ... if (condition) statement(); if (--ranges) goto new_range; } Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-30powerpc/fadump: Fix the race in crash_fadump().Mahesh Salgaonkar
There are chances that multiple CPUs can call crash_fadump() simultaneously and would start duplicating same info to vmcoreinfo ELF note section. This causes makedumpfile to fail during kdump capture. One example is, triggering dumprestart from HMC which sends system reset to all the CPUs at once. makedumpfile --dump-dmesg /proc/vmcore read_vmcoreinfo_basic_info: Invalid data in /tmp/vmcoreinfoyjgxlL: CRASHTIME=1475605971CRASHTIME=1475605971CRASHTIME=1475605971CRASHTIME=1475605971CRASHTIME=1475605971CRASHTIME=1475605971CRASHTIME=1475605971CRASHTIME=1475605971 makedumpfile Failed. Running makedumpfile --dump-dmesg /proc/vmcore failed (1). makedumpfile -d 31 -l /proc/vmcore read_vmcoreinfo_basic_info: Invalid data in /tmp/vmcoreinfo1mmVdO: CRASHTIME=1475605971CRASHTIME=1475605971CRASHTIME=1475605971CRASHTIME=1475605971CRASHTIME=1475605971CRASHTIME=1475605971CRASHTIME=1475605971CRASHTIME=1475605971 makedumpfile Failed. Running makedumpfile -d 31 -l /proc/vmcore failed (1). Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-30powerpc/mm/hash: Properly mask the ESID bits when building proto VSIDAneesh Kumar K.V
The proto VSID is built using both the MMU context id and effective segment ID (ESID). We should not have overlapping bits between those. That could result in us having a VSID collision. With the current code we missed masking the top bits of the ESID. This implies for kernel address we ended up using the top 4 bits of the ESID as part of the proto VSID, which is wrong. The current code use the top 4 context values (0x7fffc - 0x7ffff) for the kernel. With those context IDs used for the kernel, we don't run into VSID collisions because we get the same proto VSID irrespective of whether we mask the ESID bits or not. eg: ea = 0xf000000000000000 context = 0x7ffff w/out masking: proto_vsid = (0x7ffff << 6 | 0xf000000000000000 >> 40) = (0x1ffffc0 | 0xf00000) = 0x1ffffc0 with masking: proto_vsid = (0x7ffff << 6 | ((0xf000000000000000 >> 40) & 0x3f)) = (0x1ffffc0 | (0xf00000 & 0x3f)) = 0x1ffffc0 | 0) = 0x1ffffc0 So although there is no bug, the code is still overly subtle, so fix it to save ourselves pain in future. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-27KVM: PPC: Book3S: Move 64-bit KVM interrupt handler out from alt sectionNicholas Piggin
A subsequent patch to make KVM handlers relocation-safe makes them unusable from within alt section "else" cases (due to the way fixed addresses are taken from within fixed section head code). Stop open-coding the KVM handlers, and add them both as normal. A more optimal fix may be to allow some level of alternate feature patching in the exception macros themselves, but for now this will do. The TRAMP_KVM handlers must be moved to the "virt" fixed section area (name is arbitrary) in order to be closer to .text and avoid the dreaded "relocation truncated to fit" error. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-27KVM: PPC: Book3S: Change interrupt call to reduce scratch space use on HVNicholas Piggin
Change the calling convention to put the trap number together with CR in two halves of r12, which frees up HSTATE_SCRATCH2 in the HV handler. The 64-bit PR handler entry translates the calling convention back to match the previous call convention (i.e., shared with 32-bit), for simplicity. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-26powerpc/8xx: Perf events on PPC 8xxChristophe Leroy
This patch has been reworked since RFC version. In the RFC, this patch was preceded by a patch clearing MSR RI for all PPC32 at all time at exception prologs. Now MSR RI clearing is done only when this 8xx perf events functionality is compiled in, it is therefore limited to 8xx and merged inside this patch. Other main changes have been to take into account detailed review from Peter Zijlstra. The instructions counter has been reworked to behave as a free running counter like the three other counters. The 8xx has no PMU, however some events can be emulated by other means. This patch implements the following events (as reported by 'perf list'): cpu-cycles OR cycles [Hardware event] instructions [Hardware event] dTLB-load-misses [Hardware cache event] iTLB-load-misses [Hardware cache event] 'cycles' event is implemented using the timebase clock. Timebase clock corresponds to CPU clock divided by 16, so number of cycles is approximatly 16 times the number of TB ticks On the 8xx, TLB misses are handled by software. It is therefore easy to count all TLB misses each time the TLB miss exception is called. 'instructions' is calculated by using instruction watchpoint counter. This patch sets counter A to count instructions at address greater than 0, hence we count all instructions executed while MSR RI bit is set. The counter is set to the maximum which is 0xffff. Every 65535 instructions, debug instruction breakpoint exception fires. The exception handler increments a counter in memory which then represent the upper part of the instruction counter. We therefore end up with a 48 bits counter. In order to avoid unnecessary overhead while no perf event is active, this counter is started when the first event referring to this counter is added, and the counter is stopped when the last event referring to it is deleted. In order to properly support breakpoint exceptions, MSR RI bit has to be unset in exception epilogs in order to avoid breakpoint exceptions during critical sections during changes to SRR0 and SRR1 would be problematic. All counters are handled as free running counters. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
2017-01-26powerpc/32: Remove FIX_SRR1Christophe Leroy
FIX_SRR1() is defined as blank. Last useful instance of FIX_SRR1() was removed by commit 40ef8cbc6d360 ("powerpc: Get 64-bit configs to compile with ARCH=powerpc") in 2005. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
2017-01-25powerpc/8xx: Implement hw_breakpointChristophe Leroy
This patch implements HW breakpoint on the 8xx. The 8xx has capability to manage HW breakpoints, which is slightly different than BOOK3S: 1/ The breakpoint match doesn't trigger a DSI exception but a dedicated data breakpoint exception. 2/ The breakpoint happens after the instruction has completed, no need to single step or emulate the instruction, 3/ Matched address is not set in DAR but in BAR, 4/ DABR register doesn't exist, instead we have registers LCTRL1, LCTRL2 and CMPx registers, 5/ The match on one comparator is not on a double word but on a single word. The patch does: 1/ Prepare the dedicated registers in call to __set_dabr(). In order to emulate the double word handling of BOOK3S, comparator E is set to DABR address value and comparator F to address + 4. Then breakpoint 1 is set to match comparator E or F, 2/ Skip the singlestepping stage when compiled for CONFIG_PPC_8xx, 3/ Implement the exception. In that exception, the matched address is taken from SPRN_BAR and manage as if it was from SPRN_DAR. 4/ I/D TLB error exception routines perform a tlbie on bad TLBs. That tlbie triggers the breakpoint exception when performed on the breakpoint address. For this reason, the routine returns if the match is from one of those two tlbie. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
2017-01-25powerpc/32: Enable HW_BREAKPOINT on BOOK3SChristophe Leroy
BOOK3S also has DABR register and capability to handle data breakpoints, so this patch enable it on all BOOK3S, not only 64 bits. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>