summaryrefslogtreecommitdiff
path: root/arch/powerpc
AgeCommit message (Collapse)Author
2013-04-26kvm/ppc/mpic: Eliminate mmio_mappedScott Wood
We no longer need to keep track of this now that MPIC destruction always happens either during VM destruction (after MMIO has been destroyed) or during a failed creation (before the fd has been exposed to userspace, and thus before the MMIO region could have been registered). Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26kvm: destroy emulated devices on VM exitScott Wood
The hassle of getting refcounting right was greater than the hassle of keeping a list of devices to destroy on VM exit. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26KVM: PPC: MPIC: Restrict to e500 platformsAlexander Graf
The code as is doesn't make any sense on non-e500 platforms. Restrict it there, so that people don't get wrong ideas on what would actually work. This patch should get reverted as soon as it's possible to either run e500 guests on non-e500 hosts or the MPIC emulation gains support for non-e500 modes. Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26KVM: PPC: MPIC: Add support for KVM_IRQ_LINEAlexander Graf
Now that all pieces are in place for reusing generic irq infrastructure, we can copy x86's implementation of KVM_IRQ_LINE irq injection and simply reuse it for PPC, as it will work there just as well. Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26KVM: PPC: Support irq routing and irqfd for in-kernel MPICAlexander Graf
Now that all the irq routing and irqfd pieces are generic, we can expose real irqchip support to all of KVM's internal helpers. This allows us to use irqfd with the in-kernel MPIC. Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26kvm/ppc/mpic: add KVM_CAP_IRQ_MPICScott Wood
Enabling this capability connects the vcpu to the designated in-kernel MPIC. Using explicit connections between vcpus and irqchips allows for flexibility, but the main benefit at the moment is that it simplifies the code -- KVM doesn't need vm-global state to remember which MPIC object is associated with this vm, and it doesn't need to care about ordering between irqchip creation and vcpu creation. Signed-off-by: Scott Wood <scottwood@freescale.com> [agraf: add stub functions for kvmppc_mpic_{dis,}connect_vcpu] Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26kvm/ppc/mpic: in-kernel MPIC emulationScott Wood
Hook the MPIC code up to the KVM interfaces, add locking, etc. Signed-off-by: Scott Wood <scottwood@freescale.com> [agraf: add stub function for kvmppc_mpic_set_epr, non-booke, 64bit] Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26kvm/ppc/mpic: adapt to kernel style and environmentScott Wood
Remove braces that Linux style doesn't permit, remove space after '*' that Lindent added, keep error/debug strings contiguous, etc. Substitute type names, debug prints, etc. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26kvm/ppc/mpic: remove some obviously unneeded codeScott Wood
Remove some parts of the code that are obviously QEMU or Raven specific before fixing style issues, to reduce the style issues that need to be fixed. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26kvm/ppc/mpic: import hw/openpic.c from QEMUScott Wood
This is QEMU's hw/openpic.c from commit abd8d4a4d6dfea7ddea72f095f993e1de941614e ("Update version for 1.4.0-rc0"), run through Lindent with no other changes to ease merging future changes between Linux and QEMU. Remaining style issues (including those introduced by Lindent) will be fixed in a later patch. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26KVM: PPC: Book3S HV: Report VPA and DTL modifications in dirty mapPaul Mackerras
At present, the KVM_GET_DIRTY_LOG ioctl doesn't report modifications done by the host to the virtual processor areas (VPAs) and dispatch trace logs (DTLs) registered by the guest. This is because those modifications are done either in real mode or in the host kernel context, and in neither case does the access go through the guest's HPT, and thus no change (C) bit gets set in the guest's HPT. However, the changes done by the host do need to be tracked so that the modified pages get transferred when doing live migration. In order to track these modifications, this adds a dirty flag to the struct representing the VPA/DTL areas, and arranges to set the flag when the VPA/DTL gets modified by the host. Then, when we are collecting the dirty log, we also check the dirty flags for the VPA and DTL for each vcpu and set the relevant bit in the dirty log if necessary. Doing this also means we now need to keep track of the guest physical address of the VPA/DTL areas. So as not to lose track of modifications to a VPA/DTL area when it gets unregistered, or when a new area gets registered in its place, we need to transfer the dirty state to the rmap chain. This adds code to kvmppc_unpin_guest_page() to do that if the area was dirty. To simplify that code, we now require that all VPA, DTL and SLB shadow buffer areas fit within a single host page. Guests already comply with this requirement because pHyp requires that these areas not cross a 4k boundary. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26KVM: PPC: Book3S HV: Make HPT reading code notice R/C bit changesPaul Mackerras
At present, the code that determines whether a HPT entry has changed, and thus needs to be sent to userspace when it is copying the HPT, doesn't consider a hardware update to the reference and change bits (R and C) in the HPT entries to constitute a change that needs to be sent to userspace. This adds code to check for changes in R and C when we are scanning the HPT to find changed entries, and adds code to set the changed flag for the HPTE when we update the R and C bits in the guest view of the HPTE. Since we now need to set the HPTE changed flag in book3s_64_mmu_hv.c as well as book3s_hv_rm_mmu.c, we move the note_hpte_modification() function into kvm_book3s_64.h. Current Linux guest kernels don't use the hardware updates of R and C in the HPT, so this change won't affect them. Linux (or other) kernels might in future want to use the R and C bits and have them correctly transferred across when a guest is migrated, so it is better to correct this deficiency. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26KVM: PPC: e500: Add e6500 core to Kconfig descriptionMihai Caraman
Add e6500 core to Kconfig description. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26KVM: PPC: e500mc: Enable e6500 coresMihai Caraman
Extend processor compatibility names to e6500 cores. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> Reviewed-by: Alexander Graf <agraf@suse.de> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26KVM: PPC: e500: Remove E.PT and E.HV.LRAT categories from VCPUsMihai Caraman
Embedded.Page Table (E.PT) category is not supported yet in e6500 kernel. Configure TLBnCFG to remove E.PT and E.HV.LRAT categories from VCPUs. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26KVM: PPC: e500: Add support for EPTCFG registerMihai Caraman
EPTCFG register defined by E.PT is accessed unconditionally by Linux guests in the presence of MAV 2.0. Emulate it now. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26KVM: PPC: e500: Add support for TLBnPS registersMihai Caraman
Add support for TLBnPS registers available in MMU Architecture Version (MAV) 2.0. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26KVM: PPC: e500: Move vcpu's MMU configuration to dedicated functionsMihai Caraman
Vcpu's MMU default configuration and geometry update logic was buried in a chunk of code. Move them to dedicated functions to add more clarity. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26KVM: PPC: e500: Expose MMU registers via ONE_REGMihai Caraman
MMU registers were exposed to user-space using sregs interface. Add them to ONE_REG interface using kvmppc_get_one_reg/kvmppc_set_one_reg delegation mechanism. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26KVM: PPC: Book3E: Refactor ONE_REG ioctl implementationMihai Caraman
Refactor Book3E ONE_REG ioctl implementation to use kvmppc_get_one_reg/ kvmppc_set_one_reg delegation interface introduced by Book3S. This is necessary for MMU SPRs which are platform specifics. Get rid of useless case braces in the process. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26booke: exit to user space if emulator requestBharat Bhushan
This allows the exit to user space if emulator request by returning EMULATE_EXIT_USER. This will be used in subsequent patches in list Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26KVM: extend EMULATE_EXIT_USER to support different exit reasonsBharat Bhushan
Currently the instruction emulator code returns EMULATE_EXIT_USER and common code initializes the "run->exit_reason = .." and "vcpu->arch.hcall_needed = .." with one fixed reason. But there can be different reasons when emulator need to exit to user space. To support that the "run->exit_reason = .." and "vcpu->arch.hcall_needed = .." initialization is moved a level up to emulator. Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26Rename EMULATE_DO_PAPR to EMULATE_EXIT_USERBharat Bhushan
Instruction emulation return EMULATE_DO_PAPR when it requires exit to userspace on book3s. Similar return is required for booke. EMULATE_DO_PAPR reads out to be confusing so it is renamed to EMULATE_EXIT_USER. Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26KVM: PPC: debug stub interface parameter definedBharat Bhushan
This patch defines the interface parameter for KVM_SET_GUEST_DEBUG ioctl support. Follow up patches will use this for setting up hardware breakpoints, watchpoints and software breakpoints. Also kvm_arch_vcpu_ioctl_set_guest_debug() is brought one level below. This is because I am not sure what is required for book3s. So this ioctl behaviour will not change for book3s. Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26KVM: PPC: cache flush for kernel managed pagesBharat Bhushan
Kernel can only access pages which maps as memory. So flush only the valid kernel pages. Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26powerpc/perf: Enable branch stack sampling frameworkAnshuman Khandual
Provides basic enablement for perf branch stack sampling framework on POWER8 processor based platforms. Adds new BHRB related elements into cpu_hw_event structure to represent current BHRB config, BHRB filter configuration, manage context and to hold output BHRB buffer during PMU interrupt before passing to the user space. This also enables processing of BHRB data and converts them into generic perf branch stack data format. Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26powerpc/perf: Define BHRB generic functions, data and flags for POWER8Anshuman Khandual
This patch populates BHRB specific data for power_pmu structure. It also implements POWER8 specific BHRB filter and configuration functions. Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26powerpc/perf: Add new BHRB related generic functions, data and flagsAnshuman Khandual
This patch adds couple of generic functions to power_pmu structure which would configure the BHRB and it's filters. It also adds representation of the number of BHRB entries present on the PMU. A new PMU flag PPMU_BHRB would indicate presence of BHRB feature. Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26powerpc/perf: Add basic assembly code to read BHRB entries on POWER8Anshuman Khandual
This patch adds the basic assembly code to read BHRB buffer. BHRB entries are valid only after a PMU interrupt has happened (when MMCR0[PMAO]=1) and BHRB has been freezed. BHRB read should not be attempted when it is still enabled (MMCR0[PMAE]=1) and getting updated, as this can produce non-deterministic results. Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26powerpc/perf: Add new BHRB related instructions for POWER8Anshuman Khandual
This patch adds new POWER8 instruction encoding for reading and clearing Branch History Rolling Buffer entries. The new instruction 'mfbhrbe' (move from branch history rolling buffer entry) is used to read BHRB buffer entries and instruction 'clrbhrb' (clear branch history rolling buffer) is used to clear the entire buffer. The instruction 'clrbhrb' has straight forward encoding. But the instruction encoding format for reading the BHRB entries is like 'mfbhrbe RT, BHRBE' where it takes two arguments, i.e the index for the BHRB buffer entry to read and a general purpose register to put the value which was read from the buffer entry. Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26powerpc/perf: Power8 PMU supportMichael Ellerman
This patch adds support for the power8 PMU to perf. Work is ongoing to add generic cache events. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26powerpc/perf: Add support for SIERMichael Ellerman
On power8 we have a new SIER (Sampled Instruction Event Register), which captures information about instructions when we have random sampling enabled. Add support for loading the SIER into pt_regs, overloading regs->dar. Also set the new NO_SIPR flag in regs->result if we don't have SIPR. Update regs_sihv/sipr() to look for SIPR/SIHV in SIER. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26powerpc/perf: Add regs_no_sipr()Michael Ellerman
On power8 the presence or absence of SIPR depends on settings at runtime, so convert to using a dynamic flag for NO_SIPR. Existing backends that set NO_SIPR unconditionally set the dynamic flag obviously. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26powerpc/perf: Add an accessor for regs->resultMichael Ellerman
Add an accessor for regs->result so we can use it to store more flags in future. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26powerpc/perf: Convert mmcra_sipr/sihv() to regs_sipr/sihv()Michael Ellerman
On power8 the SIPR and SIHV are not in MMCRA, so convert the routines to take regs and change the names accordingly. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26powerpc/perf: Add an explict flag indicating presence of SLOT fieldMichael Ellerman
In perf_ip_adjust() we potentially use the MMCRA[SLOT] field to adjust the reported IP of a sampled instruction. Currently the logic is written so that if the backend does NOT have the PPMU_ALT_SIPR flag set then we assume MMCRA[SLOT] exists. However on power8 we do not want to set ALT_SIPR (it's in a third location), and we also do not have MMCRA[SLOT]. So add a new flag which only indicates whether MMCRA[SLOT] exists. Naively we'd set it on everything except power6/7, because they set ALT_SIPR, and we've reversed the polarity of the flag. But it's more complicated than that. mpc7450 is 32-bit, and uses its own version of perf_ip_adjust() which doesn't use MMCRA[SLOT], so it doesn't need the new flag set and the behaviour is unchanged. PPC970 (and I assume power4) don't have MMCRA[SLOT], so shouldn't have the new flag set. This is a behaviour change on those cpus, though we were probably getting lucky and the bits in question were 0. power5 and power5+ set the new flag, behaviour unchanged. power6 & power7 do not set the new flag, behaviour unchanged. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26powerpc: Initialise PMU related regs on Power8Michael Ellerman
For both HV and guest kernels, intialise PMU regs to something sane. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26powerpc/powernv: Fix invalid IOMMU tableGavin Shan
Ben found the root cause. Commit 37f02195bee9c25ce44e25204f40b7961a6d7c9d ("powerpc/pci: fix PCI-e devices rescan issue on powerpc platform") overwrites the IOMMU table of PCI device while enabling PCI device. The patch intends to fix the IOMMU table after that point. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26powerpc/powernv: Build DMA space for PE on PHB3Gavin Shan
The patch intends to build 32-bits DMA space for individual PEs on PHB3. The TVE# is recognized by the combo of PE# and fixed bits from DMA address, which is zero for 32-bits DMA space. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26powerpc/powernv: TCE invalidation for PHB3Gavin Shan
The TCE should be invalidated while it's created or free'd. The approach to do that for IODA1 and IODA2 compliant PHBs are different. So the patch differentiate them with different functions called to do that for IODA1 and IODA2 compliant PHBs. It's notable that the PCI address is used to invalidate the corresponding TCE on IODA2 compliant PHB3. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26powerpc/powernv: Patch MSI EOI handler on P8Gavin Shan
The EOI handler of MSI/MSI-X interrupts for P8 (PHB3) need additional steps to handle the P/Q bits in IVE before EOIing the corresponding interrupt. The patch changes the EOI handler to cover that. we have individual IRQ chip in each PHB instance. During the MSI IRQ setup time, the IRQ chip is copied over from the original one for that IRQ, and the EOI handler is patched with the one that will handle the P/Q bits (As Ben suggested). Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26powerpc/powernv: Add option CONFIG_POWERNV_MSIGavin Shan
As Michael Ellerman suggested, to add CONFIG_POWERNV_MSI for PowerNV platform. That's similar to CONFIG_PSERIES_MSI for pSeries platform. For now, we don't make it dependent on CONFIG_EEH since it's not ready to enable that yet. Apart from that, we also enable CONFIG_PPC_MSI_BITMAP on selecting CONFIG_POWERNV_MSI. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26powerpc/powernv: Supports PHB3Gavin Shan
The patch intends to initialize PHB3 during system boot stage. The flag "PNV_PHB_MODEL_PHB3" is introduced to differentiate IODA2 compatible PHB3 from other types of PHBs. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26powerpc: Fix "attempt to move .org backwards" errorPaul Mackerras
Building a 64-bit powerpc kernel with PR KVM enabled currently gives this error: AS arch/powerpc/kernel/head_64.o arch/powerpc/kernel/exceptions-64s.S: Assembler messages: arch/powerpc/kernel/exceptions-64s.S:258: Error: attempt to move .org backwards make[2]: *** [arch/powerpc/kernel/head_64.o] Error 1 This happens because the MASKABLE_EXCEPTION_PSERIES macro turns into 33 instructions, but we only have space for 32 at the decrementer interrupt vector (from 0x900 to 0x980). In the code generated by the MASKABLE_EXCEPTION_PSERIES macro, we currently have two instances of the HMT_MEDIUM macro, which has the effect of setting the SMT thread priority to medium. One is the first instruction, and is overwritten by a no-op on processors where we save the PPR (processor priority register), that is, POWER7 or later. The other is after we have saved the PPR. In order to reduce the code at 0x900 by one instruction, we omit the first HMT_MEDIUM. On processors without SMT this will have no effect since HMT_MEDIUM is a no-op there. On POWER5 and RS64 machines this will mean that the first few instructions take a little longer in the case where a decrementer interrupt occurs when the hardware thread is running at low SMT priority. On POWER6 and later machines, the hardware automatically boosts the thread priority when a decrementer interrupt is taken if the thread priority was below medium, so this change won't make any difference. The alternative would be to branch out of line after saving the CFAR. However, that would incur an extra overhead on all processors, whereas the approach adopted here only adds overhead on older threaded processors. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26powerpc/pseries: Add /proc interface to control topology updatesNathan Fontenot
There are instances in which we do not want topology updates to occur. In order to allow this a /proc interface (/proc/powerpc/topology_updates) is introduced so that topology updates can be enabled and disabled. This patch also adds a prrn_is_enabled() call so that PRRN events are handled in the kernel only if topology updating is enabled. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26powerpc/pseries: Enable PRRN handlingNathan Fontenot
The Linux kernel and platform firmware negotiate their mutual support of the PRRN option via the ibm,client-architecture-support interface. This patch simply sets the appropriate fields in the client architecture vector to indicate Linux support for PRRN and will allow the firmware to report PRRN events via the RTAS event-scan mechanism. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26powerpc/pseries: RE-enable Virtual Processor Home Node updatingJesse Larrew
The new PRRN firmware feature provides a more convenient and event-driven interface than VPHN for notifying Linux of changes to the NUMA affinity of platform resources. However, for practical reasons, it may not be feasible for some customers to update to the latest firmware. For these customers, the VPHN feature supported on previous firmware versions may still be the best option. The VPHN feature was previously disabled due to races with the load balancing code when accessing the NUMA cpu maps, but the new stop_machine() approach protects the NUMA cpu maps from these concurrent accesses. It should be safe to re-enable this feature now. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26powerpc/pseries: Update NUMA VDSO information when updating CPU mapsJesse Larrew
The following patch adds vdso_getcpu_init(), which stores the NUMA node for a cpu in SPRG3: Commit 18ad51dd34 ("powerpc: Add VDSO version of getcpu") adds vdso_getcpu_init(), which stores the NUMA node for a cpu in SPRG3. This patch ensures that this information is also updated when the NUMA affinity of a cpu changes. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26powerpc/pseries: Use stop machine to update cpu mapsNathan Fontenot
The new PRRN firmware feature allows CPU and memory resources to be transparently reassigned across NUMA boundaries. When this happens, the kernel must update the node maps to reflect the new affinity information. Although the NUMA maps can be protected by locking primitives during the update itself, this is insufficient to prevent concurrent accesses to these structures. Since cpumask_of_node() hands out a pointer to these structures, they can still be modified outside of the lock. Furthermore, tracking down each usage of these pointers and adding locks would be quite invasive and difficult to maintain. The approach used is to make a list of affected cpus and call stop_machine to have the update routine run on each of the affected cpus allowing them to update themselves. Each cpu finds itself in the list of cpus and makes the appropriate updates. We need to have each cpu do this for themselves to handle calls to vdso_getcpu_init() added in a subsequent patch. Situations like these are best handled using stop_machine(). Since the NUMA affinity updates are exceptionally rare events, this approach has the benefit of not adding any overhead while accessing the NUMA maps during normal operation. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26powerpc/pseries: Update CPU maps when device tree is updatedJesse Larrew
Platform events such as partition migration or the new PRRN firmware feature can cause the NUMA characteristics of a CPU to change, and these changes will be reflected in the device tree nodes for the affected CPUs. This patch registers a handler for Open Firmware device tree updates and reconfigures the CPU and node maps whenever the associativity changes. Currently, this is accomplished by marking the affected CPUs in the cpu_associativity_changes_mask and allowing arch_update_cpu_topology() to retrieve the new associativity information using hcall_vphn(). Protecting the NUMA cpu maps from concurrent access during an update operation will be addressed in a subsequent patch in this series. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>