summaryrefslogtreecommitdiff
path: root/arch/powerpc
AgeCommit message (Collapse)Author
2016-09-12KVM: PPC: e500: Less function calls in kvm_vcpu_ioctl_config_tlb() after ↵Markus Elfring
error detection The kfree() function was called in two cases by the kvm_vcpu_ioctl_config_tlb() function during error handling even if the passed data structure element contained a null pointer. * Split a condition check for memory allocation failures. * Adjust jump targets according to the Linux coding style convention. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-09-12KVM: PPC: e500: Use kmalloc_array() in kvm_vcpu_ioctl_config_tlb()Markus Elfring
* A multiplication for the size determination of a memory allocation indicated that an array data structure should be processed. Thus use the corresponding function "kmalloc_array". This issue was detected by using the Coccinelle software. * Replace the specification of a data type by a pointer dereference to make the corresponding size determination a bit safer according to the Linux coding style convention. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-09-12KVM: PPC: Book3S HV: Counters for passthrough IRQ statsSuresh Warrier
Add VCPU stat counters to track affinity for passthrough interrupts. pthru_all: Counts all passthrough interrupts whose IRQ mappings are in the kvmppc_passthru_irq_map structure. pthru_host: Counts all cached passthrough interrupts that were injected from the host through kvm_set_irq (i.e. not handled in real mode). pthru_bad_aff: Counts how many cached passthrough interrupts have bad affinity (receiving CPU is not running VCPU that is the target of the virtual interrupt in the guest). Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-09-12KVM: PPC: Book3S HV: Set server for passed-through interruptsPaul Mackerras
When a guest has a PCI pass-through device with an interrupt, it will direct the interrupt to a particular guest VCPU. In fact the physical interrupt might arrive on any CPU, and then get delivered to the target VCPU in the emulated XICS (guest interrupt controller), and eventually delivered to the target VCPU. Now that we have code to handle device interrupts in real mode without exiting to the host kernel, there is an advantage to having the device interrupt arrive on the same sub(core) as the target VCPU is running on. In this situation, the interrupt can be delivered to the target VCPU without any exit to the host kernel (using a hypervisor doorbell interrupt between threads if necessary). This patch aims to get passed-through device interrupts arriving on the correct core by setting the interrupt server in the real hardware XICS for the interrupt to the first thread in the (sub)core where its target VCPU is running. We do this in the real-mode H_EOI code because the H_EOI handler already needs to look at the emulated ICS state for the interrupt (whereas the H_XIRR handler doesn't), and we know we are running in the target VCPU context at that point. We set the server CPU in hardware using an OPAL call, regardless of what the IRQ affinity mask for the interrupt says, and without updating the affinity mask. This amounts to saying that when an interrupt is passed through to a guest, as a matter of policy we allow the guest's affinity for the interrupt to override the host's. This is inspired by an earlier patch from Suresh Warrier, although none of this code came from that earlier patch. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-09-12KVM: PPC: Book3S HV: Update irq stats for IRQs handled in real modeSuresh Warrier
When a passthrough IRQ is handled completely within KVM real mode code, it has to also update the IRQ stats since this does not go through the generic IRQ handling code. However, the per CPU kstat_irqs field is an allocated (not static) field and so cannot be directly accessed in real mode safely. The function this_cpu_inc_rm() is introduced to safely increment per CPU fields (currently coded for unsigned integers only) that are allocated and could thus be vmalloced also. Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-09-12KVM: PPC: Book3S HV: Tunable to disable KVM IRQ bypassSuresh Warrier
Add a module parameter kvm_irq_bypass for kvm_hv.ko to disable IRQ bypass for passthrough interrupts. The default value of this tunable is 1 - that is enable the feature. Since the tunable is used by built-in kernel code, we use the module_param_cb macro to achieve this. Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-09-12KVM: PPC: Book3S HV: Dump irqmap in debugfsSuresh Warrier
Dump the passthrough irqmap structure associated with a guest as part of /sys/kernel/debug/powerpc/kvm-xics-*. Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-09-12KVM: PPC: Book3S HV: Complete passthrough interrupt in hostSuresh Warrier
In existing real mode ICP code, when updating the virtual ICP state, if there is a required action that cannot be completely handled in real mode, as for instance, a VCPU needs to be woken up, flags are set in the ICP to indicate the required action. This is checked when returning from hypercalls to decide whether the call needs switch back to the host where the action can be performed in virtual mode. Note that if h_ipi_redirect is enabled, real mode code will first try to message a free host CPU to complete this job instead of returning the host to do it ourselves. Currently, the real mode PCI passthrough interrupt handling code checks if any of these flags are set and simply returns to the host. This is not good enough as the trap value (0x500) is treated as an external interrupt by the host code. It is only when the trap value is a hypercall that the host code searches for and acts on unfinished work by calling kvmppc_xics_rm_complete. This patch introduces a special trap BOOK3S_INTERRUPT_HV_RM_HARD which is returned by KVM if there is unfinished business to be completed in host virtual mode after handling a PCI passthrough interrupt. The host checks for this special interrupt condition and calls into the kvmppc_xics_rm_complete, which is made an exported function for this reason. [paulus@ozlabs.org - moved logic to set r12 to BOOK3S_INTERRUPT_HV_RM_HARD in book3s_hv_rmhandlers.S into the end of kvmppc_check_wake_reason.] Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-09-12KVM: PPC: Book3S HV: Handle passthrough interrupts in guestSuresh Warrier
Currently, KVM switches back to the host to handle any external interrupt (when the interrupt is received while running in the guest). This patch updates real-mode KVM to check if an interrupt is generated by a passthrough adapter that is owned by this guest. If so, the real mode KVM will directly inject the corresponding virtual interrupt to the guest VCPU's ICS and also EOI the interrupt in hardware. In short, the interrupt is handled entirely in real mode in the guest context without switching back to the host. In some rare cases, the interrupt cannot be completely handled in real mode, for instance, a VCPU that is sleeping needs to be woken up. In this case, KVM simply switches back to the host with trap reason set to 0x500. This works, but it is clearly not very efficient. A following patch will distinguish this case and handle it correctly in the host. Note that we can use the existing check_too_hard() routine even though we are not in a hypercall to determine if there is unfinished business that needs to be completed in host virtual mode. The patch assumes that the mapping between hardware interrupt IRQ and virtual IRQ to be injected to the guest already exists for the PCI passthrough interrupts that need to be handled in real mode. If the mapping does not exist, KVM falls back to the default existing behavior. The KVM real mode code reads mappings from the mapped array in the passthrough IRQ map without taking any lock. We carefully order the loads and stores of the fields in the kvmppc_irq_map data structure using memory barriers to avoid an inconsistent mapping being seen by the reader. Thus, although it is possible to miss a map entry, it is not possible to read a stale value. [paulus@ozlabs.org - get irq_chip from irq_map rather than pimap, pulled out powernv eoi change into a separate patch, made kvmppc_read_intr get the vcpu from the paca rather than being passed in, rewrote the logic at the end of kvmppc_read_intr to avoid deep indentation, simplified logic in book3s_hv_rmhandlers.S since we were always restoring SRR0/1 anyway, get rid of the cached array (just use the mapped array), removed the kick_all_cpus_sync() call, clear saved_xirr PACA field when we handle the interrupt in real mode, fix compilation with CONFIG_KVM_XICS=n.] Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-09-10powerpc/cell: Drop unused iic_get_irq_host()Daniel Axtens
Sparse checking revealed that it is no longer used. The last usage was removed in commit 2e194583125b ("[POWERPC] Cell interrupt rework") in 2006. Signed-off-by: Daniel Axtens <dja@axtens.net> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-09Merge tag 'powerpc-4.8-5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "Fixes marked for stable: - Don't alias user region to other regions below PAGE_OFFSET from Paul Mackerras - Fix again csum_partial_copy_generic() on 32-bit from Christophe Leroy - Fix corrupted PE allocation bitmap on releasing PE from Gavin Shan Fixes for code merged this cycle: - Fix crash on releasing compound PE from Gavin Shan - Fix processor numbers in OPAL ICP from Benjamin Herrenschmidt - Fix little endian build with CONFIG_KEXEC=n from Thiago Jung Bauermann" * tag 'powerpc-4.8-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/mm: Don't alias user region to other regions below PAGE_OFFSET powerpc/32: Fix again csum_partial_copy_generic() powerpc/powernv: Fix corrupted PE allocation bitmap on releasing PE powerpc/powernv: Fix crash on releasing compound PE powerpc/xics/opal: Fix processor numbers in OPAL ICP powerpc/pseries: Fix little endian build with CONFIG_KEXEC=n
2016-09-09KVM: PPC: Book3S HV: Enable IRQ bypassSuresh Warrier
Add the irq_bypass_add_producer and irq_bypass_del_producer functions. These functions get called whenever a GSI is being defined for a guest. They create/remove the mapping between host real IRQ numbers and the guest GSI. Add the following helper functions to manage the passthrough IRQ map. kvmppc_set_passthru_irq() Creates a mapping in the passthrough IRQ map that maps a host IRQ to a guest GSI. It allocates the structure (one per guest VM) the first time it is called. kvmppc_clr_passthru_irq() Removes the passthrough IRQ map entry given a guest GSI. The passthrough IRQ map structure is not freed even when the number of mapped entries goes to zero. It is only freed when the VM is destroyed. [paulus@ozlabs.org - modified to use is_pnv_opal_msi() rather than requiring all passed-through interrupts to use the same irq_chip; changed deletion so it zeroes out the r_hwirq field rather than copying the last entry down and decrementing the number of entries.] Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-09-09KVM: PPC: Book3S HV: Introduce kvmppc_passthru_irqmapSuresh Warrier
This patch introduces an IRQ mapping structure, the kvmppc_passthru_irqmap structure that is to be used to map the real hardware IRQ in the host with the virtual hardware IRQ (gsi) that is injected into a guest by KVM for passthrough adapters. Currently, we assume a separate IRQ mapping structure for each guest. Each kvmppc_passthru_irqmap has a mapping arrays, containing all defined real<->virtual IRQs. [paulus@ozlabs.org - removed irq_chip field from struct kvmppc_passthru_irqmap; changed parameter for kvmppc_get_passthru_irqmap from struct kvm_vcpu * to struct kvm *, removed small cached array.] Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-09-09KVM: PPC: select IRQ_BYPASS_MANAGERSuresh Warrier
Select IRQ_BYPASS_MANAGER for PPC when CONFIG_KVM is set. Add the PPC producer functions for add and del producer. [paulus@ozlabs.org - Moved new functions from book3s.c to powerpc.c so booke compiles; added kvm_arch_has_irq_bypass implementation.] Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-09-09KVM: PPC: Book3S HV: Convert kvmppc_read_intr to a C functionSuresh Warrier
Modify kvmppc_read_intr to make it a C function. Because it is called from kvmppc_check_wake_reason, any of the assembler code that calls either kvmppc_read_intr or kvmppc_check_wake_reason now has to assume that the volatile registers might have been modified. This also adds in the optimization of clearing saved_xirr in the case where we completely handle and EOI an IPI. Without this, the next device interrupt will require two trips through the host interrupt handling code. [paulus@ozlabs.org - made kvmppc_check_wake_reason create a stack frame when it is calling kvmppc_read_intr, which means we can set r12 to the trap number (0x500) after the call to kvmppc_read_intr, instead of using r31. Also moved the deliver_guest_interrupt label so as to restore XER and CTR, plus other minor tweaks.] Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-09-09Merge branch 'kvm-ppc-infrastructure' into kvm-ppc-nextPaul Mackerras
This merges the topic branch 'kvm-ppc-infrastructure' into kvm-ppc-next so that I can then apply further patches that need the changes in the kvm-ppc-infrastructure branch. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-09-09powerpc: move hmi.c to arch/powerpc/kvm/Paolo Bonzini
hmi.c functions are unused unless sibling_subcore_state is nonzero, and that in turn happens only if KVM is in use. So move the code to arch/powerpc/kvm/, putting it under CONFIG_KVM_BOOK3S_HV_POSSIBLE rather than CONFIG_PPC_BOOK3S_64. The sibling_subcore_state is also included in struct paca_struct only if KVM is supported by the kernel. Cc: Daniel Axtens <dja@axtens.net> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: linuxppc-dev@lists.ozlabs.org Cc: kvm-ppc@vger.kernel.org Cc: kvm@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-09-09powerpc/powernv: Provide facilities for EOI, usable from real modeSuresh Warrier
This adds a new function pnv_opal_pci_msi_eoi() which does the part of end-of-interrupt (EOI) handling of an MSI which involves doing an OPAL call. This function can be called in real mode. This doesn't just export pnv_ioda2_msi_eoi() because that does a call to icp_native_eoi(), which does not work in real mode. This also adds a function, is_pnv_opal_msi(), which KVM can call to check whether an interrupt is one for which we should be calling pnv_opal_pci_msi_eoi() when we need to do an EOI. [paulus@ozlabs.org - split out the addition of pnv_opal_pci_msi_eoi() from Suresh's patch "KVM: PPC: Book3S HV: Handle passthrough interrupts in guest"; added is_pnv_opal_msi(); wrote description.] Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-09-09powerpc: Add simple cache inhibited MMIO accessorsSuresh Warrier
Add simple cache inhibited accessors for memory mapped I/O. Unlike the accessors built from the DEF_MMIO_* macros, these don't include any hardware memory barriers, callers need to manage memory barriers on their own. These can only be called in hypervisor real mode. Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com> [paulus@ozlabs.org - added line to comment] Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-09-09powerpc/mm: Speed up computation of base and actual page size for a HPTEPaul Mackerras
This replaces a 2-D search through an array with a simple 8-bit table lookup for determining the actual and/or base page size for a HPT entry. The encoding in the second doubleword of the HPTE is designed to encode the actual and base page sizes without using any more bits than would be needed for a 4k page number, by using between 1 and 8 low-order bits of the RPN (real page number) field to encode the page sizes. A single "large page" bit in the first doubleword indicates that these low-order bits are to be interpreted like this. We can determine the page sizes by using the low-order 8 bits of the RPN to look up a 256-entry table. For actual page sizes less than 1MB, some of the upper bits of these 8 bits are going to be real address bits, but we can cope with that by replicating the entries for those smaller page sizes. While we're at it, let's move the hpte_page_size() and hpte_base_page_size() functions from a KVM-specific header to a header for 64-bit HPT systems, since this computation doesn't have anything specifically to do with KVM. Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-09-08powerpc/mm: Don't alias user region to other regions below PAGE_OFFSETPaul Mackerras
In commit c60ac5693c47 ("powerpc: Update kernel VSID range", 2013-03-13) we lost a check on the region number (the top four bits of the effective address) for addresses below PAGE_OFFSET. That commit replaced a check that the top 18 bits were all zero with a check that bits 46 - 59 were zero (performed for all addresses, not just user addresses). This means that userspace can access an address like 0x1000_0xxx_xxxx_xxxx and we will insert a valid SLB entry for it. The VSID used will be the same as if the top 4 bits were 0, but the page size will be some random value obtained by indexing beyond the end of the mm_ctx_high_slices_psize array in the paca. If that page size is the same as would be used for region 0, then userspace just has an alias of the region 0 space. If the page size is different, then no HPTE will be found for the access, and the process will get a SIGSEGV (since hash_page_mm() will refuse to create a HPTE for the bogus address). The access beyond the end of the mm_ctx_high_slices_psize can be at most 5.5MB past the array, and so will be in RAM somewhere. Since the access is a load performed in real mode, it won't fault or crash the kernel. At most this bug could perhaps leak a little bit of information about blocks of 32 bytes of memory located at offsets of i * 512kB past the paca->mm_ctx_high_slices_psize array, for 1 <= i <= 11. Fixes: c60ac5693c47 ("powerpc: Update kernel VSID range") Cc: stable@vger.kernel.org # v3.9+ Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-08powerpc/32: Fix again csum_partial_copy_generic()Christophe Leroy
Commit 7aef4136566b0 ("powerpc32: rewrite csum_partial_copy_generic() based on copy_tofrom_user()") introduced a bug when destination address is odd and len is lower than cacheline size. In that case the resulting csum value doesn't have to be rotated one byte because the cache-aligned copy part is skipped so no alignment is performed. Fixes: 7aef4136566b0 ("powerpc32: rewrite csum_partial_copy_generic() based on copy_tofrom_user()") Cc: stable@vger.kernel.org # v4.6+ Reported-by: Alessio Igor Bogani <alessio.bogani@elettra.eu> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Tested-by: Alessio Igor Bogani <alessio.bogani@elettra.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-08powerpc/powernv: Fix corrupted PE allocation bitmap on releasing PEGavin Shan
In pnv_ioda_free_pe(), the PE object (including the associated PE number) is cleared before resetting the corresponding bit in the PE allocation bitmap. It means PE#0 is always released to the bitmap wrongly. This fixes above issue by caching the PE number before the PE object is cleared. Fixes: 1e9167726c41 ("powerpc/powernv: Use PE instead of number during setup and release" Cc: stable@vger.kernel.org # v4.7+ Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-08KVM: PPC: Implement existing and add new halt polling vcpu statsSuraj Jitindar Singh
vcpu stats are used to collect information about a vcpu which can be viewed in the debugfs. For example halt_attempted_poll and halt_successful_poll are used to keep track of the number of times the vcpu attempts to and successfully polls. These stats are currently not used on powerpc. Implement incrementation of the halt_attempted_poll and halt_successful_poll vcpu stats for powerpc. Since these stats are summed over all the vcpus for all running guests it doesn't matter which vcpu they are attributed to, thus we choose the current runner vcpu of the vcore. Also add new vcpu stats: halt_poll_success_ns, halt_poll_fail_ns and halt_wait_ns to be used to accumulate the total time spend polling successfully, polling unsuccessfully and waiting respectively, and halt_successful_wait to accumulate the number of times the vcpu waits. Given that halt_poll_success_ns, halt_poll_fail_ns and halt_wait_ns are expressed in nanoseconds it is necessary to represent these as 64-bit quantities, otherwise they would overflow after only about 4 seconds. Given that the total time spend either polling or waiting will be known and the number of times that each was done, it will be possible to determine the average poll and wait times. This will give the ability to tune the kvm module parameters based on the calculated average wait and poll times. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Reviewed-by: David Matlack <dmatlack@google.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-09-08KVM: Add provisioning for ulong vm stats and u64 vcpu statsSuraj Jitindar Singh
vms and vcpus have statistics associated with them which can be viewed within the debugfs. Currently it is assumed within the vcpu_stat_get() and vm_stat_get() functions that all of these statistics are represented as u32s, however the next patch adds some u64 vcpu statistics. Change all vcpu statistics to u64 and modify vcpu_stat_get() accordingly. Since vcpu statistics are per vcpu, they will only be updated by a single vcpu at a time so this shouldn't present a problem on 32-bit machines which can't atomically increment 64-bit numbers. However vm statistics could potentially be updated by multiple vcpus from that vm at a time. To avoid the overhead of atomics make all vm statistics ulong such that they are 64-bit on 64-bit systems where they can be atomically incremented and are 32-bit on 32-bit systems which may not be able to atomically increment 64-bit numbers. Modify vm_stat_get() to expect ulongs. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Reviewed-by: David Matlack <dmatlack@google.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-09-08KVM: PPC: Book3S HV: Implement halt pollingSuraj Jitindar Singh
This patch introduces new halt polling functionality into the kvm_hv kernel module. When a vcore is idle it will poll for some period of time before scheduling itself out. When all of the runnable vcpus on a vcore have ceded (and thus the vcore is idle) we schedule ourselves out to allow something else to run. In the event that we need to wake up very quickly (for example an interrupt arrives), we are required to wait until we get scheduled again. Implement halt polling so that when a vcore is idle, and before scheduling ourselves, we poll for vcpus in the runnable_threads list which have pending exceptions or which leave the ceded state. If we poll successfully then we can get back into the guest very quickly without ever scheduling ourselves, otherwise we schedule ourselves out as before. There exists generic halt_polling code in virt/kvm_main.c, however on powerpc the polling conditions are different to the generic case. It would be nice if we could just implement an arch specific kvm_check_block() function, but there is still other arch specific things which need to be done for kvm_hv (for example manipulating vcore states) which means that a separate implementation is the best option. Testing of this patch with a TCP round robin test between two guests with virtio network interfaces has found a decrease in round trip time of ~15us on average. A performance gain is only seen when going out of and back into the guest often and quickly, otherwise there is no net benefit from the polling. The polling interval is adjusted such that when we are often scheduled out for long periods of time it is reduced, and when we often poll successfully it is increased. The rate at which the polling interval increases or decreases, and the maximum polling interval, can be set through module parameters. Based on the implementation in the generic kvm module by Wanpeng Li and Paolo Bonzini, and on direction from Paul Mackerras. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-09-08KVM: PPC: Book3S HV: Change vcore element runnable_threads from linked-list ↵Suraj Jitindar Singh
to array The struct kvmppc_vcore is a structure used to store various information about a virtual core for a kvm guest. The runnable_threads element of the struct provides a list of all of the currently runnable vcpus on the core (those in the KVMPPC_VCPU_RUNNABLE state). The previous implementation of this list was a linked_list. The next patch requires that the list be able to be iterated over without holding the vcore lock. Reimplement the runnable_threads list in the kvmppc_vcore struct as an array. Implement function to iterate over valid entries in the array and update access sites accordingly. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-09-08KVM: PPC: Book3S HV: Move struct kvmppc_vcore from kvm_host.h to kvm_book3s.hSuraj Jitindar Singh
The next commit will introduce a member to the kvmppc_vcore struct which references MAX_SMT_THREADS which is defined in kvm_book3s_asm.h, however this file isn't included in kvm_host.h directly. Thus compiling for certain platforms such as pmac32_defconfig and ppc64e_defconfig with KVM fails due to MAX_SMT_THREADS not being defined. Move the struct kvmppc_vcore definition to kvm_book3s.h which explicitly includes kvm_book3s_asm.h. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-09-06usercopy: fold builtin_const check into inline functionKees Cook
Instead of having each caller of check_object_size() need to remember to check for a const size parameter, move the check into check_object_size() itself. This actually matches the original implementation in PaX, though this commit cleans up the now-redundant builtin_const() calls in the various architectures. Signed-off-by: Kees Cook <keescook@chromium.org>
2016-09-06powerpc/mmu nohash: Convert to hotplug state machineSebastian Andrzej Siewior
Install the callbacks via the state machine. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: rt@linutronix.de Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/20160818125731.27256-17-bigeasy@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-06powerpc/powermac: Convert to hotplug state machineSebastian Andrzej Siewior
Install the callbacks via the state machine. I assume here that the powermac has two CPUs and so only one can go up or down at a time. The variable smp_core99_host_open is here to ensure that we do not try to open or close the i2c host twice if something goes wrong and we invoke the prepare or online callback twice due to rollback. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: rt@linutronix.de Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/20160818125731.27256-16-bigeasy@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-06powerpc/powernv: Fix crash on releasing compound PEGavin Shan
The compound PE is created to accommodate the devices attached to one specific PCI bus that consume multiple M64 segments. The compound PE is made up of one master PE and possibly multiple slave PEs. The slave PEs should be destroyed when releasing the master PE. A kernel crash happens when derferencing @pe->pdev on releasing the slave PE in pnv_ioda_deconfigure_pe(). # echo 0 > /sys/bus/pci/slots/C7/power iommu: Removing device 0000:01:00.1 from group 0 iommu: Removing device 0000:01:00.0 from group 0 Unable to handle kernel paging request for data at address 0x00000010 Faulting instruction address: 0xc00000000005d898 cpu 0x1: Vector: 300 (Data Access) at [c000000fe8217620] pc: c00000000005d898: pnv_ioda_release_pe+0x288/0x610 lr: c00000000005dbdc: pnv_ioda_release_pe+0x5cc/0x610 sp: c000000fe82178a0 msr: 9000000000009033 dar: 10 dsisr: 40000000 current = 0xc000000fe815ab80 paca = 0xc00000000ff00400 softe: 0 irq_happened: 0x01 pid = 2709, comm = sh Linux version 4.8.0-rc5-gavin-00006-g745efdb (gwshan@gwshan) \ (gcc version 4.9.3 (Buildroot 2016.02-rc2-00093-g5ea3bce) ) #586 SMP \ Tue Sep 6 13:37:29 AEST 2016 enter ? for help [c000000fe8217940] c00000000005d684 pnv_ioda_release_pe+0x74/0x610 [c000000fe82179e0] c000000000034460 pcibios_release_device+0x50/0x70 [c000000fe8217a10] c0000000004aba80 pci_release_dev+0x50/0xa0 [c000000fe8217a40] c000000000704898 device_release+0x58/0xf0 [c000000fe8217ac0] c000000000470510 kobject_release+0x80/0xf0 [c000000fe8217b00] c000000000704dd4 put_device+0x24/0x40 [c000000fe8217b20] c0000000004af94c pci_remove_bus_device+0x12c/0x150 [c000000fe8217b60] c000000000034244 pci_hp_remove_devices+0x94/0xd0 [c000000fe8217ba0] c0000000004ca444 pnv_php_disable_slot+0x64/0xb0 [c000000fe8217bd0] c0000000004c88c0 power_write_file+0xa0/0x190 [c000000fe8217c50] c0000000004c248c pci_slot_attr_store+0x3c/0x60 [c000000fe8217c70] c0000000002d6494 sysfs_kf_write+0x94/0xc0 [c000000fe8217cb0] c0000000002d50f0 kernfs_fop_write+0x180/0x260 [c000000fe8217d00] c0000000002334a0 __vfs_write+0x40/0x190 [c000000fe8217d90] c000000000234738 vfs_write+0xc8/0x240 [c000000fe8217de0] c000000000236250 SyS_write+0x60/0x110 [c000000fe8217e30] c000000000009524 system_call+0x38/0x108 It fixes the kernel crash by bypassing releasing resources (DMA, IO and memory segments, PELTM) because there are no resources assigned to the slave PE. Fixes: c5f7700bbd2e ("powerpc/powernv: Dynamically release PE") Reported-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-06powerpc/xics/opal: Fix processor numbers in OPAL ICPBenjamin Herrenschmidt
When using the OPAL ICP backend we incorrectly pass Linux CPU numbers rather than HW CPU numbers to OPAL. Fixes: d74361881f0d ("powerpc/xics: Add ICP OPAL backend") Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-06powerpc/pseries: Fix little endian build with CONFIG_KEXEC=nThiago Jung Bauermann
On ppc64le, builds with CONFIG_KEXEC=n fail with: arch/powerpc/platforms/pseries/setup.c: In function ‘pseries_big_endian_exceptions’: arch/powerpc/platforms/pseries/setup.c:403:13: error: implicit declaration of function ‘kdump_in_progress’ if (rc && !kdump_in_progress()) This is because pseries/setup.c includes <linux/kexec.h>, but kdump_in_progress() is defined in <asm/kexec.h>. This is a problem because the former only includes the latter if CONFIG_KEXEC_CORE=y. Fix it by including <asm/kexec.h> directly, as is done in powernv/setup.c. Fixes: d3cbff1b5a90 ("powerpc: Put exception configuration in a common place") Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-29powerpc: signals: Discard transaction state from signal framesCyril Bur
Userspace can begin and suspend a transaction within the signal handler which means they might enter sys_rt_sigreturn() with the processor in suspended state. sys_rt_sigreturn() wants to restore process context (which may have been in a transaction before signal delivery). To do this it must restore TM SPRS. To achieve this, any transaction initiated within the signal frame must be discarded in order to be able to restore TM SPRs as TM SPRs can only be manipulated non-transactionally.. >From the PowerPC ISA: TM Bad Thing Exception [Category: Transactional Memory] An attempt is made to execute a mtspr targeting a TM register in other than Non-transactional state. Not doing so results in a TM Bad Thing: [12045.221359] Kernel BUG at c000000000050a40 [verbose debug info unavailable] [12045.221470] Unexpected TM Bad Thing exception at c000000000050a40 (msr 0x201033) [12045.221540] Oops: Unrecoverable exception, sig: 6 [#1] [12045.221586] SMP NR_CPUS=2048 NUMA PowerNV [12045.221634] Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables kvm_hv kvm uio_pdrv_genirq ipmi_powernv uio powernv_rng ipmi_msghandler autofs4 ses enclosure scsi_transport_sas bnx2x ipr mdio libcrc32c [12045.222167] CPU: 68 PID: 6178 Comm: sigreturnpanic Not tainted 4.7.0 #34 [12045.222224] task: c0000000fce38600 ti: c0000000fceb4000 task.ti: c0000000fceb4000 [12045.222293] NIP: c000000000050a40 LR: c0000000000163bc CTR: 0000000000000000 [12045.222361] REGS: c0000000fceb7ac0 TRAP: 0700 Not tainted (4.7.0) [12045.222418] MSR: 9000000300201033 <SF,HV,ME,IR,DR,RI,LE,TM[SE]> CR: 28444280 XER: 20000000 [12045.222625] CFAR: c0000000000163b8 SOFTE: 0 PACATMSCRATCH: 900000014280f033 GPR00: 01100000b8000001 c0000000fceb7d40 c00000000139c100 c0000000fce390d0 GPR04: 900000034280f033 0000000000000000 0000000000000000 0000000000000000 GPR08: 0000000000000000 b000000000001033 0000000000000001 0000000000000000 GPR12: 0000000000000000 c000000002926400 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR24: 0000000000000000 00003ffff98cadd0 00003ffff98cb470 0000000000000000 GPR28: 900000034280f033 c0000000fceb7ea0 0000000000000001 c0000000fce390d0 [12045.223535] NIP [c000000000050a40] tm_restore_sprs+0xc/0x1c [12045.223584] LR [c0000000000163bc] tm_recheckpoint+0x5c/0xa0 [12045.223630] Call Trace: [12045.223655] [c0000000fceb7d80] [c000000000026e74] sys_rt_sigreturn+0x494/0x6c0 [12045.223738] [c0000000fceb7e30] [c0000000000092e0] system_call+0x38/0x108 [12045.223806] Instruction dump: [12045.223841] 7c800164 4e800020 7c0022a6 f80304a8 7c0222a6 f80304b0 7c0122a6 f80304b8 [12045.223955] 4e800020 e80304a8 7c0023a6 e80304b0 <7c0223a6> e80304b8 7c0123a6 4e800020 [12045.224074] ---[ end trace cb8002ee240bae76 ]--- It isn't clear exactly if there is really a use case for userspace returning with a suspended transaction, however, doing so doesn't (on its own) constitute a bad frame. As such, this patch simply discards the transactional state of the context calling the sigreturn and continues. Reported-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Signed-off-by: Cyril Bur <cyrilbur@gmail.com> Tested-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Reviewed-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Acked-by: Simon Guo <wei.guo.simon@gmail.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2016-08-29powerpc/powernv : Drop reference added by kset_find_obj()Mukesh Ojha
In a situation, where Linux kernel gets notified about duplicate error log from OPAL, it is been observed that kernel fails to remove sysfs entries (/sys/firmware/opal/elog/0xXXXXXXXX) of such error logs. This is because, we currently search the error log/dump kobject in the kset list via 'kset_find_obj()' routine. Which eventually increment the reference count by one, once it founds the kobject. So, unless we decrement the reference count by one after it found the kobject, we would not be able to release the kobject properly later. This patch adds the 'kobject_put()' which was missing earlier. Signed-off-by: Mukesh Ojha <mukesh02@linux.vnet.ibm.com> Cc: stable@vger.kernel.org Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2016-08-29powerpc/tm: do not use r13 for tabort_syscallNicholas Piggin
tabort_syscall runs with RI=1, so a nested recoverable machine check will load the paca into r13 and overwrite what we loaded it with, because exceptions returning to privileged mode do not restore r13. Fixes: b4b56f9ecab4 (powerpc/tm: Abort syscalls in active transactions) Cc: stable@vger.kernel.org Signed-off-by: Nick Piggin <npiggin@gmail.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2016-08-25KVM: PPC: Always select KVM_VFIO, plus Makefile cleanupPaul Mackerras
As discussed recently on the kvm mailing list, David Gibson's intention in commit 178a78750212 ("vfio: Enable VFIO device for powerpc", 2016-02-01) was to have the KVM VFIO device built in on all powerpc platforms. This patch adds the "select KVM_VFIO" statement that makes this happen. Currently, arch/powerpc/kvm/Makefile doesn't include vfio.o for the 64-bit kvm module, because the list of objects doesn't use the $(common-objs-y) list. The reason it doesn't is because we don't necessarily want coalesced_mmio.o or emulate.o (for example if HV KVM is the only target), and common-objs-y includes both. Since this is confusing, this patch adjusts the definitions so that we now use $(common-objs-y) in the list for the 64-bit kvm.ko module, emulate.o is removed from common-objs-y and added in the places that need it, and the inclusion of coalesced_mmio.o now depends on CONFIG_KVM_MMIO. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-08-24ftrace: Add return address pointer to ftrace_ret_stackJosh Poimboeuf
Storing this value will help prevent unwinders from getting out of sync with the function graph tracer ret_stack. Now instead of needing a stateful iterator, they can compare the return address pointer to find the right ret_stack entry. Note that an array of 50 ftrace_ret_stack structs is allocated for every task. So when an arch implements this, it will add either 200 or 400 bytes of memory usage per task (depending on whether it's a 32-bit or 64-bit platform). Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Byungchul Park <byungchul.park@lge.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nilay Vaish <nilayvaish@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/a95cfcc39e8f26b89a430c56926af0bb217bc0a1.1471607358.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-22powerpc: move hmi.c to arch/powerpc/kvm/Paolo Bonzini
hmi.c functions are unused unless sibling_subcore_state is nonzero, and that in turn happens only if KVM is in use. So move the code to arch/powerpc/kvm/, putting it under CONFIG_KVM_BOOK3S_HV_POSSIBLE rather than CONFIG_PPC_BOOK3S_64. The sibling_subcore_state is also included in struct paca_struct only if KVM is supported by the kernel. Cc: Daniel Axtens <dja@axtens.net> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: linuxppc-dev@lists.ozlabs.org Cc: kvm-ppc@vger.kernel.org Cc: kvm@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2016-08-22powerpc: sysdev: cpm: fix gpio save_regs functionsChristophe Leroy
of_mm_gpiochip_add_data() calls mm_gc->save_regs() before setting the data. Therefore ->save_regs() cannot use gpiochip_get_data() [ 0.275940] Unable to handle kernel paging request for data at address 0x00000130 [ 0.283120] Faulting instruction address: 0xc01b44cc [ 0.288175] Oops: Kernel access of bad area, sig: 11 [#1] [ 0.293343] PREEMPT CMPC885 [ 0.296141] CPU: 0 PID: 1 Comm: swapper Not tainted 4.7.0-g65124df-dirty #68 [ 0.304131] task: c6074000 ti: c6080000 task.ti: c6080000 [ 0.309459] NIP: c01b44cc LR: c0011720 CTR: c0011708 [ 0.314372] REGS: c6081d90 TRAP: 0300 Not tainted (4.7.0-g65124df-dirty) [ 0.322267] MSR: 00009032 <EE,ME,IR,DR,RI> CR: 24000028 XER: 20000000 [ 0.328813] DAR: 00000130 DSISR: c0000000 GPR00: c01b6d0c c6081e40 c6074000 c6017000 c9028000 c601d028 c6081dd8 00000000 GPR08: c601d028 00000000 ffffffff 00000001 24000044 00000000 c0002790 00000000 GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 c05643b0 00000083 GPR24: c04a1a6c c0560000 c04a8308 c04c6480 c0012498 c6017000 c7ffcc78 c6017000 [ 0.360806] NIP [c01b44cc] gpiochip_get_data+0x4/0xc [ 0.365684] LR [c0011720] cpm1_gpio16_save_regs+0x18/0x44 [ 0.370972] Call Trace: [ 0.373451] [c6081e50] [c01b6d0c] of_mm_gpiochip_add_data+0x70/0xdc [ 0.379624] [c6081e70] [c00124c0] cpm_init_par_io+0x28/0x118 [ 0.385238] [c6081e80] [c04a8ac0] do_one_initcall+0xb0/0x17c [ 0.390819] [c6081ef0] [c04a8cbc] kernel_init_freeable+0x130/0x1dc [ 0.396924] [c6081f30] [c00027a4] kernel_init+0x14/0x110 [ 0.402177] [c6081f40] [c000b424] ret_from_kernel_thread+0x5c/0x64 [ 0.408233] Instruction dump: [ 0.411168] 4182fafc 3f80c040 48234c6d 3bc0fff0 3b9c5ed0 4bfffaf4 81290020 712a0004 [ 0.418825] 4182fb34 48234c51 4bfffb2c 81230004 <80690130> 4e800020 7c0802a6 9421ffe0 [ 0.426763] ---[ end trace fe4113ee21d72ffa ]--- fixes: e65078f1f3490 ("powerpc: sysdev: cpm1: use gpiochip data pointer") fixes: a14a2d484b386 ("powerpc: cpm_common: use gpiochip data pointer") Cc: stable@vger.kernel.org Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2016-08-22powerpc/pseries: PACA save area fix for MCE vs MCENicholas Piggin
MCE must not enable MSR_RI until PACA_EXMC is no longer being used. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2016-08-22powerpc/pseries: PACA save area fix for general exception vs MCENicholas Piggin
MCE must not use PACA_EXGEN. When a general exception enables MSR_RI, that means SPRN_SRR[01] and SPRN_SPRG are no longer used. However the PACA save area is still in use. Acked-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2016-08-22powerpc/prom: Fix sub-processor option passed to ibm, ↵Michael Ellerman
client-architecture-support When booting from an OpenFirmware which supports it, we use the "ibm,client-architecture-support" firmware call to communicate our capabilities to firmware. The format of the structure we pass to firmware is specified in PAPR (Power Architecture Platform Requirements), or the public version LoPAPR (Linux on Power Architecture Platform Reference). Referring to table 244 in LoPAPR v1.1, option vector 5 contains a 4 byte field at bytes 17-20 for the "Platform Facilities Enable". This is followed by a 1 byte field at byte 21 for "Sub-Processor Represenation Level". Comparing to the code, there we have the Platform Facilities options (OV5_PFO_*) at byte 17, but we fail to pad that field out to its full width of 4 bytes. This means the OV5_SUB_PROCESSORS option is incorrectly placed at byte 18. Fix it by adding zero bytes for bytes 18, 19, 20, and comment the bytes to hopefully make it clearer in future. As far as I'm aware nothing actually consumes this value at this time, so the effect of this bug is nil in practice. It does mean we've been incorrectly setting bit 15 of the "Platform Facilities Enable" option for the past ~3 1/2 years, so we should avoid allocating that bit to anything else in future. Fixes: df77c7992029 ("powerpc/pseries: Update ibm,architecture.vec for PAPR 2.7/POWER8") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2016-08-22powerpc, hotplug: Avoid to touch non-existent cpumasks.Boqun Feng
We observed a kernel oops when running a PPC guest with config NR_CPUS=4 and qemu option "-smp cores=1,threads=8": [ 30.634781] Unable to handle kernel paging request for data at address 0xc00000014192eb17 [ 30.636173] Faulting instruction address: 0xc00000000003e5cc [ 30.637069] Oops: Kernel access of bad area, sig: 11 [#1] [ 30.637877] SMP NR_CPUS=4 NUMA pSeries [ 30.638471] Modules linked in: [ 30.638949] CPU: 3 PID: 27 Comm: migration/3 Not tainted 4.7.0-07963-g9714b26 #1 [ 30.640059] task: c00000001e29c600 task.stack: c00000001e2a8000 [ 30.640956] NIP: c00000000003e5cc LR: c00000000003e550 CTR: 0000000000000000 [ 30.642001] REGS: c00000001e2ab8e0 TRAP: 0300 Not tainted (4.7.0-07963-g9714b26) [ 30.643139] MSR: 8000000102803033 <SF,VEC,VSX,FP,ME,IR,DR,RI,LE,TM[E]> CR: 22004084 XER: 00000000 [ 30.644583] CFAR: c000000000009e98 DAR: c00000014192eb17 DSISR: 40000000 SOFTE: 0 GPR00: c00000000140a6b8 c00000001e2abb60 c0000000016dd300 0000000000000003 GPR04: 0000000000000000 0000000000000004 c0000000016e5920 0000000000000008 GPR08: 0000000000000004 c00000014192eb17 0000000000000000 0000000000000020 GPR12: c00000000140a6c0 c00000000ffffc00 c0000000000d3ea8 c00000001e005680 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: 0000000000000000 c00000001e6b3a00 0000000000000000 0000000000000001 GPR24: c00000001ff85138 c00000001ff85130 000000001eb6f000 0000000000000001 GPR28: 0000000000000000 c0000000017014e0 0000000000000000 0000000000000018 [ 30.653882] NIP [c00000000003e5cc] __cpu_disable+0xcc/0x190 [ 30.654713] LR [c00000000003e550] __cpu_disable+0x50/0x190 [ 30.655528] Call Trace: [ 30.655893] [c00000001e2abb60] [c00000000003e550] __cpu_disable+0x50/0x190 (unreliable) [ 30.657280] [c00000001e2abbb0] [c0000000000aca0c] take_cpu_down+0x5c/0x100 [ 30.658365] [c00000001e2abc10] [c000000000163918] multi_cpu_stop+0x1a8/0x1e0 [ 30.659617] [c00000001e2abc60] [c000000000163cc0] cpu_stopper_thread+0xf0/0x1d0 [ 30.660737] [c00000001e2abd20] [c0000000000d8d70] smpboot_thread_fn+0x290/0x2a0 [ 30.661879] [c00000001e2abd80] [c0000000000d3fa8] kthread+0x108/0x130 [ 30.662876] [c00000001e2abe30] [c000000000009968] ret_from_kernel_thread+0x5c/0x74 [ 30.664017] Instruction dump: [ 30.664477] 7bde1f24 38a00000 787f1f24 3b600001 39890008 7d204b78 7d05e214 7d0b07b4 [ 30.665642] 796b1f24 7d26582a 7d204a14 7d29f214 <7d4048a8> 7d4a3878 7d4049ad 40c2fff4 [ 30.666854] ---[ end trace 32643b7195717741 ]--- The reason of this is that in __cpu_disable(), when we try to set the cpu_sibling_mask or cpu_core_mask of the sibling CPUs of the disabled one, we don't check whether the current configuration employs those sibling CPUs(hw threads). And if a CPU is not employed by a configuration, the percpu structures cpu_{sibling,core}_mask are not allocated, therefore accessing those cpumasks will result in problems as above. This patch fixes this problem by adding an addition check on whether the id is no less than nr_cpu_ids in the sibling CPU iteration code. Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2016-08-22powerpc: migrate exception table users off module.h and onto extable.hPaul Gortmaker
These files were only including module.h for exception table related functions. We've now separated that content out into its own file "extable.h" so now move over to that and avoid all the extra header content in module.h that we don't really need to compile these files. Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2016-08-22powerpc/powernv/pci: fix iterator signednessAndrzej Hajda
Unsigned type is always non-negative, so the loop could not end in case condition is never true. The problem has been detected using semantic patch scripts/coccinelle/tests/unsigned_lesser_than_zero.cocci Signed-off-by: Andrzej Hajda <a.hajda@samsung.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2016-08-22powerpc/pseries: use pci_host_bridge.release_fn() to kfree(phb)Mauricio Faria de Oliveira
This patch leverages 'struct pci_host_bridge' from the PCI subsystem in order to free the pci_controller only after the last reference to its devices is dropped (avoiding an oops in pcibios_release_device() if the last reference is dropped after pcibios_free_controller()). The patch relies on pci_host_bridge.release_fn() (and .release_data), which is called automatically by the PCI subsystem when the root bus is released (i.e., the last reference is dropped). Those fields are set via pci_set_host_bridge_release() (e.g. in the platform-specific implementation of pcibios_root_bridge_prepare()). It introduces the 'pcibios_free_controller_deferred()' .release_fn() and it expects .release_data to hold a pointer to the pci_controller. The function implictly calls 'pcibios_free_controller()', so an user must *NOT* explicitly call it if using the new _deferred() callback. The functionality is enabled for pseries (although it isn't platform specific, and may be used by cxl). Details on not-so-elegant design choices: - Use 'pci_host_bridge.release_data' field as pointer to associated 'struct pci_controller' so *not* to 'pci_bus_to_host(bridge->bus)' in pcibios_free_controller_deferred(). That's because pci_remove_root_bus() sets 'host_bridge->bus = NULL' (so, if the last reference is released after pci_remove_root_bus() runs, which eventually reaches pcibios_free_controller_deferred(), that would hit a null pointer dereference). The cxl/vphb.c code calls pci_remove_root_bus(), and the cxl folks are interested in this fix. Test-case #1 (hold references) # ls -ld /sys/block/sd* | grep -m1 0021:01:00.0 <...> /sys/block/sdaa -> ../devices/pci0021:01/0021:01:00.0/<...> # ls -ld /sys/block/sd* | grep -m1 0021:01:00.1 <...> /sys/block/sdab -> ../devices/pci0021:01/0021:01:00.1/<...> # cat >/dev/sdaa & pid1=$! # cat >/dev/sdab & pid2=$! # drmgr -w 5 -d 1 -c phb -s 'PHB 33' -r Validating PHB DLPAR capability...yes. [ 594.306719] pci_hp_remove_devices: PCI: Removing devices on bus 0021:01 [ 594.306738] pci_hp_remove_devices: Removing 0021:01:00.0... ... [ 598.236381] pci_hp_remove_devices: Removing 0021:01:00.1... ... [ 611.972077] pci_bus 0021:01: busn_res: [bus 01-ff] is released [ 611.972140] rpadlpar_io: slot PHB 33 removed # kill -9 $pid1 # kill -9 $pid2 [ 632.918088] pcibios_free_controller_deferred: domain 33, dynamic 1 Test-case #2 (don't hold references) # drmgr -w 5 -d 1 -c phb -s 'PHB 33' -r Validating PHB DLPAR capability...yes. [ 916.357363] pci_hp_remove_devices: PCI: Removing devices on bus 0021:01 [ 916.357386] pci_hp_remove_devices: Removing 0021:01:00.0... ... [ 920.566527] pci_hp_remove_devices: Removing 0021:01:00.1... ... [ 933.955873] pci_bus 0021:01: busn_res: [bus 01-ff] is released [ 933.955977] pcibios_free_controller_deferred: domain 33, dynamic 1 [ 933.955999] rpadlpar_io: slot PHB 33 removed Suggested-By: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Tested-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> # cxl Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2016-08-22powerpc: mpc8349emitx: Delete unnecessary assignment for the field "owner"Markus Elfring
The field "owner" is set by the core. Thus delete an unneeded initialisation. Generated by: scripts/coccinelle/api/platform_no_drv_owner.cocci Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2016-08-22powerpc/512x: Delete unnecessary assignment for the field "owner"Markus Elfring
The field "owner" is set by the core. Thus delete an unneeded initialisation. Generated by: scripts/coccinelle/api/platform_no_drv_owner.cocci Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>