Age | Commit message (Collapse) | Author |
|
error detection
The kfree() function was called in two cases by the
kvm_vcpu_ioctl_config_tlb() function during error handling
even if the passed data structure element contained a null pointer.
* Split a condition check for memory allocation failures.
* Adjust jump targets according to the Linux coding style convention.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
|
* A multiplication for the size determination of a memory allocation
indicated that an array data structure should be processed.
Thus use the corresponding function "kmalloc_array".
This issue was detected by using the Coccinelle software.
* Replace the specification of a data type by a pointer dereference
to make the corresponding size determination a bit safer according to
the Linux coding style convention.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
|
Add VCPU stat counters to track affinity for passthrough
interrupts.
pthru_all: Counts all passthrough interrupts whose IRQ mappings are
in the kvmppc_passthru_irq_map structure.
pthru_host: Counts all cached passthrough interrupts that were injected
from the host through kvm_set_irq (i.e. not handled in
real mode).
pthru_bad_aff: Counts how many cached passthrough interrupts have
bad affinity (receiving CPU is not running VCPU that is
the target of the virtual interrupt in the guest).
Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
|
When a guest has a PCI pass-through device with an interrupt, it
will direct the interrupt to a particular guest VCPU. In fact the
physical interrupt might arrive on any CPU, and then get
delivered to the target VCPU in the emulated XICS (guest interrupt
controller), and eventually delivered to the target VCPU.
Now that we have code to handle device interrupts in real mode
without exiting to the host kernel, there is an advantage to having
the device interrupt arrive on the same sub(core) as the target
VCPU is running on. In this situation, the interrupt can be
delivered to the target VCPU without any exit to the host kernel
(using a hypervisor doorbell interrupt between threads if
necessary).
This patch aims to get passed-through device interrupts arriving
on the correct core by setting the interrupt server in the real
hardware XICS for the interrupt to the first thread in the (sub)core
where its target VCPU is running. We do this in the real-mode H_EOI
code because the H_EOI handler already needs to look at the
emulated ICS state for the interrupt (whereas the H_XIRR handler
doesn't), and we know we are running in the target VCPU context
at that point.
We set the server CPU in hardware using an OPAL call, regardless of
what the IRQ affinity mask for the interrupt says, and without
updating the affinity mask. This amounts to saying that when an
interrupt is passed through to a guest, as a matter of policy we
allow the guest's affinity for the interrupt to override the host's.
This is inspired by an earlier patch from Suresh Warrier, although
none of this code came from that earlier patch.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
|
When a passthrough IRQ is handled completely within KVM real
mode code, it has to also update the IRQ stats since this
does not go through the generic IRQ handling code.
However, the per CPU kstat_irqs field is an allocated (not static)
field and so cannot be directly accessed in real mode safely.
The function this_cpu_inc_rm() is introduced to safely increment
per CPU fields (currently coded for unsigned integers only) that
are allocated and could thus be vmalloced also.
Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
|
Add a module parameter kvm_irq_bypass for kvm_hv.ko to
disable IRQ bypass for passthrough interrupts. The default
value of this tunable is 1 - that is enable the feature.
Since the tunable is used by built-in kernel code, we use
the module_param_cb macro to achieve this.
Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
|
Dump the passthrough irqmap structure associated with a
guest as part of /sys/kernel/debug/powerpc/kvm-xics-*.
Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
|
In existing real mode ICP code, when updating the virtual ICP
state, if there is a required action that cannot be completely
handled in real mode, as for instance, a VCPU needs to be woken
up, flags are set in the ICP to indicate the required action.
This is checked when returning from hypercalls to decide whether
the call needs switch back to the host where the action can be
performed in virtual mode. Note that if h_ipi_redirect is enabled,
real mode code will first try to message a free host CPU to
complete this job instead of returning the host to do it ourselves.
Currently, the real mode PCI passthrough interrupt handling code
checks if any of these flags are set and simply returns to the host.
This is not good enough as the trap value (0x500) is treated as an
external interrupt by the host code. It is only when the trap value
is a hypercall that the host code searches for and acts on unfinished
work by calling kvmppc_xics_rm_complete.
This patch introduces a special trap BOOK3S_INTERRUPT_HV_RM_HARD
which is returned by KVM if there is unfinished business to be
completed in host virtual mode after handling a PCI passthrough
interrupt. The host checks for this special interrupt condition
and calls into the kvmppc_xics_rm_complete, which is made an
exported function for this reason.
[paulus@ozlabs.org - moved logic to set r12 to BOOK3S_INTERRUPT_HV_RM_HARD
in book3s_hv_rmhandlers.S into the end of kvmppc_check_wake_reason.]
Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
|
Currently, KVM switches back to the host to handle any external
interrupt (when the interrupt is received while running in the
guest). This patch updates real-mode KVM to check if an interrupt
is generated by a passthrough adapter that is owned by this guest.
If so, the real mode KVM will directly inject the corresponding
virtual interrupt to the guest VCPU's ICS and also EOI the interrupt
in hardware. In short, the interrupt is handled entirely in real
mode in the guest context without switching back to the host.
In some rare cases, the interrupt cannot be completely handled in
real mode, for instance, a VCPU that is sleeping needs to be woken
up. In this case, KVM simply switches back to the host with trap
reason set to 0x500. This works, but it is clearly not very efficient.
A following patch will distinguish this case and handle it
correctly in the host. Note that we can use the existing
check_too_hard() routine even though we are not in a hypercall to
determine if there is unfinished business that needs to be
completed in host virtual mode.
The patch assumes that the mapping between hardware interrupt IRQ
and virtual IRQ to be injected to the guest already exists for the
PCI passthrough interrupts that need to be handled in real mode.
If the mapping does not exist, KVM falls back to the default
existing behavior.
The KVM real mode code reads mappings from the mapped array in the
passthrough IRQ map without taking any lock. We carefully order the
loads and stores of the fields in the kvmppc_irq_map data structure
using memory barriers to avoid an inconsistent mapping being seen by
the reader. Thus, although it is possible to miss a map entry, it is
not possible to read a stale value.
[paulus@ozlabs.org - get irq_chip from irq_map rather than pimap,
pulled out powernv eoi change into a separate patch, made
kvmppc_read_intr get the vcpu from the paca rather than being
passed in, rewrote the logic at the end of kvmppc_read_intr to
avoid deep indentation, simplified logic in book3s_hv_rmhandlers.S
since we were always restoring SRR0/1 anyway, get rid of the cached
array (just use the mapped array), removed the kick_all_cpus_sync()
call, clear saved_xirr PACA field when we handle the interrupt in
real mode, fix compilation with CONFIG_KVM_XICS=n.]
Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
|
Sparse checking revealed that it is no longer used. The last usage was
removed in commit 2e194583125b ("[POWERPC] Cell interrupt rework") in
2006.
Signed-off-by: Daniel Axtens <dja@axtens.net>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Fixes marked for stable:
- Don't alias user region to other regions below PAGE_OFFSET from
Paul Mackerras
- Fix again csum_partial_copy_generic() on 32-bit from Christophe
Leroy
- Fix corrupted PE allocation bitmap on releasing PE from Gavin Shan
Fixes for code merged this cycle:
- Fix crash on releasing compound PE from Gavin Shan
- Fix processor numbers in OPAL ICP from Benjamin Herrenschmidt
- Fix little endian build with CONFIG_KEXEC=n from Thiago Jung
Bauermann"
* tag 'powerpc-4.8-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/mm: Don't alias user region to other regions below PAGE_OFFSET
powerpc/32: Fix again csum_partial_copy_generic()
powerpc/powernv: Fix corrupted PE allocation bitmap on releasing PE
powerpc/powernv: Fix crash on releasing compound PE
powerpc/xics/opal: Fix processor numbers in OPAL ICP
powerpc/pseries: Fix little endian build with CONFIG_KEXEC=n
|
|
Add the irq_bypass_add_producer and irq_bypass_del_producer
functions. These functions get called whenever a GSI is being
defined for a guest. They create/remove the mapping between
host real IRQ numbers and the guest GSI.
Add the following helper functions to manage the
passthrough IRQ map.
kvmppc_set_passthru_irq()
Creates a mapping in the passthrough IRQ map that maps a host
IRQ to a guest GSI. It allocates the structure (one per guest VM)
the first time it is called.
kvmppc_clr_passthru_irq()
Removes the passthrough IRQ map entry given a guest GSI.
The passthrough IRQ map structure is not freed even when the
number of mapped entries goes to zero. It is only freed when
the VM is destroyed.
[paulus@ozlabs.org - modified to use is_pnv_opal_msi() rather than
requiring all passed-through interrupts to use the same irq_chip;
changed deletion so it zeroes out the r_hwirq field rather than
copying the last entry down and decrementing the number of entries.]
Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
|
This patch introduces an IRQ mapping structure, the
kvmppc_passthru_irqmap structure that is to be used
to map the real hardware IRQ in the host with the virtual
hardware IRQ (gsi) that is injected into a guest by KVM for
passthrough adapters.
Currently, we assume a separate IRQ mapping structure for
each guest. Each kvmppc_passthru_irqmap has a mapping arrays,
containing all defined real<->virtual IRQs.
[paulus@ozlabs.org - removed irq_chip field from struct
kvmppc_passthru_irqmap; changed parameter for
kvmppc_get_passthru_irqmap from struct kvm_vcpu * to struct
kvm *, removed small cached array.]
Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
|
Select IRQ_BYPASS_MANAGER for PPC when CONFIG_KVM is set.
Add the PPC producer functions for add and del producer.
[paulus@ozlabs.org - Moved new functions from book3s.c to powerpc.c
so booke compiles; added kvm_arch_has_irq_bypass implementation.]
Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
|
Modify kvmppc_read_intr to make it a C function. Because it is called
from kvmppc_check_wake_reason, any of the assembler code that calls
either kvmppc_read_intr or kvmppc_check_wake_reason now has to assume
that the volatile registers might have been modified.
This also adds in the optimization of clearing saved_xirr in the case
where we completely handle and EOI an IPI. Without this, the next
device interrupt will require two trips through the host interrupt
handling code.
[paulus@ozlabs.org - made kvmppc_check_wake_reason create a stack frame
when it is calling kvmppc_read_intr, which means we can set r12 to
the trap number (0x500) after the call to kvmppc_read_intr, instead
of using r31. Also moved the deliver_guest_interrupt label so as to
restore XER and CTR, plus other minor tweaks.]
Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
|
This merges the topic branch 'kvm-ppc-infrastructure' into kvm-ppc-next
so that I can then apply further patches that need the changes in the
kvm-ppc-infrastructure branch.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
|
hmi.c functions are unused unless sibling_subcore_state is nonzero, and
that in turn happens only if KVM is in use. So move the code to
arch/powerpc/kvm/, putting it under CONFIG_KVM_BOOK3S_HV_POSSIBLE
rather than CONFIG_PPC_BOOK3S_64. The sibling_subcore_state is also
included in struct paca_struct only if KVM is supported by the kernel.
Cc: Daniel Axtens <dja@axtens.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: kvm-ppc@vger.kernel.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
|
This adds a new function pnv_opal_pci_msi_eoi() which does the part of
end-of-interrupt (EOI) handling of an MSI which involves doing an
OPAL call. This function can be called in real mode. This doesn't
just export pnv_ioda2_msi_eoi() because that does a call to
icp_native_eoi(), which does not work in real mode.
This also adds a function, is_pnv_opal_msi(), which KVM can call to
check whether an interrupt is one for which we should be calling
pnv_opal_pci_msi_eoi() when we need to do an EOI.
[paulus@ozlabs.org - split out the addition of pnv_opal_pci_msi_eoi()
from Suresh's patch "KVM: PPC: Book3S HV: Handle passthrough
interrupts in guest"; added is_pnv_opal_msi(); wrote description.]
Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
|
Add simple cache inhibited accessors for memory mapped I/O.
Unlike the accessors built from the DEF_MMIO_* macros, these
don't include any hardware memory barriers, callers need to
manage memory barriers on their own. These can only be called
in hypervisor real mode.
Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
[paulus@ozlabs.org - added line to comment]
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
|
This replaces a 2-D search through an array with a simple 8-bit table
lookup for determining the actual and/or base page size for a HPT entry.
The encoding in the second doubleword of the HPTE is designed to encode
the actual and base page sizes without using any more bits than would be
needed for a 4k page number, by using between 1 and 8 low-order bits of
the RPN (real page number) field to encode the page sizes. A single
"large page" bit in the first doubleword indicates that these low-order
bits are to be interpreted like this.
We can determine the page sizes by using the low-order 8 bits of the RPN
to look up a 256-entry table. For actual page sizes less than 1MB, some
of the upper bits of these 8 bits are going to be real address bits, but
we can cope with that by replicating the entries for those smaller page
sizes.
While we're at it, let's move the hpte_page_size() and hpte_base_page_size()
functions from a KVM-specific header to a header for 64-bit HPT systems,
since this computation doesn't have anything specifically to do with KVM.
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
|
In commit c60ac5693c47 ("powerpc: Update kernel VSID range", 2013-03-13)
we lost a check on the region number (the top four bits of the effective
address) for addresses below PAGE_OFFSET. That commit replaced a check
that the top 18 bits were all zero with a check that bits 46 - 59 were
zero (performed for all addresses, not just user addresses).
This means that userspace can access an address like 0x1000_0xxx_xxxx_xxxx
and we will insert a valid SLB entry for it. The VSID used will be the
same as if the top 4 bits were 0, but the page size will be some random
value obtained by indexing beyond the end of the mm_ctx_high_slices_psize
array in the paca. If that page size is the same as would be used for
region 0, then userspace just has an alias of the region 0 space. If the
page size is different, then no HPTE will be found for the access, and
the process will get a SIGSEGV (since hash_page_mm() will refuse to create
a HPTE for the bogus address).
The access beyond the end of the mm_ctx_high_slices_psize can be at most
5.5MB past the array, and so will be in RAM somewhere. Since the access
is a load performed in real mode, it won't fault or crash the kernel.
At most this bug could perhaps leak a little bit of information about
blocks of 32 bytes of memory located at offsets of i * 512kB past the
paca->mm_ctx_high_slices_psize array, for 1 <= i <= 11.
Fixes: c60ac5693c47 ("powerpc: Update kernel VSID range")
Cc: stable@vger.kernel.org # v3.9+
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Commit 7aef4136566b0 ("powerpc32: rewrite csum_partial_copy_generic()
based on copy_tofrom_user()") introduced a bug when destination address
is odd and len is lower than cacheline size.
In that case the resulting csum value doesn't have to be rotated one
byte because the cache-aligned copy part is skipped so no alignment
is performed.
Fixes: 7aef4136566b0 ("powerpc32: rewrite csum_partial_copy_generic() based on copy_tofrom_user()")
Cc: stable@vger.kernel.org # v4.6+
Reported-by: Alessio Igor Bogani <alessio.bogani@elettra.eu>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Tested-by: Alessio Igor Bogani <alessio.bogani@elettra.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
In pnv_ioda_free_pe(), the PE object (including the associated PE
number) is cleared before resetting the corresponding bit in the
PE allocation bitmap. It means PE#0 is always released to the bitmap
wrongly.
This fixes above issue by caching the PE number before the PE object
is cleared.
Fixes: 1e9167726c41 ("powerpc/powernv: Use PE instead of number during setup and release"
Cc: stable@vger.kernel.org # v4.7+
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
vcpu stats are used to collect information about a vcpu which can be viewed
in the debugfs. For example halt_attempted_poll and halt_successful_poll
are used to keep track of the number of times the vcpu attempts to and
successfully polls. These stats are currently not used on powerpc.
Implement incrementation of the halt_attempted_poll and
halt_successful_poll vcpu stats for powerpc. Since these stats are summed
over all the vcpus for all running guests it doesn't matter which vcpu
they are attributed to, thus we choose the current runner vcpu of the
vcore.
Also add new vcpu stats: halt_poll_success_ns, halt_poll_fail_ns and
halt_wait_ns to be used to accumulate the total time spend polling
successfully, polling unsuccessfully and waiting respectively, and
halt_successful_wait to accumulate the number of times the vcpu waits.
Given that halt_poll_success_ns, halt_poll_fail_ns and halt_wait_ns are
expressed in nanoseconds it is necessary to represent these as 64-bit
quantities, otherwise they would overflow after only about 4 seconds.
Given that the total time spend either polling or waiting will be known and
the number of times that each was done, it will be possible to determine
the average poll and wait times. This will give the ability to tune the kvm
module parameters based on the calculated average wait and poll times.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
|
vms and vcpus have statistics associated with them which can be viewed
within the debugfs. Currently it is assumed within the vcpu_stat_get() and
vm_stat_get() functions that all of these statistics are represented as
u32s, however the next patch adds some u64 vcpu statistics.
Change all vcpu statistics to u64 and modify vcpu_stat_get() accordingly.
Since vcpu statistics are per vcpu, they will only be updated by a single
vcpu at a time so this shouldn't present a problem on 32-bit machines
which can't atomically increment 64-bit numbers. However vm statistics
could potentially be updated by multiple vcpus from that vm at a time.
To avoid the overhead of atomics make all vm statistics ulong such that
they are 64-bit on 64-bit systems where they can be atomically incremented
and are 32-bit on 32-bit systems which may not be able to atomically
increment 64-bit numbers. Modify vm_stat_get() to expect ulongs.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
|
This patch introduces new halt polling functionality into the kvm_hv kernel
module. When a vcore is idle it will poll for some period of time before
scheduling itself out.
When all of the runnable vcpus on a vcore have ceded (and thus the vcore is
idle) we schedule ourselves out to allow something else to run. In the
event that we need to wake up very quickly (for example an interrupt
arrives), we are required to wait until we get scheduled again.
Implement halt polling so that when a vcore is idle, and before scheduling
ourselves, we poll for vcpus in the runnable_threads list which have
pending exceptions or which leave the ceded state. If we poll successfully
then we can get back into the guest very quickly without ever scheduling
ourselves, otherwise we schedule ourselves out as before.
There exists generic halt_polling code in virt/kvm_main.c, however on
powerpc the polling conditions are different to the generic case. It would
be nice if we could just implement an arch specific kvm_check_block()
function, but there is still other arch specific things which need to be
done for kvm_hv (for example manipulating vcore states) which means that a
separate implementation is the best option.
Testing of this patch with a TCP round robin test between two guests with
virtio network interfaces has found a decrease in round trip time of ~15us
on average. A performance gain is only seen when going out of and
back into the guest often and quickly, otherwise there is no net benefit
from the polling. The polling interval is adjusted such that when we are
often scheduled out for long periods of time it is reduced, and when we
often poll successfully it is increased. The rate at which the polling
interval increases or decreases, and the maximum polling interval, can
be set through module parameters.
Based on the implementation in the generic kvm module by Wanpeng Li and
Paolo Bonzini, and on direction from Paul Mackerras.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
|
to array
The struct kvmppc_vcore is a structure used to store various information
about a virtual core for a kvm guest. The runnable_threads element of the
struct provides a list of all of the currently runnable vcpus on the core
(those in the KVMPPC_VCPU_RUNNABLE state). The previous implementation of
this list was a linked_list. The next patch requires that the list be able
to be iterated over without holding the vcore lock.
Reimplement the runnable_threads list in the kvmppc_vcore struct as an
array. Implement function to iterate over valid entries in the array and
update access sites accordingly.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
|
The next commit will introduce a member to the kvmppc_vcore struct which
references MAX_SMT_THREADS which is defined in kvm_book3s_asm.h, however
this file isn't included in kvm_host.h directly. Thus compiling for
certain platforms such as pmac32_defconfig and ppc64e_defconfig with KVM
fails due to MAX_SMT_THREADS not being defined.
Move the struct kvmppc_vcore definition to kvm_book3s.h which explicitly
includes kvm_book3s_asm.h.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
|
Instead of having each caller of check_object_size() need to remember to
check for a const size parameter, move the check into check_object_size()
itself. This actually matches the original implementation in PaX, though
this commit cleans up the now-redundant builtin_const() calls in the
various architectures.
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
Install the callbacks via the state machine.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: rt@linutronix.de
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/20160818125731.27256-17-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Install the callbacks via the state machine.
I assume here that the powermac has two CPUs and so only one can go up
or down at a time. The variable smp_core99_host_open is here to ensure
that we do not try to open or close the i2c host twice if something goes
wrong and we invoke the prepare or online callback twice due to
rollback.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: rt@linutronix.de
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/20160818125731.27256-16-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
The compound PE is created to accommodate the devices attached to
one specific PCI bus that consume multiple M64 segments. The compound
PE is made up of one master PE and possibly multiple slave PEs. The
slave PEs should be destroyed when releasing the master PE. A kernel
crash happens when derferencing @pe->pdev on releasing the slave PE
in pnv_ioda_deconfigure_pe().
# echo 0 > /sys/bus/pci/slots/C7/power
iommu: Removing device 0000:01:00.1 from group 0
iommu: Removing device 0000:01:00.0 from group 0
Unable to handle kernel paging request for data at address 0x00000010
Faulting instruction address: 0xc00000000005d898
cpu 0x1: Vector: 300 (Data Access) at [c000000fe8217620]
pc: c00000000005d898: pnv_ioda_release_pe+0x288/0x610
lr: c00000000005dbdc: pnv_ioda_release_pe+0x5cc/0x610
sp: c000000fe82178a0
msr: 9000000000009033
dar: 10
dsisr: 40000000
current = 0xc000000fe815ab80
paca = 0xc00000000ff00400 softe: 0 irq_happened: 0x01
pid = 2709, comm = sh
Linux version 4.8.0-rc5-gavin-00006-g745efdb (gwshan@gwshan) \
(gcc version 4.9.3 (Buildroot 2016.02-rc2-00093-g5ea3bce) ) #586 SMP \
Tue Sep 6 13:37:29 AEST 2016
enter ? for help
[c000000fe8217940] c00000000005d684 pnv_ioda_release_pe+0x74/0x610
[c000000fe82179e0] c000000000034460 pcibios_release_device+0x50/0x70
[c000000fe8217a10] c0000000004aba80 pci_release_dev+0x50/0xa0
[c000000fe8217a40] c000000000704898 device_release+0x58/0xf0
[c000000fe8217ac0] c000000000470510 kobject_release+0x80/0xf0
[c000000fe8217b00] c000000000704dd4 put_device+0x24/0x40
[c000000fe8217b20] c0000000004af94c pci_remove_bus_device+0x12c/0x150
[c000000fe8217b60] c000000000034244 pci_hp_remove_devices+0x94/0xd0
[c000000fe8217ba0] c0000000004ca444 pnv_php_disable_slot+0x64/0xb0
[c000000fe8217bd0] c0000000004c88c0 power_write_file+0xa0/0x190
[c000000fe8217c50] c0000000004c248c pci_slot_attr_store+0x3c/0x60
[c000000fe8217c70] c0000000002d6494 sysfs_kf_write+0x94/0xc0
[c000000fe8217cb0] c0000000002d50f0 kernfs_fop_write+0x180/0x260
[c000000fe8217d00] c0000000002334a0 __vfs_write+0x40/0x190
[c000000fe8217d90] c000000000234738 vfs_write+0xc8/0x240
[c000000fe8217de0] c000000000236250 SyS_write+0x60/0x110
[c000000fe8217e30] c000000000009524 system_call+0x38/0x108
It fixes the kernel crash by bypassing releasing resources (DMA,
IO and memory segments, PELTM) because there are no resources assigned
to the slave PE.
Fixes: c5f7700bbd2e ("powerpc/powernv: Dynamically release PE")
Reported-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
When using the OPAL ICP backend we incorrectly pass Linux CPU numbers
rather than HW CPU numbers to OPAL.
Fixes: d74361881f0d ("powerpc/xics: Add ICP OPAL backend")
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
On ppc64le, builds with CONFIG_KEXEC=n fail with:
arch/powerpc/platforms/pseries/setup.c: In function ‘pseries_big_endian_exceptions’:
arch/powerpc/platforms/pseries/setup.c:403:13: error: implicit declaration of function ‘kdump_in_progress’
if (rc && !kdump_in_progress())
This is because pseries/setup.c includes <linux/kexec.h>, but
kdump_in_progress() is defined in <asm/kexec.h>. This is a problem
because the former only includes the latter if CONFIG_KEXEC_CORE=y.
Fix it by including <asm/kexec.h> directly, as is done in powernv/setup.c.
Fixes: d3cbff1b5a90 ("powerpc: Put exception configuration in a common place")
Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Userspace can begin and suspend a transaction within the signal
handler which means they might enter sys_rt_sigreturn() with the
processor in suspended state.
sys_rt_sigreturn() wants to restore process context (which may have
been in a transaction before signal delivery). To do this it must
restore TM SPRS. To achieve this, any transaction initiated within the
signal frame must be discarded in order to be able to restore TM SPRs
as TM SPRs can only be manipulated non-transactionally..
>From the PowerPC ISA:
TM Bad Thing Exception [Category: Transactional Memory]
An attempt is made to execute a mtspr targeting a TM register in
other than Non-transactional state.
Not doing so results in a TM Bad Thing:
[12045.221359] Kernel BUG at c000000000050a40 [verbose debug info unavailable]
[12045.221470] Unexpected TM Bad Thing exception at c000000000050a40 (msr 0x201033)
[12045.221540] Oops: Unrecoverable exception, sig: 6 [#1]
[12045.221586] SMP NR_CPUS=2048 NUMA PowerNV
[12045.221634] Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE
nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4
xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter
ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables kvm_hv kvm
uio_pdrv_genirq ipmi_powernv uio powernv_rng ipmi_msghandler autofs4 ses enclosure
scsi_transport_sas bnx2x ipr mdio libcrc32c
[12045.222167] CPU: 68 PID: 6178 Comm: sigreturnpanic Not tainted 4.7.0 #34
[12045.222224] task: c0000000fce38600 ti: c0000000fceb4000 task.ti: c0000000fceb4000
[12045.222293] NIP: c000000000050a40 LR: c0000000000163bc CTR: 0000000000000000
[12045.222361] REGS: c0000000fceb7ac0 TRAP: 0700 Not tainted (4.7.0)
[12045.222418] MSR: 9000000300201033 <SF,HV,ME,IR,DR,RI,LE,TM[SE]> CR: 28444280 XER: 20000000
[12045.222625] CFAR: c0000000000163b8 SOFTE: 0 PACATMSCRATCH: 900000014280f033
GPR00: 01100000b8000001 c0000000fceb7d40 c00000000139c100 c0000000fce390d0
GPR04: 900000034280f033 0000000000000000 0000000000000000 0000000000000000
GPR08: 0000000000000000 b000000000001033 0000000000000001 0000000000000000
GPR12: 0000000000000000 c000000002926400 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR24: 0000000000000000 00003ffff98cadd0 00003ffff98cb470 0000000000000000
GPR28: 900000034280f033 c0000000fceb7ea0 0000000000000001 c0000000fce390d0
[12045.223535] NIP [c000000000050a40] tm_restore_sprs+0xc/0x1c
[12045.223584] LR [c0000000000163bc] tm_recheckpoint+0x5c/0xa0
[12045.223630] Call Trace:
[12045.223655] [c0000000fceb7d80] [c000000000026e74] sys_rt_sigreturn+0x494/0x6c0
[12045.223738] [c0000000fceb7e30] [c0000000000092e0] system_call+0x38/0x108
[12045.223806] Instruction dump:
[12045.223841] 7c800164 4e800020 7c0022a6 f80304a8 7c0222a6 f80304b0 7c0122a6 f80304b8
[12045.223955] 4e800020 e80304a8 7c0023a6 e80304b0 <7c0223a6> e80304b8 7c0123a6 4e800020
[12045.224074] ---[ end trace cb8002ee240bae76 ]---
It isn't clear exactly if there is really a use case for userspace
returning with a suspended transaction, however, doing so doesn't (on
its own) constitute a bad frame. As such, this patch simply discards
the transactional state of the context calling the sigreturn and
continues.
Reported-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
Tested-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Reviewed-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Acked-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
In a situation, where Linux kernel gets notified about duplicate error log
from OPAL, it is been observed that kernel fails to remove sysfs entries
(/sys/firmware/opal/elog/0xXXXXXXXX) of such error logs. This is because,
we currently search the error log/dump kobject in the kset list via
'kset_find_obj()' routine. Which eventually increment the reference count
by one, once it founds the kobject.
So, unless we decrement the reference count by one after it found the kobject,
we would not be able to release the kobject properly later.
This patch adds the 'kobject_put()' which was missing earlier.
Signed-off-by: Mukesh Ojha <mukesh02@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org
Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
tabort_syscall runs with RI=1, so a nested recoverable machine
check will load the paca into r13 and overwrite what we loaded
it with, because exceptions returning to privileged mode do not
restore r13.
Fixes: b4b56f9ecab4 (powerpc/tm: Abort syscalls in active transactions)
Cc: stable@vger.kernel.org
Signed-off-by: Nick Piggin <npiggin@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
As discussed recently on the kvm mailing list, David Gibson's
intention in commit 178a78750212 ("vfio: Enable VFIO device for
powerpc", 2016-02-01) was to have the KVM VFIO device built in
on all powerpc platforms. This patch adds the "select KVM_VFIO"
statement that makes this happen.
Currently, arch/powerpc/kvm/Makefile doesn't include vfio.o for
the 64-bit kvm module, because the list of objects doesn't use
the $(common-objs-y) list. The reason it doesn't is because we
don't necessarily want coalesced_mmio.o or emulate.o (for example
if HV KVM is the only target), and common-objs-y includes both.
Since this is confusing, this patch adjusts the definitions so that
we now use $(common-objs-y) in the list for the 64-bit kvm.ko
module, emulate.o is removed from common-objs-y and added in the
places that need it, and the inclusion of coalesced_mmio.o now
depends on CONFIG_KVM_MMIO.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
|
Storing this value will help prevent unwinders from getting out of sync
with the function graph tracer ret_stack. Now instead of needing a
stateful iterator, they can compare the return address pointer to find
the right ret_stack entry.
Note that an array of 50 ftrace_ret_stack structs is allocated for every
task. So when an arch implements this, it will add either 200 or 400
bytes of memory usage per task (depending on whether it's a 32-bit or
64-bit platform).
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/a95cfcc39e8f26b89a430c56926af0bb217bc0a1.1471607358.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
hmi.c functions are unused unless sibling_subcore_state is nonzero, and
that in turn happens only if KVM is in use. So move the code to
arch/powerpc/kvm/, putting it under CONFIG_KVM_BOOK3S_HV_POSSIBLE
rather than CONFIG_PPC_BOOK3S_64. The sibling_subcore_state is also
included in struct paca_struct only if KVM is supported by the kernel.
Cc: Daniel Axtens <dja@axtens.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: kvm-ppc@vger.kernel.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
of_mm_gpiochip_add_data() calls mm_gc->save_regs() before
setting the data. Therefore ->save_regs() cannot use
gpiochip_get_data()
[ 0.275940] Unable to handle kernel paging request for data at address 0x00000130
[ 0.283120] Faulting instruction address: 0xc01b44cc
[ 0.288175] Oops: Kernel access of bad area, sig: 11 [#1]
[ 0.293343] PREEMPT CMPC885
[ 0.296141] CPU: 0 PID: 1 Comm: swapper Not tainted 4.7.0-g65124df-dirty #68
[ 0.304131] task: c6074000 ti: c6080000 task.ti: c6080000
[ 0.309459] NIP: c01b44cc LR: c0011720 CTR: c0011708
[ 0.314372] REGS: c6081d90 TRAP: 0300 Not tainted (4.7.0-g65124df-dirty)
[ 0.322267] MSR: 00009032 <EE,ME,IR,DR,RI> CR: 24000028 XER: 20000000
[ 0.328813] DAR: 00000130 DSISR: c0000000
GPR00: c01b6d0c c6081e40 c6074000 c6017000 c9028000 c601d028 c6081dd8 00000000
GPR08: c601d028 00000000 ffffffff 00000001 24000044 00000000 c0002790 00000000
GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 c05643b0 00000083
GPR24: c04a1a6c c0560000 c04a8308 c04c6480 c0012498 c6017000 c7ffcc78 c6017000
[ 0.360806] NIP [c01b44cc] gpiochip_get_data+0x4/0xc
[ 0.365684] LR [c0011720] cpm1_gpio16_save_regs+0x18/0x44
[ 0.370972] Call Trace:
[ 0.373451] [c6081e50] [c01b6d0c] of_mm_gpiochip_add_data+0x70/0xdc
[ 0.379624] [c6081e70] [c00124c0] cpm_init_par_io+0x28/0x118
[ 0.385238] [c6081e80] [c04a8ac0] do_one_initcall+0xb0/0x17c
[ 0.390819] [c6081ef0] [c04a8cbc] kernel_init_freeable+0x130/0x1dc
[ 0.396924] [c6081f30] [c00027a4] kernel_init+0x14/0x110
[ 0.402177] [c6081f40] [c000b424] ret_from_kernel_thread+0x5c/0x64
[ 0.408233] Instruction dump:
[ 0.411168] 4182fafc 3f80c040 48234c6d 3bc0fff0 3b9c5ed0 4bfffaf4 81290020 712a0004
[ 0.418825] 4182fb34 48234c51 4bfffb2c 81230004 <80690130> 4e800020 7c0802a6 9421ffe0
[ 0.426763] ---[ end trace fe4113ee21d72ffa ]---
fixes: e65078f1f3490 ("powerpc: sysdev: cpm1: use gpiochip data pointer")
fixes: a14a2d484b386 ("powerpc: cpm_common: use gpiochip data pointer")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
MCE must not enable MSR_RI until PACA_EXMC is no longer being used.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
MCE must not use PACA_EXGEN. When a general exception enables MSR_RI,
that means SPRN_SRR[01] and SPRN_SPRG are no longer used. However the
PACA save area is still in use.
Acked-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
client-architecture-support
When booting from an OpenFirmware which supports it, we use the
"ibm,client-architecture-support" firmware call to communicate
our capabilities to firmware.
The format of the structure we pass to firmware is specified in
PAPR (Power Architecture Platform Requirements), or the public version
LoPAPR (Linux on Power Architecture Platform Reference).
Referring to table 244 in LoPAPR v1.1, option vector 5 contains a 4 byte
field at bytes 17-20 for the "Platform Facilities Enable". This is
followed by a 1 byte field at byte 21 for "Sub-Processor Represenation
Level".
Comparing to the code, there we have the Platform Facilities
options (OV5_PFO_*) at byte 17, but we fail to pad that field out to its
full width of 4 bytes. This means the OV5_SUB_PROCESSORS option is
incorrectly placed at byte 18.
Fix it by adding zero bytes for bytes 18, 19, 20, and comment the bytes
to hopefully make it clearer in future.
As far as I'm aware nothing actually consumes this value at this time,
so the effect of this bug is nil in practice.
It does mean we've been incorrectly setting bit 15 of the "Platform
Facilities Enable" option for the past ~3 1/2 years, so we should avoid
allocating that bit to anything else in future.
Fixes: df77c7992029 ("powerpc/pseries: Update ibm,architecture.vec for PAPR 2.7/POWER8")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
We observed a kernel oops when running a PPC guest with config NR_CPUS=4
and qemu option "-smp cores=1,threads=8":
[ 30.634781] Unable to handle kernel paging request for data at
address 0xc00000014192eb17
[ 30.636173] Faulting instruction address: 0xc00000000003e5cc
[ 30.637069] Oops: Kernel access of bad area, sig: 11 [#1]
[ 30.637877] SMP NR_CPUS=4 NUMA pSeries
[ 30.638471] Modules linked in:
[ 30.638949] CPU: 3 PID: 27 Comm: migration/3 Not tainted
4.7.0-07963-g9714b26 #1
[ 30.640059] task: c00000001e29c600 task.stack: c00000001e2a8000
[ 30.640956] NIP: c00000000003e5cc LR: c00000000003e550 CTR:
0000000000000000
[ 30.642001] REGS: c00000001e2ab8e0 TRAP: 0300 Not tainted
(4.7.0-07963-g9714b26)
[ 30.643139] MSR: 8000000102803033 <SF,VEC,VSX,FP,ME,IR,DR,RI,LE,TM[E]> CR: 22004084 XER: 00000000
[ 30.644583] CFAR: c000000000009e98 DAR: c00000014192eb17 DSISR: 40000000 SOFTE: 0
GPR00: c00000000140a6b8 c00000001e2abb60 c0000000016dd300 0000000000000003
GPR04: 0000000000000000 0000000000000004 c0000000016e5920 0000000000000008
GPR08: 0000000000000004 c00000014192eb17 0000000000000000 0000000000000020
GPR12: c00000000140a6c0 c00000000ffffc00 c0000000000d3ea8 c00000001e005680
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 c00000001e6b3a00 0000000000000000 0000000000000001
GPR24: c00000001ff85138 c00000001ff85130 000000001eb6f000 0000000000000001
GPR28: 0000000000000000 c0000000017014e0 0000000000000000 0000000000000018
[ 30.653882] NIP [c00000000003e5cc] __cpu_disable+0xcc/0x190
[ 30.654713] LR [c00000000003e550] __cpu_disable+0x50/0x190
[ 30.655528] Call Trace:
[ 30.655893] [c00000001e2abb60] [c00000000003e550] __cpu_disable+0x50/0x190 (unreliable)
[ 30.657280] [c00000001e2abbb0] [c0000000000aca0c] take_cpu_down+0x5c/0x100
[ 30.658365] [c00000001e2abc10] [c000000000163918] multi_cpu_stop+0x1a8/0x1e0
[ 30.659617] [c00000001e2abc60] [c000000000163cc0] cpu_stopper_thread+0xf0/0x1d0
[ 30.660737] [c00000001e2abd20] [c0000000000d8d70] smpboot_thread_fn+0x290/0x2a0
[ 30.661879] [c00000001e2abd80] [c0000000000d3fa8] kthread+0x108/0x130
[ 30.662876] [c00000001e2abe30] [c000000000009968] ret_from_kernel_thread+0x5c/0x74
[ 30.664017] Instruction dump:
[ 30.664477] 7bde1f24 38a00000 787f1f24 3b600001 39890008 7d204b78 7d05e214 7d0b07b4
[ 30.665642] 796b1f24 7d26582a 7d204a14 7d29f214 <7d4048a8> 7d4a3878 7d4049ad 40c2fff4
[ 30.666854] ---[ end trace 32643b7195717741 ]---
The reason of this is that in __cpu_disable(), when we try to set the
cpu_sibling_mask or cpu_core_mask of the sibling CPUs of the disabled
one, we don't check whether the current configuration employs those
sibling CPUs(hw threads). And if a CPU is not employed by a
configuration, the percpu structures cpu_{sibling,core}_mask are not
allocated, therefore accessing those cpumasks will result in problems as
above.
This patch fixes this problem by adding an addition check on whether the
id is no less than nr_cpu_ids in the sibling CPU iteration code.
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
These files were only including module.h for exception table
related functions. We've now separated that content out into its
own file "extable.h" so now move over to that and avoid all the
extra header content in module.h that we don't really need to compile
these files.
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Unsigned type is always non-negative, so the loop could not end in case
condition is never true.
The problem has been detected using semantic patch
scripts/coccinelle/tests/unsigned_lesser_than_zero.cocci
Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
This patch leverages 'struct pci_host_bridge' from the PCI subsystem
in order to free the pci_controller only after the last reference to
its devices is dropped (avoiding an oops in pcibios_release_device()
if the last reference is dropped after pcibios_free_controller()).
The patch relies on pci_host_bridge.release_fn() (and .release_data),
which is called automatically by the PCI subsystem when the root bus
is released (i.e., the last reference is dropped). Those fields are
set via pci_set_host_bridge_release() (e.g. in the platform-specific
implementation of pcibios_root_bridge_prepare()).
It introduces the 'pcibios_free_controller_deferred()' .release_fn()
and it expects .release_data to hold a pointer to the pci_controller.
The function implictly calls 'pcibios_free_controller()', so an user
must *NOT* explicitly call it if using the new _deferred() callback.
The functionality is enabled for pseries (although it isn't platform
specific, and may be used by cxl).
Details on not-so-elegant design choices:
- Use 'pci_host_bridge.release_data' field as pointer to associated
'struct pci_controller' so *not* to 'pci_bus_to_host(bridge->bus)'
in pcibios_free_controller_deferred().
That's because pci_remove_root_bus() sets 'host_bridge->bus = NULL'
(so, if the last reference is released after pci_remove_root_bus()
runs, which eventually reaches pcibios_free_controller_deferred(),
that would hit a null pointer dereference).
The cxl/vphb.c code calls pci_remove_root_bus(), and the cxl folks
are interested in this fix.
Test-case #1 (hold references)
# ls -ld /sys/block/sd* | grep -m1 0021:01:00.0
<...> /sys/block/sdaa -> ../devices/pci0021:01/0021:01:00.0/<...>
# ls -ld /sys/block/sd* | grep -m1 0021:01:00.1
<...> /sys/block/sdab -> ../devices/pci0021:01/0021:01:00.1/<...>
# cat >/dev/sdaa & pid1=$!
# cat >/dev/sdab & pid2=$!
# drmgr -w 5 -d 1 -c phb -s 'PHB 33' -r
Validating PHB DLPAR capability...yes.
[ 594.306719] pci_hp_remove_devices: PCI: Removing devices on bus 0021:01
[ 594.306738] pci_hp_remove_devices: Removing 0021:01:00.0...
...
[ 598.236381] pci_hp_remove_devices: Removing 0021:01:00.1...
...
[ 611.972077] pci_bus 0021:01: busn_res: [bus 01-ff] is released
[ 611.972140] rpadlpar_io: slot PHB 33 removed
# kill -9 $pid1
# kill -9 $pid2
[ 632.918088] pcibios_free_controller_deferred: domain 33, dynamic 1
Test-case #2 (don't hold references)
# drmgr -w 5 -d 1 -c phb -s 'PHB 33' -r
Validating PHB DLPAR capability...yes.
[ 916.357363] pci_hp_remove_devices: PCI: Removing devices on bus 0021:01
[ 916.357386] pci_hp_remove_devices: Removing 0021:01:00.0...
...
[ 920.566527] pci_hp_remove_devices: Removing 0021:01:00.1...
...
[ 933.955873] pci_bus 0021:01: busn_res: [bus 01-ff] is released
[ 933.955977] pcibios_free_controller_deferred: domain 33, dynamic 1
[ 933.955999] rpadlpar_io: slot PHB 33 removed
Suggested-By: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Tested-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> # cxl
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
The field "owner" is set by the core.
Thus delete an unneeded initialisation.
Generated by: scripts/coccinelle/api/platform_no_drv_owner.cocci
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
The field "owner" is set by the core.
Thus delete an unneeded initialisation.
Generated by: scripts/coccinelle/api/platform_no_drv_owner.cocci
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|