Age | Commit message (Collapse) | Author |
|
This is only used in pci-ioda.c so move it there and rename it to match.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200110070207.439-6-oohall@gmail.com
|
|
pnv_pci_dma_dev_setup() does nothing but call the phb->dma_dev_setup()
callback, if one exists. That callback is only set for normal PCIe PHBs so
we can remove the layer of indirection and use the ioda version in
the pci_controller_ops.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200110070207.439-5-oohall@gmail.com
|
|
An ioda_pe for each VF is allocated in pnv_pci_sriov_enable() before
the pci_dev for the VF is created. We need to set the pe->pdev pointer
at some point after the pci_dev is created. Currently we do that in:
pcibios_bus_add_device()
pnv_pci_dma_dev_setup() (via phb->ops.dma_dev_setup)
/* fixup is done here */
pnv_pci_ioda_dma_dev_setup() (via pnv_phb->dma_dev_setup)
The fixup needs to be done before setting up DMA for for the VF's PE,
but there's no real reason to delay it until this point. Move the
fixup into pnv_pci_ioda_fixup_iov() so the ordering is:
pcibios_add_device()
pnv_pci_ioda_fixup_iov() (via ppc_md.pcibios_fixup_sriov)
pcibios_bus_add_device()
...
This isn't strictly required, but it's a slightly more logical place
to do the fixup and it simplifies pnv_pci_dma_dev_setup().
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200110070207.439-4-oohall@gmail.com
|
|
The pnv_pci_dma_dev_setup() only does something when:
1) There PHB contains VFs, or
2) The PHB defines a dma_dev_setup() callback in the pnv_phb structure.
Neither is true for NPU PHBs so there's no reason to set the callback.
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200110070207.439-3-oohall@gmail.com
|
|
pcibios_bus_add_device() is the only caller of pcibios_setup_device().
Fold them together since there's no real reason to keep them separate.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200110070207.439-2-oohall@gmail.com
|
|
OPAL provides several different kinds of reboot for the kernel to use,
namely forcing a full reboot, platform error reboot and MPIPL. Right now
triggering the alternative resets requires some ad-hoc method such as
triggering a kernel crash and hoping the stars align. It's sometimes handy
to be able to trigger one of these resets directly, so add a way to do
that.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191101085522.3055-2-oohall@gmail.com
|
|
On PowerNV a few different kinds of reboot are supported. We'd like to be
able to exercise these from xmon so allow 'zr' to take an argument, and
pass that to the ppc_md.restart() function.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191101085522.3055-1-oohall@gmail.com
|
|
Long before we had a generic way for firmware to export memory ranges of
interest we added a special case for the skiboot symbol map. The code is
pretty much identical to the generic export so re-use the code.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191101062611.32610-2-oohall@gmail.com
|
|
Originally we only had a handful of exported memory ranges, but we'd to
export the per-core trace buffers. This results in a lot of files in the
exports directory which is a but unfortunate. We can clean things up a bit
by turning subnodes into subdirectories of the exports directory.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191101062611.32610-1-oohall@gmail.com
|
|
Many drivers don't check for errors when they get a 0xFFs response from an
MMIO load. As a result after an EEH event occurs a driver can get stuck in
a polling loop unless it some kind of internal timeout logic.
Currently EEH tries to detect and report stuck drivers by dumping a stack
trace after eeh_dev_check_failure() is called EEH_MAX_FAILS times on an
already frozen PE. The value of EEH_MAX_FAILS was chosen so that a dump
would occur every few seconds if the driver was spinning in a loop. This
results in a lot of spurious stack traces in the kernel log.
Fix this by limiting it to printing one stack trace for each PE freeze. If
the driver is truely stuck the kernel's hung task detector is better suited
to reporting the probelm anyway.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com>
Tested-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191016012536.22588-1-oohall@gmail.com
|
|
Add a debugfs entry to dump the state of the active IODA PEs. The IODA
PE state reflects how the PHB's internal concept of a PE is
configured. This is separate to the EEH PE state and is managed power
the PowerNV PCI backend rather than the EEH core.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
[mpe: Use DEFINE_DEBUGFS_ATTRIBUTE]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190912052945.12589-3-oohall@gmail.com
|
|
Make the dump trigger off any input rather than just '1'. This allows you
to write "echo 1> dump_diag_data" and it'll do what you want rather than
erroring out pointlessly.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190912052945.12589-2-oohall@gmail.com
|
|
Use the pnv_phb structure as the private data pointer for the debugfs
files. This lets us delete some code and an open-coded use of
hose->private_data.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190912052945.12589-1-oohall@gmail.com
|
|
These functions can only be used on a SR-IOV capable physical function and
they're only called in pcibios_sriov_enable / disable. Make them emit a
warning in the future if they're used incorrectly and remove the dead
code that checks if the device is a VF.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190821062655.19735-3-oohall@gmail.com
|
|
The powerpc PCI code requires that a pci_dn structure exists for all
devices in the system. This is fine for real devices since at boot a pci_dn
is created for each PCI device in the DT and it's fine for hotplugged devices
since the hotplug slot driver will manage the pci_dn's devices in hotplug
slots. For SR-IOV, we need the platform / pcibios to manage the pci_dn for
virtual functions since firmware is unaware of VFs, and they aren't
"hot plugged" in the traditional sense.
Management of the pci_dn is handled by the, poorly named, functions:
add_pci_dev_data() and remove_pci_dev_data(). The entire body of these
functions is #ifdef`ed around CONFIG_PCI_IOV and they cannot be used
in any other context, so make them only available when CONFIG_PCI_IOV
is selected, and rename them to reflect their actual usage rather than
having them masquerade as generic code.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190821062655.19735-2-oohall@gmail.com
|
|
When disabling virtual functions on an SR-IOV adapter we currently do not
correctly remove the EEH state for the now-dead virtual functions. When
removing the pci_dn that was created for the VF when SR-IOV was enabled
we free the corresponding eeh_dev without removing it from the child device
list of the eeh_pe that contained it. This can result in crashes due to the
use-after-free.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com>
Tested-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190821062655.19735-1-oohall@gmail.com
|
|
The eeh_sysfs_remove_device() function is supposed to clear the
EEH_DEV_SYSFS flag since it indicates the EEH sysfs entries have been added
for a pci_dev.
When the sysfs files are removed eeh_remove_device() the eeh_dev and the
pci_dev have already been de-associated. This then causes the
pci_dev_to_eeh_dev() call in eeh_sysfs_remove_device() to return NULL so
the flag can't be cleared from the still-live eeh_dev. This problem is
worked around in the caller by clearing the flag manually. However, this
behaviour doesn't make a whole lot of sense, so this patch fixes it by:
a) Re-ordering eeh_remove_device() so that eeh_sysfs_remove_device() is
called before de-associating the pci_dev and eeh_dev.
b) Making eeh_sysfs_remove_device() emit a warning if there's no
corresponding eeh_dev for a pci_dev. The paths where the sysfs
files are only reachable if EEH was setup for the device
for the device in the first place so hitting this warning
indicates a programming error.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com>
Tested-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190715085612.8802-6-oohall@gmail.com
|
|
In eeh_notify_resume_show() the pci_dn for the device is looked up once in
the declaration block and then once after checking for a NULL eeh_dev.
Remove the second lookup since it's pointless.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com>
Tested-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190715085612.8802-5-oohall@gmail.com
|
|
There are several EEH sysfs properties that only exists when the
"ibm,is-open-sriov-pf" property appears in the device tree node of the PCI
device. This used on pseries to indicate to the guest that the hypervisor
allows the guest to configure the SR-IOV capability. Doing this requires
some handshaking between the guest, hypervisor and userspace when a VF is
EEH frozen which is why these properties exist.
This is all dead code on non-pseries platforms so wrap it in an #ifdef
CONFIG_PPC_PSERIES to make the dependency clearer.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Tested-by: Sam Bobroff <sbobroff@linux.ibm.com>
Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190715085612.8802-4-oohall@gmail.com
|
|
The EEH_ATTR_SHOW() helper is used to display fields from struct eeh_dev
not struct pci_dn.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190715085612.8802-3-oohall@gmail.com
|
|
At the point where we start inserting ranges into the EEH address cache the
binding between pci_dev and eeh_dev has already been set up. Instead of
consulting the pci_dn tree we can retrieve the eeh_dev directly using
pci_dev_to_eeh_dev().
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com>
Tested-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190715085612.8802-2-oohall@gmail.com
|
|
Unlike real PCI slots, opencapi slots are directly associated to
the (virtual) opencapi PHB, there's no intermediate bridge. So when
looking for a slot ID, we must start the search from the device node
itself and not its parent.
Also, the slot ID is not attached to a specific bdfn, so let's build
it from the PHB ID, like skiboot.
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191121134918.7155-6-fbarrat@linux.ibm.com
|
|
With hotplug, an opencapi device can now go away. It needs to be
released, mostly to clean up its PE state. We were previously not
defining any device callback. We can reuse the standard PCI release
callback, it does a bit too much for an opencapi device, but it's
harmless, and only needs minor tuning.
Also separate the undo of the PELT-V code in a separate function, it
is not needed for NPU devices and it improves a bit the readability of
the code.
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191121134918.7155-5-fbarrat@linux.ibm.com
|
|
The PE for an opencapi device was set as part of a late PHB fixup
operation, when creating the PHB. To use the PCI hotplug framework,
this is not going to work, as the PHB stays the same, it's only the
devices underneath which are updated. For regular PCI devices, it is
done as part of the reconfiguration of the bridge, but for opencapi
PHBs, we don't have an intermediate bridge. So let's define the PE
when the device is enabled. PEs are meaningless for opencapi, the NPU
doesn't define them and opal is not doing anything with them.
Reviewed-by: Alastair D'Silva <alastair@d-silva.org>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191121134918.7155-4-fbarrat@linux.ibm.com
|
|
Protect the PHB's list of PE. Probably not needed as long as it was
populated during PHB creation, but it feels right and will become
required once we can add/remove opencapi devices on hotplug.
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191121134918.7155-3-fbarrat@linux.ibm.com
|
|
The pci_dn structure used to store a pointer to the struct pci_dev, so
taking a reference on the device was required. However, the pci_dev
pointer was later removed from the pci_dn structure, but the reference
was kept for the npu device.
See commit 902bdc57451c ("powerpc/powernv/idoa: Remove unnecessary
pcidev from pci_dn").
We don't need to take a reference on the device when assigning the PE
as the struct pnv_ioda_pe is cleaned up at the same time as
the (physical) device is released. Doing so prevents the device from
being released, which is a problem for opencapi devices, since we want
to be able to remove them through PCI hotplug.
Now the ugly part: nvlink npu devices are not meant to be
released. Because of the above, we've always leaked a reference and
simply removing it now is dangerous and would likely require more
work. There's currently no release device callback for nvlink devices
for example. So to be safe, this patch leaks a reference on the npu
device, but only for nvlink and not opencapi.
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191121134918.7155-2-fbarrat@linux.ibm.com
|
|
Various optimisations by inverting branches and removing
redundant instructions.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b4e79f963845545bcce1459cd6fcfe46bdde7863.1575273217.git.christophe.leroy@c-s.fr
|
|
clock_getres returns hrtimer_res for all clocks but coarse ones
for which it returns KTIME_LOW_RES.
return EINVAL for unknown clocks.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/37f94e47c91070b7606fb3ec3fe6fd2302a475a0.1575273217.git.christophe.leroy@c-s.fr
|
|
Use LOAD_REG_IMMEDIATE() to load registers with immediate value.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/36f111437e66e601929308f5d5dce230e1ce472f.1575273217.git.christophe.leroy@c-s.fr
|
|
On PPC32, the cache lines have a fixed size known at build time.
Don't read it from the datapage.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/dfa7b35e27e01964fcda84bf1ed8b2b31cf93826.1575273217.git.christophe.leroy@c-s.fr
|
|
__get_datapage() is only a few instructions to retrieve the
address of the page where the kernel stores data to the VDSO.
By inlining this function into its users, a bl/blr pair and
a mflr/mtlr pair is avoided, plus a few reg moves.
The improvement is noticeable (about 55 nsec/call on an 8xx)
vdsotest before the patch:
gettimeofday: vdso: 731 nsec/call
clock-gettime-realtime-coarse: vdso: 668 nsec/call
clock-gettime-monotonic-coarse: vdso: 745 nsec/call
vdsotest after the patch:
gettimeofday: vdso: 677 nsec/call
clock-gettime-realtime-coarse: vdso: 613 nsec/call
clock-gettime-monotonic-coarse: vdso: 690 nsec/call
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c39ef7f3dfa25356b01e211d539671f279086c09.1575273217.git.christophe.leroy@c-s.fr
|
|
This is copied and adapted from commit 5c929885f1bb ("powerpc/vdso64:
Add support for CLOCK_{REALTIME/MONOTONIC}_COARSE")
from Santosh Sivaraj <santosh@fossix.org>
Benchmark from vdsotest-all:
clock-gettime-realtime: syscall: 3601 nsec/call
clock-gettime-realtime: libc: 1072 nsec/call
clock-gettime-realtime: vdso: 931 nsec/call
clock-gettime-monotonic: syscall: 4034 nsec/call
clock-gettime-monotonic: libc: 1213 nsec/call
clock-gettime-monotonic: vdso: 1076 nsec/call
clock-gettime-realtime-coarse: syscall: 2722 nsec/call
clock-gettime-realtime-coarse: libc: 805 nsec/call
clock-gettime-realtime-coarse: vdso: 668 nsec/call
clock-gettime-monotonic-coarse: syscall: 2949 nsec/call
clock-gettime-monotonic-coarse: libc: 882 nsec/call
clock-gettime-monotonic-coarse: vdso: 745 nsec/call
Additional test passed with:
vdsotest -d 30 clock-gettime-monotonic-coarse verify
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://github.com/linuxppc/issues/issues/41
Link: https://lore.kernel.org/r/d1d24a376e396540194eeb85a2efe481e92ade24.1575273217.git.christophe.leroy@c-s.fr
|
|
Commit 18ad51dd342a ("powerpc: Add VDSO version of getcpu") added
getcpu() for PPC64 only, by making use of a user readable general
purpose SPR.
PPC32 doesn't have any such SPR.
For non SMP, just return CPU id 0 from the VDSO directly.
PPC32 doesn't support CONFIG_NUMA so NUMA node is always 0.
Before the patch, vdsotest reported:
getcpu: syscall: 1572 nsec/call
getcpu: libc: 1787 nsec/call
getcpu: vdso: not tested
Now, vdsotest reports:
getcpu: syscall: 1582 nsec/call
getcpu: libc: 502 nsec/call
getcpu: vdso: 187 nsec/call
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/eaac4b6494ecff1811220fccc895bf282aab884a.1575273217.git.christophe.leroy@c-s.fr
|
|
Since commit 0f0581b24bd0 ("spi: fsl: Convert to use CS GPIO
descriptors"), the prefered way to define chipselect GPIOs is using
'cs-gpios' property instead of the legacy 'gpios' property.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7556683b57d8ce100855857f03d1cd3d2903d045.1574943062.git.christophe.leroy@c-s.fr
|
|
Unlike standard powerpc, Powerpc 8xx doesn't have SPRN_DABR, but
it has a breakpoint support based on a set of comparators which
allow more flexibility.
Commit 4ad8622dc548 ("powerpc/8xx: Implement hw_breakpoint")
implemented breakpoints by emulating the DABR behaviour. It did
this by setting one comparator the match 4 bytes at breakpoint address
and the other comparator to match 4 bytes at breakpoint address + 4.
Rewrite 8xx hw_breakpoint to make breakpoints match all addresses
defined by the breakpoint address and length by making full use of
comparators.
Now, comparator E is set to match any address greater than breakpoint
address minus one. Comparator F is set to match any address lower than
breakpoint address plus breakpoint length. Addresses are aligned
to 32 bits.
When the breakpoint range starts at address 0, the breakpoint is set
to match comparator F only. When the breakpoint range end at address
0xffffffff, the breakpoint is set to match comparator E only.
Otherwise the breakpoint is set to match comparator E and F.
At the same time, use registers bit names instead of hardcoded values.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/05105deeaf63bc02151aea2cdeaf525534e0e9d4.1574790198.git.christophe.leroy@c-s.fr
|
|
When not using large TLBs, the IMMR region is still
mapped as a whole block in the FIXMAP area.
Properly report that the IMMR region is block-mapped even
when not using large TLBs.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/45f4f414bcd7198b0755cf4287ff216fbfc24b9d.1574774187.git.christophe.leroy@c-s.fr
|
|
ptdump_check_wx() is called from mark_rodata_ro() which only exists
when CONFIG_STRICT_KERNEL_RWX is selected.
Fixes: 453d87f6a8ae ("powerpc/mm: Warn if W+X pages found on boot")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/922d4939c735c6b52b4137838bcc066fffd4fc33.1578989545.git.christophe.leroy@c-s.fr
|
|
Verification cannot rely on simple bit checking because on some
platforms PAGE_RW is 0, checking that a page is not W means
checking that PAGE_RO is set instead of checking that PAGE_RW
is not set.
Use pte helpers instead of checking bits.
Fixes: 453d87f6a8ae ("powerpc/mm: Warn if W+X pages found on boot")
Cc: stable@vger.kernel.org # v5.2+
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/0d894839fdbb19070f0e1e4140363be4f2bb62fc.1578989540.git.christophe.leroy@c-s.fr
|
|
ptdump_check_wx() also have to be called when pages are mapped
by blocks.
Fixes: 453d87f6a8ae ("powerpc/mm: Warn if W+X pages found on boot")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/37517da8310f4457f28921a4edb88fb21d27b62a.1578989531.git.christophe.leroy@c-s.fr
|
|
Selecting CONFIG_PPC_DEBUG_WX only impacts ptdump and pgtable_32/64
init calls. Declaring related functions in asm/pgtable.h implies
rebuilding almost everything.
Move ptdump_check_wx() declaration in mm/mmu_decl.h
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/bf34fd9dca61eadf9a134a9f89ebbc162cfd5f86.1578986011.git.christophe.leroy@c-s.fr
|
|
Commit 0034d395f89d ("powerpc/mm/hash64: Map all the kernel regions in
the same 0xc range") has a bug in the definition of MIN_USER_CONTEXT.
The result is that the context id used for the vmemmap and the lowest
context id handed out to userspace are the same. The context id is
essentially the process identifier as far as the first stage of the
MMU translation is concerned.
This can result in multiple SLB entries with the same VSID (Virtual
Segment ID), accessible to the kernel and some random userspace
process that happens to get the overlapping id, which is not expected
eg:
07 c00c000008000000 40066bdea7000500 1T ESID= c00c00 VSID= 66bdea7 LLP:100
12 0002000008000000 40066bdea7000d80 1T ESID= 200 VSID= 66bdea7 LLP:100
Even though the user process and the kernel use the same VSID, the
permissions in the hash page table prevent the user process from
reading or writing to any kernel mappings.
It can also lead to SLB entries with different base page size
encodings (LLP), eg:
05 c00c000008000000 00006bde0053b500 256M ESID=c00c00000 VSID= 6bde0053b LLP:100
09 0000000008000000 00006bde0053bc80 256M ESID= 0 VSID= 6bde0053b LLP: 0
Such SLB entries can result in machine checks, eg. as seen on a G5:
Oops: Machine check, sig: 7 [#1]
BE PAGE SIZE=64K MU-Hash SMP NR_CPUS=4 NUMA Power Mac
NIP: c00000000026f248 LR: c000000000295e58 CTR: 0000000000000000
REGS: c0000000erfd3d70 TRAP: 0200 Tainted: G M (5.5.0-rcl-gcc-8.2.0-00010-g228b667d8ea1)
MSR: 9000000000109032 <SF,HV,EE,ME,IR,DR,RI> CR: 24282048 XER: 00000000
DAR: c00c000000612c80 DSISR: 00000400 IRQMASK: 0
...
NIP [c00000000026f248] .kmem_cache_free+0x58/0x140
LR [c088000008295e58] .putname 8x88/0xa
Call Trace:
.putname+0xB8/0xa
.filename_lookup.part.76+0xbe/0x160
.do_faccessat+0xe0/0x380
system_call+0x5c/ex68
This happens with 256MB segments and 64K pages, as the duplicate VSID
is hit with the first vmemmap segment and the first user segment, and
older 32-bit userspace maps things in the first user segment.
On other CPUs a machine check is not seen. Instead the userspace
process can get stuck continuously faulting, with the fault never
properly serviced, due to the kernel not understanding that there is
already a HPTE for the address but with inaccessible permissions.
On machines with 1T segments we've not seen the bug hit other than by
deliberately exercising it. That seems to be just a matter of luck
though, due to the typical layout of the user virtual address space
and the ranges of vmemmap that are typically populated.
To fix it we add 2 to MIN_USER_CONTEXT. This ensures the lowest
context given to userspace doesn't overlap with the VMEMMAP context,
or with the context for INVALID_REGION_ID.
Fixes: 0034d395f89d ("powerpc/mm/hash64: Map all the kernel regions in the same 0xc range")
Cc: stable@vger.kernel.org # v5.2+
Reported-by: Christian Marillat <marillat@debian.org>
Reported-by: Romain Dolbeau <romain@dolbeau.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
[mpe: Account for INVALID_REGION_ID, mostly rewrite change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200123102547.11623-1-mpe@ellerman.id.au
|
|
Even if it's read-only, it can still be written to by userspace. Let
them know by adding it to KVM_GET_MSR_INDEX_LIST.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The SPTE_MMIO_MASK overlaps with the bits used to track MMIO
generation number. A high enough generation number would overwrite the
SPTE_SPECIAL_MASK region and cause the MMIO SPTE to be misinterpreted.
Likewise, setting bits 52 and 53 would also cause an incorrect generation
number to be read from the PTE, though this was partially mitigated by the
(useless if it weren't for the bug) removal of SPTE_SPECIAL_MASK from
the spte in get_mmio_spte_generation. Drop that removal, and replace
it with a compile-time assertion.
Fixes: 6eeb4ef049e7 ("KVM: x86: assign two bits to track SPTE kinds")
Reported-by: Ben Gardon <bgardon@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Alexei Starovoitov says:
====================
pull-request: bpf-next 2020-01-22
The following pull-request contains BPF updates for your *net-next* tree.
We've added 92 non-merge commits during the last 16 day(s) which contain
a total of 320 files changed, 7532 insertions(+), 1448 deletions(-).
The main changes are:
1) function by function verification and program extensions from Alexei.
2) massive cleanup of selftests/bpf from Toke and Andrii.
3) batched bpf map operations from Brian and Yonghong.
4) tcp congestion control in bpf from Martin.
5) bulking for non-map xdp_redirect form Toke.
6) bpf_send_signal_thread helper from Yonghong.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux into arm/soc
Samsung mach/soc changes for v5.6, part 2
1. Switch from legacy to atomic pwm API in rx1950 (s3c24xx),
2. Cleanups of unneeded selects in Kconfig.
* tag 'samsung-soc-5.6-2' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux:
ARM: s3c64xx: Drop unneeded select of TIMER_OF
ARM: exynos: Drop unneeded select of MIGHT_HAVE_CACHE_L2X0
ARM: s3c24xx: Switch to atomic pwm API in rx1950
Link: https://lore.kernel.org/r/20200122172649.3143-1-krzk@kernel.org
Signed-off-by: Olof Johansson <olof@lixom.net>
|
|
Some Loongson-64 processor didn't set MAC2008 bit in fcsr,
but actually all Loongson64 processors are MAC2008 only.
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Paul Burton <paulburton@kernel.org>
Cc: linux-mips@vger.kernel.org
Cc: chenhc@lemote.com
Cc: paul.burton@mips.com
Cc: linux-kernel@vger.kernel.org
|
|
MAC2008 means the processor implemented IEEE754 style Fused MADD
instruction. It was introduced in Release3 but removed in Release5.
The toolchain support of MAC2008 have never landed except for Loongson
processors.
This patch aimed to disabled the MAC2008 if it's optional. For
MAC2008 only processors, we corrected math-emu behavior to align
with actual hardware behavior.
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
[paulburton@kernel.org: Fixup MIPSr2-r5 check in cpu_set_fpu_2008.]
Signed-off-by: Paul Burton <paulburton@kernel.org>
Cc: linux-mips@vger.kernel.org
Cc: chenhc@lemote.com
Cc: paul.burton@mips.com
Cc: linux-kernel@vger.kernel.org
|
|
While rv32 technically has 34-bit physical addresses, no current platforms
use it and it's likely to shake out driver bugs.
Let's keep 64-bit phys_addr_t off on 32-bit builds until one shows up,
since other work will be needed to make such a system useful anyway.
PHYS_ADDR_T_64BIT is def_bool 64BIT, so just remove the select.
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
This patch ports the feature Kernel Address SANitizer (KASAN).
Note: The start address of shadow memory is at the beginning of kernel
space, which is 2^64 - (2^39 / 2) in SV39. The size of the kernel space is
2^38 bytes so the size of shadow memory should be 2^38 / 8. Thus, the
shadow memory would not overlap with the fixmap area.
There are currently two limitations in this port,
1. RV64 only: KASAN need large address space for extra shadow memory
region.
2. KASAN can't debug the modules since the modules are allocated in VMALLOC
area. We mapped the shadow memory, which corresponding to VMALLOC area, to
the kasan_early_shadow_page because we don't have enough physical space for
all the shadow memory corresponding to VMALLOC area.
Signed-off-by: Nick Hu <nickhu@andestech.com>
Reported-by: Greentime Hu <green.hu@gmail.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
Updates to the Generic Timer architecture allow ID_PFR1.GenTimer to
have values other than 0 or 1 while still preserving backward
compatibility. At the moment, Linux is quite strict in the way it
handles this field at early boot and will not configure arch timer if
it doesn't find the value 1.
Since here use ubfx for arch timer version extraction (hyb-stub build
with -march=armv7-a, so it is safe)
To help backports (even though the code was correct at the time of writing)
Fixes: 8ec58be9f3ff ("ARM: virt: arch_timers: enable access to physical timers")
Acked-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
|