summaryrefslogtreecommitdiff
path: root/arch/powerpc
AgeCommit message (Collapse)Author
2020-01-23powerpc/pcidn: Make VF pci_dn management CONFIG_PCI_IOV specificOliver O'Halloran
The powerpc PCI code requires that a pci_dn structure exists for all devices in the system. This is fine for real devices since at boot a pci_dn is created for each PCI device in the DT and it's fine for hotplugged devices since the hotplug slot driver will manage the pci_dn's devices in hotplug slots. For SR-IOV, we need the platform / pcibios to manage the pci_dn for virtual functions since firmware is unaware of VFs, and they aren't "hot plugged" in the traditional sense. Management of the pci_dn is handled by the, poorly named, functions: add_pci_dev_data() and remove_pci_dev_data(). The entire body of these functions is #ifdef`ed around CONFIG_PCI_IOV and they cannot be used in any other context, so make them only available when CONFIG_PCI_IOV is selected, and rename them to reflect their actual usage rather than having them masquerade as generic code. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190821062655.19735-2-oohall@gmail.com
2020-01-23powerpc/sriov: Remove VF eeh_dev state when disabling SR-IOVOliver O'Halloran
When disabling virtual functions on an SR-IOV adapter we currently do not correctly remove the EEH state for the now-dead virtual functions. When removing the pci_dn that was created for the VF when SR-IOV was enabled we free the corresponding eeh_dev without removing it from the child device list of the eeh_pe that contained it. This can result in crashes due to the use-after-free. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com> Tested-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190821062655.19735-1-oohall@gmail.com
2020-01-23powerpc/eeh_sysfs: Make clearing EEH_DEV_SYSFS sanerOliver O'Halloran
The eeh_sysfs_remove_device() function is supposed to clear the EEH_DEV_SYSFS flag since it indicates the EEH sysfs entries have been added for a pci_dev. When the sysfs files are removed eeh_remove_device() the eeh_dev and the pci_dev have already been de-associated. This then causes the pci_dev_to_eeh_dev() call in eeh_sysfs_remove_device() to return NULL so the flag can't be cleared from the still-live eeh_dev. This problem is worked around in the caller by clearing the flag manually. However, this behaviour doesn't make a whole lot of sense, so this patch fixes it by: a) Re-ordering eeh_remove_device() so that eeh_sysfs_remove_device() is called before de-associating the pci_dev and eeh_dev. b) Making eeh_sysfs_remove_device() emit a warning if there's no corresponding eeh_dev for a pci_dev. The paths where the sysfs files are only reachable if EEH was setup for the device for the device in the first place so hitting this warning indicates a programming error. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com> Tested-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190715085612.8802-6-oohall@gmail.com
2020-01-23powerpc/eeh_sysfs: Remove double pci_dn lookup.Oliver O'Halloran
In eeh_notify_resume_show() the pci_dn for the device is looked up once in the declaration block and then once after checking for a NULL eeh_dev. Remove the second lookup since it's pointless. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com> Tested-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190715085612.8802-5-oohall@gmail.com
2020-01-23powerpc/eeh_sysfs: ifdef pseries sr-iov sysfs propertiesOliver O'Halloran
There are several EEH sysfs properties that only exists when the "ibm,is-open-sriov-pf" property appears in the device tree node of the PCI device. This used on pseries to indicate to the guest that the hypervisor allows the guest to configure the SR-IOV capability. Doing this requires some handshaking between the guest, hypervisor and userspace when a VF is EEH frozen which is why these properties exist. This is all dead code on non-pseries platforms so wrap it in an #ifdef CONFIG_PPC_PSERIES to make the dependency clearer. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Tested-by: Sam Bobroff <sbobroff@linux.ibm.com> Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190715085612.8802-4-oohall@gmail.com
2020-01-23powerpc/eeh_sysfs: Fix incorrect commentOliver O'Halloran
The EEH_ATTR_SHOW() helper is used to display fields from struct eeh_dev not struct pci_dn. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190715085612.8802-3-oohall@gmail.com
2020-01-23powerpc/eeh_cache: Don't use pci_dn when inserting new rangesOliver O'Halloran
At the point where we start inserting ranges into the EEH address cache the binding between pci_dev and eeh_dev has already been set up. Instead of consulting the pci_dn tree we can retrieve the eeh_dev directly using pci_dev_to_eeh_dev(). Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com> Tested-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190715085612.8802-2-oohall@gmail.com
2020-01-23powerpc/powernv/ioda: Find opencapi slot for a device nodeFrederic Barrat
Unlike real PCI slots, opencapi slots are directly associated to the (virtual) opencapi PHB, there's no intermediate bridge. So when looking for a slot ID, we must start the search from the device node itself and not its parent. Also, the slot ID is not attached to a specific bdfn, so let's build it from the PHB ID, like skiboot. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191121134918.7155-6-fbarrat@linux.ibm.com
2020-01-23powerpc/powernv/ioda: Release opencapi deviceFrederic Barrat
With hotplug, an opencapi device can now go away. It needs to be released, mostly to clean up its PE state. We were previously not defining any device callback. We can reuse the standard PCI release callback, it does a bit too much for an opencapi device, but it's harmless, and only needs minor tuning. Also separate the undo of the PELT-V code in a separate function, it is not needed for NPU devices and it improves a bit the readability of the code. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191121134918.7155-5-fbarrat@linux.ibm.com
2020-01-23powerpc/powernv/ioda: set up PE on opencapi device when enablingFrederic Barrat
The PE for an opencapi device was set as part of a late PHB fixup operation, when creating the PHB. To use the PCI hotplug framework, this is not going to work, as the PHB stays the same, it's only the devices underneath which are updated. For regular PCI devices, it is done as part of the reconfiguration of the bridge, but for opencapi PHBs, we don't have an intermediate bridge. So let's define the PE when the device is enabled. PEs are meaningless for opencapi, the NPU doesn't define them and opal is not doing anything with them. Reviewed-by: Alastair D'Silva <alastair@d-silva.org> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191121134918.7155-4-fbarrat@linux.ibm.com
2020-01-23powerpc/powernv/ioda: Protect PE listFrederic Barrat
Protect the PHB's list of PE. Probably not needed as long as it was populated during PHB creation, but it feels right and will become required once we can add/remove opencapi devices on hotplug. Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191121134918.7155-3-fbarrat@linux.ibm.com
2020-01-23powerpc/powernv/ioda: Fix ref count for devices with their own PEFrederic Barrat
The pci_dn structure used to store a pointer to the struct pci_dev, so taking a reference on the device was required. However, the pci_dev pointer was later removed from the pci_dn structure, but the reference was kept for the npu device. See commit 902bdc57451c ("powerpc/powernv/idoa: Remove unnecessary pcidev from pci_dn"). We don't need to take a reference on the device when assigning the PE as the struct pnv_ioda_pe is cleaned up at the same time as the (physical) device is released. Doing so prevents the device from being released, which is a problem for opencapi devices, since we want to be able to remove them through PCI hotplug. Now the ugly part: nvlink npu devices are not meant to be released. Because of the above, we've always leaked a reference and simply removing it now is dangerous and would likely require more work. There's currently no release device callback for nvlink devices for example. So to be safe, this patch leaks a reference on the npu device, but only for nvlink and not opencapi. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191121134918.7155-2-fbarrat@linux.ibm.com
2020-01-23powerpc/vdso32: miscellaneous optimisationsChristophe Leroy
Various optimisations by inverting branches and removing redundant instructions. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/b4e79f963845545bcce1459cd6fcfe46bdde7863.1575273217.git.christophe.leroy@c-s.fr
2020-01-23powerpc/vdso32: implement clock_getres entirelyChristophe Leroy
clock_getres returns hrtimer_res for all clocks but coarse ones for which it returns KTIME_LOW_RES. return EINVAL for unknown clocks. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/37f94e47c91070b7606fb3ec3fe6fd2302a475a0.1575273217.git.christophe.leroy@c-s.fr
2020-01-23powerpc/vdso32: use LOAD_REG_IMMEDIATE()Christophe Leroy
Use LOAD_REG_IMMEDIATE() to load registers with immediate value. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/36f111437e66e601929308f5d5dce230e1ce472f.1575273217.git.christophe.leroy@c-s.fr
2020-01-23powerpc/vdso32: Don't read cache line size from the datapage on PPC32.Christophe Leroy
On PPC32, the cache lines have a fixed size known at build time. Don't read it from the datapage. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/dfa7b35e27e01964fcda84bf1ed8b2b31cf93826.1575273217.git.christophe.leroy@c-s.fr
2020-01-23powerpc/vdso32: inline __get_datapage()Christophe Leroy
__get_datapage() is only a few instructions to retrieve the address of the page where the kernel stores data to the VDSO. By inlining this function into its users, a bl/blr pair and a mflr/mtlr pair is avoided, plus a few reg moves. The improvement is noticeable (about 55 nsec/call on an 8xx) vdsotest before the patch: gettimeofday: vdso: 731 nsec/call clock-gettime-realtime-coarse: vdso: 668 nsec/call clock-gettime-monotonic-coarse: vdso: 745 nsec/call vdsotest after the patch: gettimeofday: vdso: 677 nsec/call clock-gettime-realtime-coarse: vdso: 613 nsec/call clock-gettime-monotonic-coarse: vdso: 690 nsec/call Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/c39ef7f3dfa25356b01e211d539671f279086c09.1575273217.git.christophe.leroy@c-s.fr
2020-01-23powerpc/vdso32: Add support for CLOCK_{REALTIME/MONOTONIC}_COARSEChristophe Leroy
This is copied and adapted from commit 5c929885f1bb ("powerpc/vdso64: Add support for CLOCK_{REALTIME/MONOTONIC}_COARSE") from Santosh Sivaraj <santosh@fossix.org> Benchmark from vdsotest-all: clock-gettime-realtime: syscall: 3601 nsec/call clock-gettime-realtime: libc: 1072 nsec/call clock-gettime-realtime: vdso: 931 nsec/call clock-gettime-monotonic: syscall: 4034 nsec/call clock-gettime-monotonic: libc: 1213 nsec/call clock-gettime-monotonic: vdso: 1076 nsec/call clock-gettime-realtime-coarse: syscall: 2722 nsec/call clock-gettime-realtime-coarse: libc: 805 nsec/call clock-gettime-realtime-coarse: vdso: 668 nsec/call clock-gettime-monotonic-coarse: syscall: 2949 nsec/call clock-gettime-monotonic-coarse: libc: 882 nsec/call clock-gettime-monotonic-coarse: vdso: 745 nsec/call Additional test passed with: vdsotest -d 30 clock-gettime-monotonic-coarse verify Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://github.com/linuxppc/issues/issues/41 Link: https://lore.kernel.org/r/d1d24a376e396540194eeb85a2efe481e92ade24.1575273217.git.christophe.leroy@c-s.fr
2020-01-23powerpc/32: Add VDSO version of getcpu on non SMPChristophe Leroy
Commit 18ad51dd342a ("powerpc: Add VDSO version of getcpu") added getcpu() for PPC64 only, by making use of a user readable general purpose SPR. PPC32 doesn't have any such SPR. For non SMP, just return CPU id 0 from the VDSO directly. PPC32 doesn't support CONFIG_NUMA so NUMA node is always 0. Before the patch, vdsotest reported: getcpu: syscall: 1572 nsec/call getcpu: libc: 1787 nsec/call getcpu: vdso: not tested Now, vdsotest reports: getcpu: syscall: 1582 nsec/call getcpu: libc: 502 nsec/call getcpu: vdso: 187 nsec/call Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/eaac4b6494ecff1811220fccc895bf282aab884a.1575273217.git.christophe.leroy@c-s.fr
2020-01-23powerpc/devicetrees: Change 'gpios' to 'cs-gpios' on fsl, spi nodesChristophe Leroy
Since commit 0f0581b24bd0 ("spi: fsl: Convert to use CS GPIO descriptors"), the prefered way to define chipselect GPIOs is using 'cs-gpios' property instead of the legacy 'gpios' property. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/7556683b57d8ce100855857f03d1cd3d2903d045.1574943062.git.christophe.leroy@c-s.fr
2020-01-23powerpc/hw_breakpoints: Rewrite 8xx breakpoints to allow any address range size.Christophe Leroy
Unlike standard powerpc, Powerpc 8xx doesn't have SPRN_DABR, but it has a breakpoint support based on a set of comparators which allow more flexibility. Commit 4ad8622dc548 ("powerpc/8xx: Implement hw_breakpoint") implemented breakpoints by emulating the DABR behaviour. It did this by setting one comparator the match 4 bytes at breakpoint address and the other comparator to match 4 bytes at breakpoint address + 4. Rewrite 8xx hw_breakpoint to make breakpoints match all addresses defined by the breakpoint address and length by making full use of comparators. Now, comparator E is set to match any address greater than breakpoint address minus one. Comparator F is set to match any address lower than breakpoint address plus breakpoint length. Addresses are aligned to 32 bits. When the breakpoint range starts at address 0, the breakpoint is set to match comparator F only. When the breakpoint range end at address 0xffffffff, the breakpoint is set to match comparator E only. Otherwise the breakpoint is set to match comparator E and F. At the same time, use registers bit names instead of hardcoded values. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/05105deeaf63bc02151aea2cdeaf525534e0e9d4.1574790198.git.christophe.leroy@c-s.fr
2020-01-23powerpc/8xx: Fix permanently mapped IMMR region.Christophe Leroy
When not using large TLBs, the IMMR region is still mapped as a whole block in the FIXMAP area. Properly report that the IMMR region is block-mapped even when not using large TLBs. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/45f4f414bcd7198b0755cf4287ff216fbfc24b9d.1574774187.git.christophe.leroy@c-s.fr
2020-01-23powerpc/ptdump: Only enable PPC_CHECK_WX with STRICT_KERNEL_RWXChristophe Leroy
ptdump_check_wx() is called from mark_rodata_ro() which only exists when CONFIG_STRICT_KERNEL_RWX is selected. Fixes: 453d87f6a8ae ("powerpc/mm: Warn if W+X pages found on boot") Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/922d4939c735c6b52b4137838bcc066fffd4fc33.1578989545.git.christophe.leroy@c-s.fr
2020-01-23powerpc/ptdump: Fix W+X verificationChristophe Leroy
Verification cannot rely on simple bit checking because on some platforms PAGE_RW is 0, checking that a page is not W means checking that PAGE_RO is set instead of checking that PAGE_RW is not set. Use pte helpers instead of checking bits. Fixes: 453d87f6a8ae ("powerpc/mm: Warn if W+X pages found on boot") Cc: stable@vger.kernel.org # v5.2+ Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/0d894839fdbb19070f0e1e4140363be4f2bb62fc.1578989540.git.christophe.leroy@c-s.fr
2020-01-23powerpc/ptdump: Fix W+X verification call in mark_rodata_ro()Christophe Leroy
ptdump_check_wx() also have to be called when pages are mapped by blocks. Fixes: 453d87f6a8ae ("powerpc/mm: Warn if W+X pages found on boot") Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/37517da8310f4457f28921a4edb88fb21d27b62a.1578989531.git.christophe.leroy@c-s.fr
2020-01-23powerpc/ptdump: don't entirely rebuild kernel when selecting CONFIG_PPC_DEBUG_WXChristophe Leroy
Selecting CONFIG_PPC_DEBUG_WX only impacts ptdump and pgtable_32/64 init calls. Declaring related functions in asm/pgtable.h implies rebuilding almost everything. Move ptdump_check_wx() declaration in mm/mmu_decl.h Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/bf34fd9dca61eadf9a134a9f89ebbc162cfd5f86.1578986011.git.christophe.leroy@c-s.fr
2020-01-23powerpc/mm/hash: Fix sharing context ids between kernel & userspaceAneesh Kumar K.V
Commit 0034d395f89d ("powerpc/mm/hash64: Map all the kernel regions in the same 0xc range") has a bug in the definition of MIN_USER_CONTEXT. The result is that the context id used for the vmemmap and the lowest context id handed out to userspace are the same. The context id is essentially the process identifier as far as the first stage of the MMU translation is concerned. This can result in multiple SLB entries with the same VSID (Virtual Segment ID), accessible to the kernel and some random userspace process that happens to get the overlapping id, which is not expected eg: 07 c00c000008000000 40066bdea7000500 1T ESID= c00c00 VSID= 66bdea7 LLP:100 12 0002000008000000 40066bdea7000d80 1T ESID= 200 VSID= 66bdea7 LLP:100 Even though the user process and the kernel use the same VSID, the permissions in the hash page table prevent the user process from reading or writing to any kernel mappings. It can also lead to SLB entries with different base page size encodings (LLP), eg: 05 c00c000008000000 00006bde0053b500 256M ESID=c00c00000 VSID= 6bde0053b LLP:100 09 0000000008000000 00006bde0053bc80 256M ESID= 0 VSID= 6bde0053b LLP: 0 Such SLB entries can result in machine checks, eg. as seen on a G5: Oops: Machine check, sig: 7 [#1] BE PAGE SIZE=64K MU-Hash SMP NR_CPUS=4 NUMA Power Mac NIP: c00000000026f248 LR: c000000000295e58 CTR: 0000000000000000 REGS: c0000000erfd3d70 TRAP: 0200 Tainted: G M (5.5.0-rcl-gcc-8.2.0-00010-g228b667d8ea1) MSR: 9000000000109032 <SF,HV,EE,ME,IR,DR,RI> CR: 24282048 XER: 00000000 DAR: c00c000000612c80 DSISR: 00000400 IRQMASK: 0 ... NIP [c00000000026f248] .kmem_cache_free+0x58/0x140 LR [c088000008295e58] .putname 8x88/0xa Call Trace: .putname+0xB8/0xa .filename_lookup.part.76+0xbe/0x160 .do_faccessat+0xe0/0x380 system_call+0x5c/ex68 This happens with 256MB segments and 64K pages, as the duplicate VSID is hit with the first vmemmap segment and the first user segment, and older 32-bit userspace maps things in the first user segment. On other CPUs a machine check is not seen. Instead the userspace process can get stuck continuously faulting, with the fault never properly serviced, due to the kernel not understanding that there is already a HPTE for the address but with inaccessible permissions. On machines with 1T segments we've not seen the bug hit other than by deliberately exercising it. That seems to be just a matter of luck though, due to the typical layout of the user virtual address space and the ranges of vmemmap that are typically populated. To fix it we add 2 to MIN_USER_CONTEXT. This ensures the lowest context given to userspace doesn't overlap with the VMEMMAP context, or with the context for INVALID_REGION_ID. Fixes: 0034d395f89d ("powerpc/mm/hash64: Map all the kernel regions in the same 0xc range") Cc: stable@vger.kernel.org # v5.2+ Reported-by: Christian Marillat <marillat@debian.org> Reported-by: Romain Dolbeau <romain@dolbeau.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> [mpe: Account for INVALID_REGION_ID, mostly rewrite change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200123102547.11623-1-mpe@ellerman.id.au
2020-01-22powerpc/xive: Discard ESB load value when interrupt is invalidFrederic Barrat
A load on an ESB page returning all 1's means that the underlying device has invalidated the access to the PQ state of the interrupt through mmio. It may happen, for example when querying a PHB interrupt while the PHB is in an error state. In that case, we should consider the interrupt to be invalid when checking its state in the irq_get_irqchip_state() handler. Fixes: da15c03b047d ("powerpc/xive: Implement get_irqchip_state method for XIVE to fix shutdown race") Cc: stable@vger.kernel.org # v5.4+ Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> [clg: wrote a commit log, introduced XIVE_ESB_INVALID ] Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200113130118.27969-1-clg@kaod.org
2020-01-22powerpc: Ultravisor: Fix the dependencies for CONFIG_PPC_UVBharata B Rao
Let PPC_UV depend only on DEVICE_PRIVATE which in turn will satisfy all the other required dependencies Fixes: 013a53f2d25a ("powerpc: Ultravisor: Add PPC_UV config option") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Bharata B Rao <bharata@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200109092047.24043-1-bharata@linux.ibm.com
2020-01-18open: introduce openat2(2) syscallAleksa Sarai
/* Background. */ For a very long time, extending openat(2) with new features has been incredibly frustrating. This stems from the fact that openat(2) is possibly the most famous counter-example to the mantra "don't silently accept garbage from userspace" -- it doesn't check whether unknown flags are present[1]. This means that (generally) the addition of new flags to openat(2) has been fraught with backwards-compatibility issues (O_TMPFILE has to be defined as __O_TMPFILE|O_DIRECTORY|[O_RDWR or O_WRONLY] to ensure old kernels gave errors, since it's insecure to silently ignore the flag[2]). All new security-related flags therefore have a tough road to being added to openat(2). Userspace also has a hard time figuring out whether a particular flag is supported on a particular kernel. While it is now possible with contemporary kernels (thanks to [3]), older kernels will expose unknown flag bits through fcntl(F_GETFL). Giving a clear -EINVAL during openat(2) time matches modern syscall designs and is far more fool-proof. In addition, the newly-added path resolution restriction LOOKUP flags (which we would like to expose to user-space) don't feel related to the pre-existing O_* flag set -- they affect all components of path lookup. We'd therefore like to add a new flag argument. Adding a new syscall allows us to finally fix the flag-ignoring problem, and we can make it extensible enough so that we will hopefully never need an openat3(2). /* Syscall Prototype. */ /* * open_how is an extensible structure (similar in interface to * clone3(2) or sched_setattr(2)). The size parameter must be set to * sizeof(struct open_how), to allow for future extensions. All future * extensions will be appended to open_how, with their zero value * acting as a no-op default. */ struct open_how { /* ... */ }; int openat2(int dfd, const char *pathname, struct open_how *how, size_t size); /* Description. */ The initial version of 'struct open_how' contains the following fields: flags Used to specify openat(2)-style flags. However, any unknown flag bits or otherwise incorrect flag combinations (like O_PATH|O_RDWR) will result in -EINVAL. In addition, this field is 64-bits wide to allow for more O_ flags than currently permitted with openat(2). mode The file mode for O_CREAT or O_TMPFILE. Must be set to zero if flags does not contain O_CREAT or O_TMPFILE. resolve Restrict path resolution (in contrast to O_* flags they affect all path components). The current set of flags are as follows (at the moment, all of the RESOLVE_ flags are implemented as just passing the corresponding LOOKUP_ flag). RESOLVE_NO_XDEV => LOOKUP_NO_XDEV RESOLVE_NO_SYMLINKS => LOOKUP_NO_SYMLINKS RESOLVE_NO_MAGICLINKS => LOOKUP_NO_MAGICLINKS RESOLVE_BENEATH => LOOKUP_BENEATH RESOLVE_IN_ROOT => LOOKUP_IN_ROOT open_how does not contain an embedded size field, because it is of little benefit (userspace can figure out the kernel open_how size at runtime fairly easily without it). It also only contains u64s (even though ->mode arguably should be a u16) to avoid having padding fields which are never used in the future. Note that as a result of the new how->flags handling, O_PATH|O_TMPFILE is no longer permitted for openat(2). As far as I can tell, this has always been a bug and appears to not be used by userspace (and I've not seen any problems on my machines by disallowing it). If it turns out this breaks something, we can special-case it and only permit it for openat(2) but not openat2(2). After input from Florian Weimer, the new open_how and flag definitions are inside a separate header from uapi/linux/fcntl.h, to avoid problems that glibc has with importing that header. /* Testing. */ In a follow-up patch there are over 200 selftests which ensure that this syscall has the correct semantics and will correctly handle several attack scenarios. In addition, I've written a userspace library[4] which provides convenient wrappers around openat2(RESOLVE_IN_ROOT) (this is necessary because no other syscalls support RESOLVE_IN_ROOT, and thus lots of care must be taken when using RESOLVE_IN_ROOT'd file descriptors with other syscalls). During the development of this patch, I've run numerous verification tests using libpathrs (showing that the API is reasonably usable by userspace). /* Future Work. */ Additional RESOLVE_ flags have been suggested during the review period. These can be easily implemented separately (such as blocking auto-mount during resolution). Furthermore, there are some other proposed changes to the openat(2) interface (the most obvious example is magic-link hardening[5]) which would be a good opportunity to add a way for userspace to restrict how O_PATH file descriptors can be re-opened. Another possible avenue of future work would be some kind of CHECK_FIELDS[6] flag which causes the kernel to indicate to userspace which openat2(2) flags and fields are supported by the current kernel (to avoid userspace having to go through several guesses to figure it out). [1]: https://lwn.net/Articles/588444/ [2]: https://lore.kernel.org/lkml/CA+55aFyyxJL1LyXZeBsf2ypriraj5ut1XkNDsunRBqgVjZU_6Q@mail.gmail.com [3]: commit 629e014bb834 ("fs: completely ignore unknown open flags") [4]: https://sourceware.org/bugzilla/show_bug.cgi?id=17523 [5]: https://lore.kernel.org/lkml/20190930183316.10190-2-cyphar@cyphar.com/ [6]: https://youtu.be/ggD-eb3yPVs Suggested-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Aleksa Sarai <cyphar@cyphar.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-01-17KVM: PPC: Book3S HV: Implement H_SVM_INIT_ABORT hcallSukadev Bhattiprolu
Implement the H_SVM_INIT_ABORT hcall which the Ultravisor can use to abort an SVM after it has issued the H_SVM_INIT_START and before the H_SVM_INIT_DONE hcalls. This hcall could be used when Ultravisor encounters security violations or other errors when starting an SVM. Note that this hcall is different from UV_SVM_TERMINATE ucall which is used by HV to terminate/cleanup an VM that has becore secure. The H_SVM_INIT_ABORT basically undoes operations that were done since the H_SVM_INIT_START hcall - i.e page-out all the VM pages back to normal memory, and terminate the SVM. (If we do not bring the pages back to normal memory, the text/data of the VM would be stuck in secure memory and since the SVM did not go secure, its MSR_S bit will be clear and the VM wont be able to access its pages even to do a clean exit). Based on patches and discussion with Paul Mackerras, Ram Pai and Bharata Rao. Signed-off-by: Ram Pai <linuxram@linux.ibm.com> Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com> Signed-off-by: Bharata B Rao <bharata@linux.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2020-01-17KVM: PPC: Add skip_page_out parameter to uvmem functionsSukadev Bhattiprolu
Add 'skip_page_out' parameter to kvmppc_uvmem_drop_pages() so the callers can specify whetheter or not to skip paging out pages. This will be needed in a follow-on patch that implements H_SVM_INIT_ABORT hcall. Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2020-01-17KVM: PPC: Book3E: Replace current->mm by kvm->mmLeonardo Bras
Given that in kvm_create_vm() there is: kvm->mm = current->mm; And that on every kvm_*_ioctl we have: if (kvm->mm != current->mm) return -EIO; I see no reason to keep using current->mm instead of kvm->mm. By doing so, we would reduce the use of 'global' variables on code, relying more in the contents of kvm struct. Signed-off-by: Leonardo Bras <leonardo@linux.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2020-01-17KVM: PPC: Book3S: Replace current->mm by kvm->mmLeonardo Bras
Given that in kvm_create_vm() there is: kvm->mm = current->mm; And that on every kvm_*_ioctl we have: if (kvm->mm != current->mm) return -EIO; I see no reason to keep using current->mm instead of kvm->mm. By doing so, we would reduce the use of 'global' variables on code, relying more in the contents of kvm struct. Signed-off-by: Leonardo Bras <leonardo@linux.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2020-01-17KVM: PPC: Remove set but not used variable 'ra', 'rs', 'rt'zhengbin
Fixes gcc '-Wunused-but-set-variable' warning: arch/powerpc/kvm/emulate_loadstore.c: In function kvmppc_emulate_loadstore: arch/powerpc/kvm/emulate_loadstore.c:87:6: warning: variable ra set but not used [-Wunused-but-set-variable] arch/powerpc/kvm/emulate_loadstore.c: In function kvmppc_emulate_loadstore: arch/powerpc/kvm/emulate_loadstore.c:87:10: warning: variable rs set but not used [-Wunused-but-set-variable] arch/powerpc/kvm/emulate_loadstore.c: In function kvmppc_emulate_loadstore: arch/powerpc/kvm/emulate_loadstore.c:87:14: warning: variable rt set but not used [-Wunused-but-set-variable] They are not used since commit 2b33cb585f94 ("KVM: PPC: Reimplement LOAD_FP/STORE_FP instruction mmio emulation with analyse_instr() input") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: zhengbin <zhengbin13@huawei.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2020-01-16Merge tag 'soc-fsl-next-v5.6' of ↵Olof Johansson
git://git.kernel.org/pub/scm/linux/kernel/git/leo/linux into arm/drivers NXP/FSL SoC driver updates for v5.6 QUICC Engine drivers - Improve the QE drivers to be compatible with ARM/ARM64/PPC64 architectures - Various cleanups to the QE drivers * tag 'soc-fsl-next-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/leo/linux: (49 commits) soc: fsl: qe: remove set but not used variable 'mm_gc' soc: fsl: qe: remove PPC32 dependency from CONFIG_QUICC_ENGINE soc: fsl: qe: remove unused #include of asm/irq.h from ucc.c net: ethernet: freescale: make UCC_GETH explicitly depend on PPC32 net/wan/fsl_ucc_hdlc: reject muram offsets above 64K net/wan/fsl_ucc_hdlc: fix reading of __be16 registers net/wan/fsl_ucc_hdlc: avoid use of IS_ERR_VALUE() soc: fsl: qe: avoid IS_ERR_VALUE in ucc_fast.c soc: fsl: qe: drop pointless check in qe_sdma_init() soc: fsl: qe: drop use of IS_ERR_VALUE in qe_sdma_init() soc: fsl: qe: avoid IS_ERR_VALUE in ucc_slow.c soc: fsl: qe: refactor cpm_muram_alloc_common to prevent BUG on error path soc: fsl: qe: drop broken lazy call of cpm_muram_init() soc: fsl: qe: make cpm_muram_free() ignore a negative offset soc: fsl: qe: make cpm_muram_free() return void soc: fsl: qe: change return type of cpm_muram_alloc() to s32 serial: ucc_uart: access __be32 field using be32_to_cpu serial: ucc_uart: limit brg-frequency workaround to PPC32 serial: ucc_uart: use of_property_read_u32() in ucc_uart_probe() serial: ucc_uart: stub out soft_uart_init for !CONFIG_PPC32 ... Link: https://lore.kernel.org/r/1578608351-23289-1-git-send-email-leoyang.li@nxp.com Signed-off-by: Olof Johansson <olof@lixom.net>
2020-01-16powerpc/64s: Reimplement power4_idle code in CNicholas Piggin
This implements the tricky tracing and soft irq handling bits in C, leaving the low level bit to asm. A functional difference is that this redirects the interrupt exit to a return stub to execute blr, rather than the lr address itself. This is probably barely measurable on real hardware, but it keeps the link stack balanced. Tested with QEMU. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Move power4_fixup_nap back into exceptions-64s.S] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190711022404.18132-1-npiggin@gmail.com
2020-01-16powerpc: Remove STRICT_KERNEL_RWX incompatibility with RELOCATABLERussell Currey
I have tested this with the Radix MMU and everything seems to work, and the previous patch for Hash seems to fix everything too. STRICT_KERNEL_RWX should still be disabled by default for now. Please test STRICT_KERNEL_RWX + RELOCATABLE! Signed-off-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191224064126.183670-2-ruscur@russell.cc
2020-01-16powerpc/book3s64/hash: Disable 16M linear mapping size if not alignedRussell Currey
With STRICT_KERNEL_RWX on in a relocatable kernel under the hash MMU, if the position the kernel is loaded at is not 16M aligned things go horribly wrong. Specifically hash__mark_initmem_nx() will call hash__change_memory_range() which then aligns down the start address, and due to the text not being 16M aligned causes some of the kernel text to be marked non-executable. We can avoid this when selecting the linear mapping size, so do so and print a warning. I tested this for various alignments and as long as the position is 64K aligned it's fine (the base requirement for powerpc). Signed-off-by: Russell Currey <ruscur@russell.cc> [mpe: Add details of the failure mode] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191224064126.183670-1-ruscur@russell.cc
2020-01-14arch/powerpc/setup: Drop dummy_con initializationArvind Sankar
con_init in tty/vt.c will now set conswitchp to dummy_con if it's unset. Drop it from arch setup code. Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Link: https://lore.kernel.org/r/20191218214506.49252-18-nivedita@alum.mit.edu Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-14powerpc/pseries: Advance pfn if section is not present in lmb_is_removable()Pingfan Liu
In lmb_is_removable(), if a section is not present, it should continue to test the rest of the sections in the block. But the current code fails to do so. Fixes: 51925fb3c5c9 ("powerpc/pseries: Implement memory hotplug remove in the kernel") Cc: stable@vger.kernel.org # v4.1+ Signed-off-by: Pingfan Liu <kernelfans@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1578632042-12415-1-git-send-email-kernelfans@gmail.com
2020-01-14powerpc/xmon: don't access ASDR in VMsSukadev Bhattiprolu
ASDR is HV-privileged and must only be accessed in HV-mode. Fixes a Program Check (0x700) when xmon in a VM dumps SPRs. Fixes: d1e1b351f50f ("powerpc/xmon: Add ISA v3.0 SPRs to SPR dump") Cc: stable@vger.kernel.org # v4.14+ Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200107021633.GB29843@us.ibm.com
2020-01-13arch: wire up pidfd_getfd syscallSargun Dhillon
This wires up the pidfd_getfd syscall for all architectures. Signed-off-by: Sargun Dhillon <sargun@sargun.me> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20200107175927.4558-4-sargun@sargun.me Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-01-13Merge 5.5-rc6 into tty-nextGreg Kroah-Hartman
We need the serial/tty fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-10Merge branch 'x86/mm' into efi/core, to pick up dependenciesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-01-09crypto: remove CRYPTO_TFM_RES_BAD_KEY_LENEric Biggers
The CRYPTO_TFM_RES_BAD_KEY_LEN flag was apparently meant as a way to make the ->setkey() functions provide more information about errors. However, no one actually checks for this flag, which makes it pointless. Also, many algorithms fail to set this flag when given a bad length key. Reviewing just the generic implementations, this is the case for aes-fixed-time, cbcmac, echainiv, nhpoly1305, pcrypt, rfc3686, rfc4309, rfc7539, rfc7539esp, salsa20, seqiv, and xcbc. But there are probably many more in arch/*/crypto/ and drivers/crypto/. Some algorithms can even set this flag when the key is the correct length. For example, authenc and authencesn set it when the key payload is malformed in any way (not just a bad length), the atmel-sha and ccree drivers can set it if a memory allocation fails, and the chelsio driver sets it for bad auth tag lengths, not just bad key lengths. So even if someone actually wanted to start checking this flag (which seems unlikely, since it's been unused for a long time), there would be a lot of work needed to get it working correctly. But it would probably be much better to go back to the drawing board and just define different return values, like -EINVAL if the key is invalid for the algorithm vs. -EKEYREJECTED if the key was rejected by a policy like "no weak keys". That would be much simpler, less error-prone, and easier to test. So just remove this flag. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Horia Geantă <horia.geanta@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-01-07powerpc/mpc85xx: also write addr_h to spin table for 64bit boot entryBai Yingjie
CPU like P4080 has 36bit physical address, its DDR physical start address can be configured above 4G by LAW registers. For such systems in which their physical memory start address was configured higher than 4G, we need also to write addr_h into the spin table of the target secondary CPU, so that addr_h and addr_l together represent a 64bit physical address. Otherwise the secondary core can not get correct entry to start from. Signed-off-by: Bai Yingjie <byj.tea@gmail.com> Acked-by: Scott Wood <oss@buserror.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200106042957.26494-2-yingjie_bai@126.com
2020-01-07powerpc32/booke: consistently return phys_addr_t in __pa()Bai Yingjie
When CONFIG_RELOCATABLE=y is set, VIRT_PHYS_OFFSET is a 64bit variable, thus __pa() returns as 64bit value. But when CONFIG_RELOCATABLE=n, __pa() returns 32bit value. When CONFIG_PHYS_64BIT is set, __pa() should consistently return as 64bit value irrelevant to CONFIG_RELOCATABLE. So we'd make __pa() consistently return phys_addr_t, which is 64bit when CONFIG_PHYS_64BIT is set. Signed-off-by: Bai Yingjie <byj.tea@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200106042957.26494-1-yingjie_bai@126.com
2020-01-07powerpc/powernv: use resource_sizeJulia Lawall
Use resource_size rather than a verbose computation on the end and start fields. The semantic patch that makes these changes is as follows: (http://coccinelle.lip6.fr/) <smpl> @@ struct resource ptr; @@ - (ptr.end - ptr.start + 1) + resource_size(&ptr) </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1577900990-8588-11-git-send-email-Julia.Lawall@inria.fr
2020-01-07powerpc/83xx: use resource_sizeJulia Lawall
Use resource_size rather than a verbose computation on the end and start fields. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) <smpl> @@ struct resource ptr; @@ - (ptr.end - ptr.start + 1) + resource_size(&ptr) </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Acked-by: Scott Wood <oss@buserror.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1577900990-8588-6-git-send-email-Julia.Lawall@inria.fr