summaryrefslogtreecommitdiff
path: root/arch/powerpc/kernel
AgeCommit message (Collapse)Author
2020-01-27powerpc/32s: Enable CONFIG_VMAP_STACKChristophe Leroy
A few changes to retrieve DAR and DSISR from struct regs instead of retrieving them directly, as they may have changed due to a TLB miss. Also modifies hash_page() and friends to work with virtual data addresses instead of physical ones. Same on load_up_fpu() and load_up_altivec(). Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> [mpe: Fix tovirt_vmstack call in head_32.S to fix CHRP build] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/2e2509a242fd5f3e23df4a06530c18060c4d321e.1576916812.git.christophe.leroy@c-s.fr
2020-01-27powerpc/32s: Avoid crossing page boundary while changing SRR0/1.Christophe Leroy
Trying VMAP_STACK with KVM, vmlinux was not starting. This was due to SRR0 and SRR1 clobbered by an ISI due to the rfi being in a different page than the mtsrr0/1: c0003fe0 <mmu_off>: c0003fe0: 38 83 00 54 addi r4,r3,84 c0003fe4: 7c 60 00 a6 mfmsr r3 c0003fe8: 70 60 00 30 andi. r0,r3,48 c0003fec: 4d 82 00 20 beqlr c0003ff0: 7c 63 00 78 andc r3,r3,r0 c0003ff4: 7c 9a 03 a6 mtsrr0 r4 c0003ff8: 7c 7b 03 a6 mtsrr1 r3 c0003ffc: 7c 00 04 ac hwsync c0004000: 4c 00 00 64 rfi Align the 4 instruction block used to deactivate MMU to order 4, so that the block never crosses a page boundary. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/30d2cda111b7977227fff067fa7e358440e2b3a4.1576916812.git.christophe.leroy@c-s.fr
2020-01-27powerpc/32s: Reorganise DSI handler.Christophe Leroy
The part decidated to handling hash_page() is fully unneeded for processors not having real hash pages like the 603. Lets enlarge the content of the feature fixup, and provide an alternative which jumps directly instead of getting NIPs. Also, in preparation of VMAP stacks, the end of DSI handler has moved to later in the code as it won't fit anymore once VMAP stacks are there. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/c31b22c91af8b011d0a4fd9e52ad6afb4b593f71.1576916812.git.christophe.leroy@c-s.fr
2020-01-27powerpc/8xx: Enable CONFIG_VMAP_STACKChristophe Leroy
This patch enables CONFIG_VMAP_STACK. For that, a few changes are done in head_8xx.S. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/d7ba1e34e80898310d6a314cbebe48baa32894ef.1576916812.git.christophe.leroy@c-s.fr
2020-01-27powerpc/8xx: Move tail of alignment exception out of lineMichael Ellerman
When we enable VMAP_STACK there will not be enough room for the alignment handler at 0x600 in head_8xx.S. For now move the tail of the alignment handler out of line, and branch to it. Suggested-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2020-01-27powerpc/8xx: Split breakpoint exceptionChristophe Leroy
Breakpoint exception is big. Split it to support future growth on exception prolog. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1dda3293d86d0f715b13b2633c95d2188a42a02c.1576916812.git.christophe.leroy@c-s.fr
2020-01-27powerpc/8xx: Move DataStoreTLBMiss perf handlerChristophe Leroy
Move DataStoreTLBMiss perf handler in order to cope with future growing exception prolog. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/75dd28b04efd2cbdbf01153173d99c11cdff2f08.1576916812.git.christophe.leroy@c-s.fr
2020-01-27powerpc/8xx: Drop exception entries for non-existing exceptionsChristophe Leroy
head_8xx.S has entries for all exceptions from 0x100 to 0x1f00. Several of them do not exist and are never generated by the 8xx in accordance with the documentation. Remove those entry points to make some room for future growing exception code. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/66f92866fe9524cf0f056016921c7d53adaef3a0.1576916812.git.christophe.leroy@c-s.fr
2020-01-27powerpc/8xx: Use alternative scratch registers in DTLB miss handlerChristophe Leroy
In preparation of handling CONFIG_VMAP_STACK, DTLB miss handler need to use different scratch registers than other exception handlers in order to not jeopardise exception entry on stack DTLB misses. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/c5287ea59ae9630f505019b309bf94029241635f.1576916812.git.christophe.leroy@c-s.fr
2020-01-27powerpc/32: Use vmapped stacks for interruptsChristophe Leroy
In order to also catch overflows on IRQ stacks, use vmapped stacks. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/d33ad1b36ddff4dcc19f96c592c12a61cf85d406.1576916812.git.christophe.leroy@c-s.fr
2020-01-27powerpc/32: Add early stack overflow detection with VMAP stack.Christophe Leroy
To avoid recursive faults, stack overflow detection has to be performed before writing in the stack in exception prologs. Do it by checking the alignment. If the stack pointer alignment is wrong, it means it is pointing to the following or preceding page. Without VMAP stack, a stack overflow is catastrophic. With VMAP stack, a stack overflow isn't destructive, so don't panic. Kill the task with SIGSEGV instead. A dedicated overflow stack is set up for each CPU. lkdtm: Performing direct entry EXHAUST_STACK lkdtm: Calling function with 512 frame size to depth 32 ... lkdtm: loop 32/32 ... lkdtm: loop 31/32 ... lkdtm: loop 30/32 ... lkdtm: loop 29/32 ... lkdtm: loop 28/32 ... lkdtm: loop 27/32 ... lkdtm: loop 26/32 ... lkdtm: loop 25/32 ... lkdtm: loop 24/32 ... lkdtm: loop 23/32 ... lkdtm: loop 22/32 ... lkdtm: loop 21/32 ... lkdtm: loop 20/32 ... Kernel stack overflow in process test[359], r1=c900c008 Oops: Kernel stack overflow, sig: 6 [#1] BE PAGE_SIZE=4K MMU=Hash PowerMac Modules linked in: CPU: 0 PID: 359 Comm: test Not tainted 5.3.0-rc7+ #2225 NIP: c0622060 LR: c0626710 CTR: 00000000 REGS: c0895f48 TRAP: 0000 Not tainted (5.3.0-rc7+) MSR: 00001032 <ME,IR,DR,RI> CR: 28004224 XER: 00000000 GPR00: c0626ca4 c900c008 c783c000 c07335cc c900c010 c07335cc c900c0f0 c07335cc GPR08: c900c0f0 00000001 00000000 00000000 28008222 00000000 00000000 00000000 GPR16: 00000000 00000000 10010128 10010000 b799c245 10010158 c07335cc 00000025 GPR24: c0690000 c08b91d4 c068f688 00000020 c900c0f0 c068f668 c08b95b4 c08b91d4 NIP [c0622060] format_decode+0x0/0x4d4 LR [c0626710] vsnprintf+0x80/0x5fc Call Trace: [c900c068] [c0626ca4] vscnprintf+0x18/0x48 [c900c078] [c007b944] vprintk_store+0x40/0x214 [c900c0b8] [c007bf50] vprintk_emit+0x90/0x1dc [c900c0e8] [c007c5cc] printk+0x50/0x60 [c900c128] [c03da5b0] recursive_loop+0x44/0x6c [c900c338] [c03da5c4] recursive_loop+0x58/0x6c [c900c548] [c03da5c4] recursive_loop+0x58/0x6c [c900c758] [c03da5c4] recursive_loop+0x58/0x6c [c900c968] [c03da5c4] recursive_loop+0x58/0x6c [c900cb78] [c03da5c4] recursive_loop+0x58/0x6c [c900cd88] [c03da5c4] recursive_loop+0x58/0x6c [c900cf98] [c03da5c4] recursive_loop+0x58/0x6c [c900d1a8] [c03da5c4] recursive_loop+0x58/0x6c [c900d3b8] [c03da5c4] recursive_loop+0x58/0x6c [c900d5c8] [c03da5c4] recursive_loop+0x58/0x6c [c900d7d8] [c03da5c4] recursive_loop+0x58/0x6c [c900d9e8] [c03da5c4] recursive_loop+0x58/0x6c [c900dbf8] [c03da5c4] recursive_loop+0x58/0x6c [c900de08] [c03da67c] lkdtm_EXHAUST_STACK+0x30/0x4c [c900de18] [c03da3e8] direct_entry+0xc8/0x140 [c900de48] [c029fb40] full_proxy_write+0x64/0xcc [c900de68] [c01500f8] __vfs_write+0x30/0x1d0 [c900dee8] [c0152cb8] vfs_write+0xb8/0x1d4 [c900df08] [c0152f7c] ksys_write+0x58/0xe8 [c900df38] [c0014208] ret_from_syscall+0x0/0x34 --- interrupt: c01 at 0xf806664 LR = 0x1000c868 Instruction dump: 4bffff91 80010014 7c832378 7c0803a6 38210010 4e800020 3d20c08a 3ca0c089 8089a0cc 38a58f0c 38600001 4ba2d494 <9421ffe0> 7c0802a6 bfc10018 7c9f2378 Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1b89c121b4070c7ee99e4f22cc178f15a736b07b.1576916812.git.christophe.leroy@c-s.fr
2020-01-26powerpc: align stack to 2 * THREAD_SIZE with VMAP_STACKChristophe Leroy
In order to ease stack overflow detection, align stack to 2 * THREAD_SIZE when using VMAP_STACK. This allows overflow detection using a single bit check. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/60e9ae86b7d2cdcf21468787076d345663648f46.1576916812.git.christophe.leroy@c-s.fr
2020-01-26powerpc/32: prepare for CONFIG_VMAP_STACKChristophe Leroy
To support CONFIG_VMAP_STACK, the kernel has to activate Data MMU Translation for accessing the stack. Before doing that it must save SRR0, SRR1 and also DAR and DSISR when relevant, in order to not loose them in case there is a Data TLB Miss once the translation is reactivated. This patch adds fields in thread struct for saving those registers. It prepares entry_32.S to handle exception entry with Data MMU Translation enabled and alters EXCEPTION_PROLOG macros to save SRR0, SRR1, DAR and DSISR then reenables Data MMU. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/a775a1fea60f190e0f63503463fb775310a2009b.1576916812.git.christophe.leroy@c-s.fr
2020-01-26powerpc/32: add a macro to get and/or save DAR and DSISR on stack.Christophe Leroy
Refactor reading and saving of DAR and DSISR in exception vectors. This will ease the implementation of VMAP stack. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1286b3e51b07727c6b4b05f2df9af3f9b1717fb5.1576916812.git.christophe.leroy@c-s.fr
2020-01-26powerpc/32: move MSR_PR test into EXCEPTION_PROLOG_0Christophe Leroy
In order to simplify VMAP stack implementation, move MSR_PR test into EXCEPTION_PROLOG_0. This requires to not modify cr0 between EXCEPTION_PROLOG_0 and EXCEPTION_PROLOG_1. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/5c8b5bba692b92654dbd363a229a1ba91db725bb.1576916812.git.christophe.leroy@c-s.fr
2020-01-26powerpc/32: save DEAR/DAR before calling handle_page_faultChristophe Leroy
handle_page_fault() is the only function that save DAR/DEAR itself. Save DAR/DEAR before calling handle_page_fault() to prepare for VMAP stack which will require to save even before. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/3a4d58d378091086f00fde42b59610c80289e120.1576916812.git.christophe.leroy@c-s.fr
2020-01-26powerpc/32: Add EXCEPTION_PROLOG_0 in head_32.hChristophe Leroy
This patch creates a macro for the very first part of exception prolog, this will help when implementing CONFIG_VMAP_STACK Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/2249fe62f481121a180e9655ad2b998093f318f3.1576916812.git.christophe.leroy@c-s.fr
2020-01-26powerpc/32: replace MTMSRD() by mtmsrChristophe Leroy
On PPC32, MTMSRD() is simply defined as mtmsr. Replace MTMSRD(reg) by mtmsr reg in files dedicated to PPC32, this makes the code less obscure. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/22469e78230edea3dbd0c79a555d73124f6c6d93.1576916812.git.christophe.leroy@c-s.fr
2020-01-26powerpc/mm: Remove kvm radix prefetch workaround for Power9 DD2.2Jordan Niethe
Commit a25bd72badfa ("powerpc/mm/radix: Workaround prefetch issue with KVM") introduced a number of workarounds as coming out of a guest with the mmu enabled would make the cpu would start running in hypervisor state with the PID value from the guest. The cpu will then start prefetching for the hypervisor with that PID value. In Power9 DD2.2 the cpu behaviour was modified to fix this. When accessing Quadrant 0 in hypervisor mode with LPID != 0 prefetching will not be performed. This means that we can get rid of the workarounds for Power9 DD2.2 and later revisions. Add a new cpu feature CPU_FTR_P9_RADIX_PREFETCH_BUG to indicate if the workarounds are needed. Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Acked-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191206031722.25781-1-jniethe5@gmail.com
2020-01-26powerpc: use probe_user_read() and probe_user_write()Christophe Leroy
Instead of opencoding, use probe_user_read() to failessly read a user location and probe_user_write() for writing to user. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/e041f5eedb23f09ab553be8a91c3de2087147320.1579800517.git.christophe.leroy@c-s.fr
2020-01-23powerpc/pci: Fold pcibios_setup_device() into pcibios_bus_add_device()Oliver O'Halloran
pcibios_bus_add_device() is the only caller of pcibios_setup_device(). Fold them together since there's no real reason to keep them separate. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200110070207.439-2-oohall@gmail.com
2020-01-23powerpc/eeh: Only dump stack once if an MMIO loop is detectedOliver O'Halloran
Many drivers don't check for errors when they get a 0xFFs response from an MMIO load. As a result after an EEH event occurs a driver can get stuck in a polling loop unless it some kind of internal timeout logic. Currently EEH tries to detect and report stuck drivers by dumping a stack trace after eeh_dev_check_failure() is called EEH_MAX_FAILS times on an already frozen PE. The value of EEH_MAX_FAILS was chosen so that a dump would occur every few seconds if the driver was spinning in a loop. This results in a lot of spurious stack traces in the kernel log. Fix this by limiting it to printing one stack trace for each PE freeze. If the driver is truely stuck the kernel's hung task detector is better suited to reporting the probelm anyway. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com> Tested-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191016012536.22588-1-oohall@gmail.com
2020-01-23powerpc/pcidn: Warn when sriov pci_dn management is used incorrectlyOliver O'Halloran
These functions can only be used on a SR-IOV capable physical function and they're only called in pcibios_sriov_enable / disable. Make them emit a warning in the future if they're used incorrectly and remove the dead code that checks if the device is a VF. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190821062655.19735-3-oohall@gmail.com
2020-01-23powerpc/pcidn: Make VF pci_dn management CONFIG_PCI_IOV specificOliver O'Halloran
The powerpc PCI code requires that a pci_dn structure exists for all devices in the system. This is fine for real devices since at boot a pci_dn is created for each PCI device in the DT and it's fine for hotplugged devices since the hotplug slot driver will manage the pci_dn's devices in hotplug slots. For SR-IOV, we need the platform / pcibios to manage the pci_dn for virtual functions since firmware is unaware of VFs, and they aren't "hot plugged" in the traditional sense. Management of the pci_dn is handled by the, poorly named, functions: add_pci_dev_data() and remove_pci_dev_data(). The entire body of these functions is #ifdef`ed around CONFIG_PCI_IOV and they cannot be used in any other context, so make them only available when CONFIG_PCI_IOV is selected, and rename them to reflect their actual usage rather than having them masquerade as generic code. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190821062655.19735-2-oohall@gmail.com
2020-01-23powerpc/sriov: Remove VF eeh_dev state when disabling SR-IOVOliver O'Halloran
When disabling virtual functions on an SR-IOV adapter we currently do not correctly remove the EEH state for the now-dead virtual functions. When removing the pci_dn that was created for the VF when SR-IOV was enabled we free the corresponding eeh_dev without removing it from the child device list of the eeh_pe that contained it. This can result in crashes due to the use-after-free. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com> Tested-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190821062655.19735-1-oohall@gmail.com
2020-01-23powerpc/eeh_sysfs: Make clearing EEH_DEV_SYSFS sanerOliver O'Halloran
The eeh_sysfs_remove_device() function is supposed to clear the EEH_DEV_SYSFS flag since it indicates the EEH sysfs entries have been added for a pci_dev. When the sysfs files are removed eeh_remove_device() the eeh_dev and the pci_dev have already been de-associated. This then causes the pci_dev_to_eeh_dev() call in eeh_sysfs_remove_device() to return NULL so the flag can't be cleared from the still-live eeh_dev. This problem is worked around in the caller by clearing the flag manually. However, this behaviour doesn't make a whole lot of sense, so this patch fixes it by: a) Re-ordering eeh_remove_device() so that eeh_sysfs_remove_device() is called before de-associating the pci_dev and eeh_dev. b) Making eeh_sysfs_remove_device() emit a warning if there's no corresponding eeh_dev for a pci_dev. The paths where the sysfs files are only reachable if EEH was setup for the device for the device in the first place so hitting this warning indicates a programming error. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com> Tested-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190715085612.8802-6-oohall@gmail.com
2020-01-23powerpc/eeh_sysfs: Remove double pci_dn lookup.Oliver O'Halloran
In eeh_notify_resume_show() the pci_dn for the device is looked up once in the declaration block and then once after checking for a NULL eeh_dev. Remove the second lookup since it's pointless. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com> Tested-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190715085612.8802-5-oohall@gmail.com
2020-01-23powerpc/eeh_sysfs: ifdef pseries sr-iov sysfs propertiesOliver O'Halloran
There are several EEH sysfs properties that only exists when the "ibm,is-open-sriov-pf" property appears in the device tree node of the PCI device. This used on pseries to indicate to the guest that the hypervisor allows the guest to configure the SR-IOV capability. Doing this requires some handshaking between the guest, hypervisor and userspace when a VF is EEH frozen which is why these properties exist. This is all dead code on non-pseries platforms so wrap it in an #ifdef CONFIG_PPC_PSERIES to make the dependency clearer. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Tested-by: Sam Bobroff <sbobroff@linux.ibm.com> Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190715085612.8802-4-oohall@gmail.com
2020-01-23powerpc/eeh_sysfs: Fix incorrect commentOliver O'Halloran
The EEH_ATTR_SHOW() helper is used to display fields from struct eeh_dev not struct pci_dn. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190715085612.8802-3-oohall@gmail.com
2020-01-23powerpc/eeh_cache: Don't use pci_dn when inserting new rangesOliver O'Halloran
At the point where we start inserting ranges into the EEH address cache the binding between pci_dev and eeh_dev has already been set up. Instead of consulting the pci_dn tree we can retrieve the eeh_dev directly using pci_dev_to_eeh_dev(). Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com> Tested-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190715085612.8802-2-oohall@gmail.com
2020-01-23powerpc/vdso32: miscellaneous optimisationsChristophe Leroy
Various optimisations by inverting branches and removing redundant instructions. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/b4e79f963845545bcce1459cd6fcfe46bdde7863.1575273217.git.christophe.leroy@c-s.fr
2020-01-23powerpc/vdso32: implement clock_getres entirelyChristophe Leroy
clock_getres returns hrtimer_res for all clocks but coarse ones for which it returns KTIME_LOW_RES. return EINVAL for unknown clocks. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/37f94e47c91070b7606fb3ec3fe6fd2302a475a0.1575273217.git.christophe.leroy@c-s.fr
2020-01-23powerpc/vdso32: use LOAD_REG_IMMEDIATE()Christophe Leroy
Use LOAD_REG_IMMEDIATE() to load registers with immediate value. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/36f111437e66e601929308f5d5dce230e1ce472f.1575273217.git.christophe.leroy@c-s.fr
2020-01-23powerpc/vdso32: Don't read cache line size from the datapage on PPC32.Christophe Leroy
On PPC32, the cache lines have a fixed size known at build time. Don't read it from the datapage. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/dfa7b35e27e01964fcda84bf1ed8b2b31cf93826.1575273217.git.christophe.leroy@c-s.fr
2020-01-23powerpc/vdso32: inline __get_datapage()Christophe Leroy
__get_datapage() is only a few instructions to retrieve the address of the page where the kernel stores data to the VDSO. By inlining this function into its users, a bl/blr pair and a mflr/mtlr pair is avoided, plus a few reg moves. The improvement is noticeable (about 55 nsec/call on an 8xx) vdsotest before the patch: gettimeofday: vdso: 731 nsec/call clock-gettime-realtime-coarse: vdso: 668 nsec/call clock-gettime-monotonic-coarse: vdso: 745 nsec/call vdsotest after the patch: gettimeofday: vdso: 677 nsec/call clock-gettime-realtime-coarse: vdso: 613 nsec/call clock-gettime-monotonic-coarse: vdso: 690 nsec/call Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/c39ef7f3dfa25356b01e211d539671f279086c09.1575273217.git.christophe.leroy@c-s.fr
2020-01-23powerpc/vdso32: Add support for CLOCK_{REALTIME/MONOTONIC}_COARSEChristophe Leroy
This is copied and adapted from commit 5c929885f1bb ("powerpc/vdso64: Add support for CLOCK_{REALTIME/MONOTONIC}_COARSE") from Santosh Sivaraj <santosh@fossix.org> Benchmark from vdsotest-all: clock-gettime-realtime: syscall: 3601 nsec/call clock-gettime-realtime: libc: 1072 nsec/call clock-gettime-realtime: vdso: 931 nsec/call clock-gettime-monotonic: syscall: 4034 nsec/call clock-gettime-monotonic: libc: 1213 nsec/call clock-gettime-monotonic: vdso: 1076 nsec/call clock-gettime-realtime-coarse: syscall: 2722 nsec/call clock-gettime-realtime-coarse: libc: 805 nsec/call clock-gettime-realtime-coarse: vdso: 668 nsec/call clock-gettime-monotonic-coarse: syscall: 2949 nsec/call clock-gettime-monotonic-coarse: libc: 882 nsec/call clock-gettime-monotonic-coarse: vdso: 745 nsec/call Additional test passed with: vdsotest -d 30 clock-gettime-monotonic-coarse verify Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://github.com/linuxppc/issues/issues/41 Link: https://lore.kernel.org/r/d1d24a376e396540194eeb85a2efe481e92ade24.1575273217.git.christophe.leroy@c-s.fr
2020-01-23powerpc/32: Add VDSO version of getcpu on non SMPChristophe Leroy
Commit 18ad51dd342a ("powerpc: Add VDSO version of getcpu") added getcpu() for PPC64 only, by making use of a user readable general purpose SPR. PPC32 doesn't have any such SPR. For non SMP, just return CPU id 0 from the VDSO directly. PPC32 doesn't support CONFIG_NUMA so NUMA node is always 0. Before the patch, vdsotest reported: getcpu: syscall: 1572 nsec/call getcpu: libc: 1787 nsec/call getcpu: vdso: not tested Now, vdsotest reports: getcpu: syscall: 1582 nsec/call getcpu: libc: 502 nsec/call getcpu: vdso: 187 nsec/call Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/eaac4b6494ecff1811220fccc895bf282aab884a.1575273217.git.christophe.leroy@c-s.fr
2020-01-23powerpc/hw_breakpoints: Rewrite 8xx breakpoints to allow any address range size.Christophe Leroy
Unlike standard powerpc, Powerpc 8xx doesn't have SPRN_DABR, but it has a breakpoint support based on a set of comparators which allow more flexibility. Commit 4ad8622dc548 ("powerpc/8xx: Implement hw_breakpoint") implemented breakpoints by emulating the DABR behaviour. It did this by setting one comparator the match 4 bytes at breakpoint address and the other comparator to match 4 bytes at breakpoint address + 4. Rewrite 8xx hw_breakpoint to make breakpoints match all addresses defined by the breakpoint address and length by making full use of comparators. Now, comparator E is set to match any address greater than breakpoint address minus one. Comparator F is set to match any address lower than breakpoint address plus breakpoint length. Addresses are aligned to 32 bits. When the breakpoint range starts at address 0, the breakpoint is set to match comparator F only. When the breakpoint range end at address 0xffffffff, the breakpoint is set to match comparator E only. Otherwise the breakpoint is set to match comparator E and F. At the same time, use registers bit names instead of hardcoded values. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/05105deeaf63bc02151aea2cdeaf525534e0e9d4.1574790198.git.christophe.leroy@c-s.fr
2020-01-18open: introduce openat2(2) syscallAleksa Sarai
/* Background. */ For a very long time, extending openat(2) with new features has been incredibly frustrating. This stems from the fact that openat(2) is possibly the most famous counter-example to the mantra "don't silently accept garbage from userspace" -- it doesn't check whether unknown flags are present[1]. This means that (generally) the addition of new flags to openat(2) has been fraught with backwards-compatibility issues (O_TMPFILE has to be defined as __O_TMPFILE|O_DIRECTORY|[O_RDWR or O_WRONLY] to ensure old kernels gave errors, since it's insecure to silently ignore the flag[2]). All new security-related flags therefore have a tough road to being added to openat(2). Userspace also has a hard time figuring out whether a particular flag is supported on a particular kernel. While it is now possible with contemporary kernels (thanks to [3]), older kernels will expose unknown flag bits through fcntl(F_GETFL). Giving a clear -EINVAL during openat(2) time matches modern syscall designs and is far more fool-proof. In addition, the newly-added path resolution restriction LOOKUP flags (which we would like to expose to user-space) don't feel related to the pre-existing O_* flag set -- they affect all components of path lookup. We'd therefore like to add a new flag argument. Adding a new syscall allows us to finally fix the flag-ignoring problem, and we can make it extensible enough so that we will hopefully never need an openat3(2). /* Syscall Prototype. */ /* * open_how is an extensible structure (similar in interface to * clone3(2) or sched_setattr(2)). The size parameter must be set to * sizeof(struct open_how), to allow for future extensions. All future * extensions will be appended to open_how, with their zero value * acting as a no-op default. */ struct open_how { /* ... */ }; int openat2(int dfd, const char *pathname, struct open_how *how, size_t size); /* Description. */ The initial version of 'struct open_how' contains the following fields: flags Used to specify openat(2)-style flags. However, any unknown flag bits or otherwise incorrect flag combinations (like O_PATH|O_RDWR) will result in -EINVAL. In addition, this field is 64-bits wide to allow for more O_ flags than currently permitted with openat(2). mode The file mode for O_CREAT or O_TMPFILE. Must be set to zero if flags does not contain O_CREAT or O_TMPFILE. resolve Restrict path resolution (in contrast to O_* flags they affect all path components). The current set of flags are as follows (at the moment, all of the RESOLVE_ flags are implemented as just passing the corresponding LOOKUP_ flag). RESOLVE_NO_XDEV => LOOKUP_NO_XDEV RESOLVE_NO_SYMLINKS => LOOKUP_NO_SYMLINKS RESOLVE_NO_MAGICLINKS => LOOKUP_NO_MAGICLINKS RESOLVE_BENEATH => LOOKUP_BENEATH RESOLVE_IN_ROOT => LOOKUP_IN_ROOT open_how does not contain an embedded size field, because it is of little benefit (userspace can figure out the kernel open_how size at runtime fairly easily without it). It also only contains u64s (even though ->mode arguably should be a u16) to avoid having padding fields which are never used in the future. Note that as a result of the new how->flags handling, O_PATH|O_TMPFILE is no longer permitted for openat(2). As far as I can tell, this has always been a bug and appears to not be used by userspace (and I've not seen any problems on my machines by disallowing it). If it turns out this breaks something, we can special-case it and only permit it for openat(2) but not openat2(2). After input from Florian Weimer, the new open_how and flag definitions are inside a separate header from uapi/linux/fcntl.h, to avoid problems that glibc has with importing that header. /* Testing. */ In a follow-up patch there are over 200 selftests which ensure that this syscall has the correct semantics and will correctly handle several attack scenarios. In addition, I've written a userspace library[4] which provides convenient wrappers around openat2(RESOLVE_IN_ROOT) (this is necessary because no other syscalls support RESOLVE_IN_ROOT, and thus lots of care must be taken when using RESOLVE_IN_ROOT'd file descriptors with other syscalls). During the development of this patch, I've run numerous verification tests using libpathrs (showing that the API is reasonably usable by userspace). /* Future Work. */ Additional RESOLVE_ flags have been suggested during the review period. These can be easily implemented separately (such as blocking auto-mount during resolution). Furthermore, there are some other proposed changes to the openat(2) interface (the most obvious example is magic-link hardening[5]) which would be a good opportunity to add a way for userspace to restrict how O_PATH file descriptors can be re-opened. Another possible avenue of future work would be some kind of CHECK_FIELDS[6] flag which causes the kernel to indicate to userspace which openat2(2) flags and fields are supported by the current kernel (to avoid userspace having to go through several guesses to figure it out). [1]: https://lwn.net/Articles/588444/ [2]: https://lore.kernel.org/lkml/CA+55aFyyxJL1LyXZeBsf2ypriraj5ut1XkNDsunRBqgVjZU_6Q@mail.gmail.com [3]: commit 629e014bb834 ("fs: completely ignore unknown open flags") [4]: https://sourceware.org/bugzilla/show_bug.cgi?id=17523 [5]: https://lore.kernel.org/lkml/20190930183316.10190-2-cyphar@cyphar.com/ [6]: https://youtu.be/ggD-eb3yPVs Suggested-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Aleksa Sarai <cyphar@cyphar.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-01-16powerpc/64s: Reimplement power4_idle code in CNicholas Piggin
This implements the tricky tracing and soft irq handling bits in C, leaving the low level bit to asm. A functional difference is that this redirects the interrupt exit to a return stub to execute blr, rather than the lr address itself. This is probably barely measurable on real hardware, but it keeps the link stack balanced. Tested with QEMU. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Move power4_fixup_nap back into exceptions-64s.S] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190711022404.18132-1-npiggin@gmail.com
2020-01-14arch/powerpc/setup: Drop dummy_con initializationArvind Sankar
con_init in tty/vt.c will now set conswitchp to dummy_con if it's unset. Drop it from arch setup code. Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Link: https://lore.kernel.org/r/20191218214506.49252-18-nivedita@alum.mit.edu Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-13arch: wire up pidfd_getfd syscallSargun Dhillon
This wires up the pidfd_getfd syscall for all architectures. Signed-off-by: Sargun Dhillon <sargun@sargun.me> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20200107175927.4558-4-sargun@sargun.me Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-01-06powerpc/pci: Remove pcibios_setup_bus_devices()Oliver O'Halloran
With the previous patch applied pcibios_setup_device() will always be run when pcibios_bus_add_device() is called. There are several code paths where pcibios_setup_bus_device() is still called (the PowerPC specific PCI hotplug support is one) so with just the previous patch applied the setup can be run multiple times on a device, once before the device is added to the bus and once after. There's no need to run the setup in the early case any more so just remove it entirely. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191028085424.12006-3-oohall@gmail.com
2020-01-06powerpc/pci: Fix pcibios_setup_device() orderingShawn Anastasio
Move PCI device setup from pcibios_add_device() and pcibios_fixup_bus() to pcibios_bus_add_device(). This ensures that platform-specific DMA and IOMMU setup occurs after the device has been registered in sysfs, which is a requirement for IOMMU group assignment to work This fixes IOMMU group assignment for hotplugged devices on pseries, where the existing behavior results in IOMMU assignment before registration. Thanks to Lukas Wunner <lukas@wunner.de> for the suggestion. Signed-off-by: Shawn Anastasio <shawn@anastas.io> Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191028085424.12006-2-oohall@gmail.com
2020-01-06powerpc/powernv/iov: Ensure the pdn for VFs always contains a valid PE numberOliver O'Halloran
On pseries there is a bug with adding hotplugged devices to an IOMMU group. For a number of dumb reasons fixing that bug first requires re-working how VFs are configured on PowerNV. For background, on PowerNV we use the pcibios_sriov_enable() hook to do two things: 1. Create a pci_dn structure for each of the VFs, and 2. Configure the PHB's internal BARs so the MMIO range for each VF maps to a unique PE. Roughly speaking a PE is the hardware counterpart to a Linux IOMMU group since all the devices in a PE share the same IOMMU table. A PE also defines the set of devices that should be isolated in response to a PCI error (i.e. bad DMA, UR/CA, AER events, etc). When isolated all MMIO and DMA traffic to and from devicein the PE is blocked by the root complex until the PE is recovered by the OS. The requirement to block MMIO causes a giant headache because the P8 PHB generally uses a fixed mapping between MMIO addresses and PEs. As a result we need to delay configuring the IOMMU groups for device until after MMIO resources are assigned. For physical devices (i.e. non-VFs) the PE assignment is done in pcibios_setup_bridge() which is called immediately after the MMIO resources for downstream devices (and the bridge's windows) are assigned. For VFs the setup is more complicated because: a) pcibios_setup_bridge() is not called again when VFs are activated, and b) The pci_dev for VFs are created by generic code which runs after pcibios_sriov_enable() is called. The work around for this is a two step process: 1. A fixup in pcibios_add_device() is used to initialised the cached pe_number in pci_dn, then 2. A bus notifier then adds the device to the IOMMU group for the PE specified in pci_dn->pe_number. A side effect fixing the pseries bug mentioned in the first paragraph is moving the fixup out of pcibios_add_device() and into pcibios_bus_add_device(), which is called much later. This results in step 2. failing because pci_dn->pe_number won't be initialised when the bus notifier is run. We can fix this by removing the need for the fixup. The PE for a VF is known before the VF is even scanned so we can initialise pci_dn->pe_number pcibios_sriov_enable() instead. Unfortunately, moving the initialisation causes two problems: 1. We trip the WARN_ON() in the current fixup code, and 2. The EEH core clears pdn->pe_number when recovering a VF and relies on the fixup to correctly re-set it. The only justification for either of these is a comment in eeh_rmv_device() suggesting that pdn->pe_number *must* be set to IODA_INVALID_PE in order for the VF to be scanned. However, this comment appears to have no basis in reality. Both bugs can be fixed by just deleting the code. Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191028085424.12006-1-oohall@gmail.com
2020-01-06powerpc/64: Use {SAVE,REST}_NVGPRS macrosJordan Niethe
In entry_64.S there are places that open code saving and restoring the non-volatile registers. There are already macros for doing this so use them. Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191211023552.16480-1-jniethe5@gmail.com
2019-12-25Merge tag 'v5.5-rc3' into sched/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-23Merge 5.5-rc3 into tty-nextGreg Kroah-Hartman
We need the tty/serial fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-18tty/serial: Migrate 8250_fsl to use has_sysrqDmitry Safonov
The SUPPORT_SYSRQ ifdeffery is not nice as: - May create misunderstanding about sizeof(struct uart_port) between different objects - Prevents moving functions from serial_core.h - Reduces readability (well, it's ifdeffery - it's hard to follow) In order to remove SUPPORT_SYSRQ, has_sysrq variable has been added. Initialise it in driver's probe and remove ifdeffery. In contrast to 8250/8250_of, legacy_serial on powerpc does fill (struct plat_serial8250_port). The reason is likely that it's done on device_initcall(), not on probe. So, 8250_core is not yet probed. Propagate value from platform_device on 8250 probe - in case powepc legacy driver it's initialized on initcall, in case 8250_of it will be initialized later on of_platform_serial_setup(). Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Dmitry Safonov <dima@arista.com> Link: https://lore.kernel.org/r/20191213000657.931618-6-dima@arista.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-14powerpc/irq: fix stack overflow verificationChristophe Leroy
Before commit 0366a1c70b89 ("powerpc/irq: Run softirqs off the top of the irq stack"), check_stack_overflow() was called by do_IRQ(), before switching to the irq stack. In that commit, do_IRQ() was renamed __do_irq(), and is now executing on the irq stack, so check_stack_overflow() has just become almost useless. Move check_stack_overflow() call in do_IRQ() to do the check while still on the current stack. Fixes: 0366a1c70b89 ("powerpc/irq: Run softirqs off the top of the irq stack") Cc: stable@vger.kernel.org Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/e033aa8116ab12b7ca9a9c75189ad0741e3b9b5f.1575872340.git.christophe.leroy@c-s.fr