summaryrefslogtreecommitdiff
path: root/arch/powerpc/kernel
AgeCommit message (Collapse)Author
2020-07-30powerpc: fix function annotations to avoid section mismatch warnings with gcc-10Vladis Dronov
Certain warnings are emitted for powerpc code when building with a gcc-10 toolset: WARNING: modpost: vmlinux.o(.text.unlikely+0x377c): Section mismatch in reference from the function remove_pmd_table() to the function .meminit.text:split_kernel_mapping() The function remove_pmd_table() references the function __meminit split_kernel_mapping(). This is often because remove_pmd_table lacks a __meminit annotation or the annotation of split_kernel_mapping is wrong. Add the appropriate __init and __meminit annotations to make modpost not complain. In all the cases there are just a single callsite from another __init or __meminit function: __meminit remove_pagetable() -> remove_pud_table() -> remove_pmd_table() __init prom_init() -> setup_secure_guest() __init xive_spapr_init() -> xive_spapr_disabled() Signed-off-by: Vladis Dronov <vdronov@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200729133741.62789-1-vdronov@redhat.com
2020-07-29powerpc/drmem: Make LMB walk a bit more flexibleHari Bathini
Currently, numa & prom are the only users of drmem LMB walk code. Loading kdump with kexec_file also needs to walk the drmem LMBs to setup the usable memory ranges for kdump kernel. But there are couple of issues in using the code as is. One, walk_drmem_lmb() code is built into the .init section currently, while kexec_file needs it later. Two, there is no scope to pass data to the callback function for processing and/or erroring out on certain conditions. Fix that by, moving drmem LMB walk code out of .init section, adding scope to pass data to the callback function and bailing out when an error is encountered in the callback function. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Tested-by: Pingfan Liu <piliu@redhat.com> Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/159602282727.575379.3979857013827701828.stgit@hbathini
2020-07-29powerpc/64s: Move HMI IRQ stat from percpu variable to paca.Mahesh Salgaonkar
With the proposed change in percpu bootmem allocator to use page mapping [1], the percpu first chunk memory area can come from vmalloc ranges. This makes the HMI (Hypervisor Maintenance Interrupt) handler crash the kernel whenever percpu variable is accessed in real mode. This patch fixes this issue by moving the HMI IRQ stat inside paca for safe access in realmode. [1] https://lore.kernel.org/linuxppc-dev/20200608070904.387440-1-aneesh.kumar@linux.ibm.com/ Suggested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/159290806973.3642154.5244613424529764050.stgit@jupiter
2020-07-29powerpc/build: vdso linker warning for orphan sectionsNicholas Piggin
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200303012748.4190929-1-npiggin@gmail.com
2020-07-29powerpc: Use fallthrough pseudo-keywordGustavo A. R. Silva
Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200727224201.GA10133@embeddedor
2020-07-29powerpc/book3s64/radix: Add kernel command line option to disable radix GTSEAneesh Kumar K.V
This adds a kernel command line option that can be used to disable GTSE support. Disabling GTSE implies kernel will make hcalls to invalidate TLB entries. This was done so that we can do VM migration between configs that enable/disable GTSE support via hypervisor. To migrate a VM from a system that supports GTSE to a system that doesn't, we can boot the guest with radix_hcall_invalidate=on, thereby forcing the guest to use hcalls for TLB invalidates. The check for hcall availability is done in pSeries_setup_arch so that the panic message appears on the console. This should only happen on a hypervisor that doesn't force the guest to hash translation even though it can't handle the radix GTSE=0 request via CAS. With radix_hcall_invalidate=on if the hypervisor doesn't support hcall_rpt_invalidate hcall it should force the LPAR to hash translation. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Tested-by: Bharata B Rao <bharata@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200727085908.420806-1-aneesh.kumar@linux.ibm.com
2020-07-29powerpc/hugetlb/cma: Allocate gigantic hugetlb pages using CMAAneesh Kumar K.V
commit: cf11e85fc08c ("mm: hugetlb: optionally allocate gigantic hugepages using cma") added support for allocating gigantic hugepages using CMA. This patch enables the same for powerpc Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200713150749.25245-1-aneesh.kumar@linux.ibm.com
2020-07-29powerpc/32s: Remove TAUException wart in traps.cMichael Ellerman
All 32 and 64-bit builds that don't have CONFIG_TAU_INT enabled (all of them), get a definition of TAUException() in traps.c. On 64-bit it's completely useless, and just wastes ~120 bytes of text. On 32-bit it allows the kernel to link because head_32.S calls it unconditionally. Instead follow the example of altivec_assist_exception(), and if CONFIG_TAU_INT is not enabled just point it at unknown_exception using the preprocessor. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200724131728.1643966-6-mpe@ellerman.id.au
2020-07-29powerpc/pseries: Add KVM guest doorbell restrictionsNicholas Piggin
KVM guests have certain restrictions and performance quirks when using doorbells. This patch moves the EPAPR KVM guest test so it can be shared with PSERIES, and uses that in doorbell setup code to apply the KVM guest quirks and improves IPI performance for two cases: - PowerVM guests may now use doorbells even if they are secure. - KVM guests no longer use doorbells if XIVE is available. There is a valid complaint that "KVM guest" is not a very reasonable thing to test for, it's preferable for the hypervisor to advertise particular behaviours to the guest so they could change if the hypervisor implementation or configuration changes. However in this case we were already assuming a KVM guest worst case, so this patch is about containing those quirks. If KVM later advertises fast doorbells, we should test for that and override the quirks. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200726035155.1424103-4-npiggin@gmail.com
2020-07-29powerpc: Inline doorbell sending functionsNicholas Piggin
These are only called in one place for a given platform, so inline them for performance. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Cédric Le Goater <clg@kaod.org> [mpe: Fix build errors related to KVM] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200726035155.1424103-2-npiggin@gmail.com
2020-07-27powerpc: switch to ->regset_get()Al Viro
Note: compat variant of REGSET_TM_CGPR is almost certainly wrong; it claims to be 48*64bit, but just as compat REGSET_GPR it stores 44*32bit of (truncated) registers + 4 32bit zeros... followed by 48 more 32bit zeroes. Might be too late to change - it's a userland ABI, after all ;-/ Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-07-27powerpc/fadump: Fix build error with CONFIG_PRESERVE_FA_DUMP=yMichael Ellerman
skiroot_defconfig fails: arch/powerpc/kernel/fadump.c:48:17: error: ‘cpus_in_fadump’ defined but not used 48 | static atomic_t cpus_in_fadump; Fix it by moving the definition into the #ifdef where it's used. Fixes: ba608c4fa12c ("powerpc/fadump: fix race between pstore write and fadump crash trigger") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200727070341.595634-1-mpe@ellerman.id.au
2020-07-27powerpc/64s/hash: Fix hash_preload running with interrupts enabledNicholas Piggin
Commit 2f92447f9f96 ("powerpc/book3s64/hash: Use the pte_t address from the caller") removed the local_irq_disable from hash_preload, but it was required for more than just the page table walk: the hash pte busy bit is effectively a lock which may be taken in interrupt context, and the local update flag test must not be preempted before it's used. This solves apparent lockups with perf interrupting __hash_page_64K. If get_perf_callchain then also takes a hash fault on the same page while it is already locked, it will loop forever taking hash faults, which looks like this: cpu 0x49e: Vector: 100 (System Reset) at [c00000001a4f7d70] pc: c000000000072dc8: hash_page_mm+0x8/0x800 lr: c00000000000c5a4: do_hash_page+0x24/0x38 sp: c0002ac1cc69ac70 msr: 8000000000081033 current = 0xc0002ac1cc602e00 paca = 0xc00000001de1f280 irqmask: 0x03 irq_happened: 0x01 pid = 20118, comm = pread2_processe Linux version 5.8.0-rc6-00345-g1fad14f18bc6 49e:mon> t [c0002ac1cc69ac70] c00000000000c5a4 do_hash_page+0x24/0x38 (unreliable) --- Exception: 300 (Data Access) at c00000000008fa60 __copy_tofrom_user_power7+0x20c/0x7ac [link register ] c000000000335d10 copy_from_user_nofault+0xf0/0x150 [c0002ac1cc69af70] c00032bf9fa3c880 (unreliable) [c0002ac1cc69afa0] c000000000109df0 read_user_stack_64+0x70/0xf0 [c0002ac1cc69afd0] c000000000109fcc perf_callchain_user_64+0x15c/0x410 [c0002ac1cc69b060] c000000000109c00 perf_callchain_user+0x20/0x40 [c0002ac1cc69b080] c00000000031c6cc get_perf_callchain+0x25c/0x360 [c0002ac1cc69b120] c000000000316b50 perf_callchain+0x70/0xa0 [c0002ac1cc69b140] c000000000316ddc perf_prepare_sample+0x25c/0x790 [c0002ac1cc69b1a0] c000000000317350 perf_event_output_forward+0x40/0xb0 [c0002ac1cc69b220] c000000000306138 __perf_event_overflow+0x88/0x1a0 [c0002ac1cc69b270] c00000000010cf70 record_and_restart+0x230/0x750 [c0002ac1cc69b620] c00000000010d69c perf_event_interrupt+0x20c/0x510 [c0002ac1cc69b730] c000000000027d9c performance_monitor_exception+0x4c/0x60 [c0002ac1cc69b750] c00000000000b2f8 performance_monitor_common_virt+0x1b8/0x1c0 --- Exception: f00 (Performance Monitor) at c0000000000cb5b0 pSeries_lpar_hpte_insert+0x0/0x160 [link register ] c0000000000846f0 __hash_page_64K+0x210/0x540 [c0002ac1cc69ba50] 0000000000000000 (unreliable) [c0002ac1cc69bb00] c000000000073ae0 update_mmu_cache+0x390/0x3a0 [c0002ac1cc69bb70] c00000000037f024 wp_page_copy+0x364/0xce0 [c0002ac1cc69bc20] c00000000038272c do_wp_page+0xdc/0xa60 [c0002ac1cc69bc70] c0000000003857bc handle_mm_fault+0xb9c/0x1b60 [c0002ac1cc69bd50] c00000000006c434 __do_page_fault+0x314/0xc90 [c0002ac1cc69be20] c00000000000c5c8 handle_page_fault+0x10/0x2c --- Exception: 300 (Data Access) at 00007fff8c861fe8 SP (7ffff6b19660) is in userspace Fixes: 2f92447f9f96 ("powerpc/book3s64/hash: Use the pte_t address from the caller") Reported-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Reported-by: Anton Blanchard <anton@ozlabs.org> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200727060947.10060-1-npiggin@gmail.com
2020-07-27powerpc/32s: Kernel space starts at TASK_SIZEChristophe Leroy
Kernel space starts at TASK_SIZE. Select kernel page table when address is over TASK_SIZE. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/893425e32cd0a003539573b2d115e0ffa98bc26c.1593428200.git.christophe.leroy@csgroup.eu
2020-07-27powerpc: Use MODULES_VADDR if definedChristophe Leroy
In order to allow allocation of modules outside of vmalloc space, use MODULES_VADDR and MODULES_END when MODULES_VADDR is defined. Redefine module_alloc() when MODULES_VADDR defined. Unmap corresponding KASAN shadow memory. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/7ecf5fff1eef67d450e73fc412b6ec3818483d75.1593428200.git.christophe.leroy@csgroup.eu
2020-07-26powerpc/perf: Initialize power10 PMU registers in cpu setup routineAthira Rajeev
Initialize Monitor Mode Control Register 3 (MMCR3) SPR which is new in power10. For PowerISA v3.1, BHRB disable is controlled via Monitor Mode Control Register A (MMCRA) bit, namely "BHRB Recording Disable (BHRBRD)". This patch also initializes MMCRA BHRBRD to disable BHRB feature at boot for power10. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Reviewed-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1595489557-2047-1-git-send-email-atrajeev@linux.vnet.ibm.com
2020-07-26powerpc/eeh: Move PE tree setup into the platformOliver O'Halloran
The EEH core has a concept of a "PE tree" to support PowerNV. The PE tree follows the PCI bus structures because a reset asserted on an upstream bridge will be propagated to the downstream bridges. On pseries there's a 1-1 correspondence between what the guest sees are a PHB and a PE so the "tree" is really just a single node. Current the EEH core is reponsible for setting up this PE tree which it does by traversing the pci_dn tree. The structure of the pci_dn tree matches the bus tree on PowerNV which leads to the PE tree being "correct" this setup method doesn't make a whole lot of sense and it's actively confusing for the pseries case where it doesn't really do anything. We want to remove the dependence on pci_dn anyway so this patch move choosing where to insert a new PE into the platform code rather than being part of the generic EEH code. For PowerNV this simplifies the tree building logic and removes the use of pci_dn. For pseries we keep the existing logic. I'm not really convinced it does anything due to the 1-1 PE-to-PHB correspondence so every device under that PHB should be in the same PE, but I'd rather not remove it entirely until we've had a chance to look at it more deeply. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200725081231.39076-14-oohall@gmail.com
2020-07-26powerpc/eeh: Drop pdn use in eeh_pe_tree_insert()Oliver O'Halloran
This is mostly just to make the subsequent diffs less noisy. No functional changes. One thing that needs calling out is the removal of the "config_addr" variable and replacing it with edev->bdfn. The contents of edev->bdfn are the same, however it's worth pointing out that what RTAS calls a "config_addr" isn't the same as the bdfn. The config_addr is supposed to be: <bus><devfn><reg> with each field being an 8 bit number. Various parts of the EEH code use BDFN and "config_addr" as interchangeable quantities even though they aren't really. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200725081231.39076-13-oohall@gmail.com
2020-07-26powerpc/eeh: Rename eeh_{add_to|remove_from}_parent_pe()Oliver O'Halloran
The naming of eeh_{add_to|remove_from}_parent_pe() doesn't really reflect what they actually do. If the PE referred to be edev->pe_config_addr already exists under that PHB then the edev is added to that PE. However, if the PE doesn't exist the a new one is created for the edev. The bulk of the implementation of eeh_add_to_parent_pe() covers that second case. Similarly, most of eeh_remove_from_parent_pe() is determining when it's safe to delete a PE. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200725081231.39076-12-oohall@gmail.com
2020-07-26powerpc/eeh: Remove spurious use of pci_dn in eeh_dump_dev_logOliver O'Halloran
Retrieve the domain, bus, device, and function numbers from the edev. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200725081231.39076-10-oohall@gmail.com
2020-07-26powerpc/eeh: Pass eeh_dev to eeh_ops->{read|write}_config()Oliver O'Halloran
Mechanical conversion of the eeh_ops interfaces to use eeh_dev to reference a specific device rather than pci_dn. No functional changes. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200725081231.39076-9-oohall@gmail.com
2020-07-26powerpc/eeh: Pass eeh_dev to eeh_ops->resume_notify()Oliver O'Halloran
Mechanical conversion of the eeh_ops interfaces to use eeh_dev to reference a specific device rather than pci_dn. No functional changes. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200725081231.39076-8-oohall@gmail.com
2020-07-26powerpc/eeh: Pass eeh_dev to eeh_ops->restore_config()Oliver O'Halloran
Mechanical conversion of the eeh_ops interfaces to use eeh_dev to reference a specific device rather than pci_dn. No functional changes. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200725081231.39076-7-oohall@gmail.com
2020-07-26powerpc/eeh: Remove VF config space restorationOliver O'Halloran
There's a bunch of strange things about this code. First up is that none of the fields being written to are functional for a VF. The SR-IOV specification lists then as "Reserved, but OS should preserve" so writing new values to them doesn't do anything and is clearly wrong from a correctness perspective. However, since VFs are designed to be managed by the OS there is an argument to be made that we should be saving and restoring some parts of config space. We already sort of do that by saving the first 64 bytes of config space in the eeh_dev (see eeh_dev->config_space[]). This is inadequate since it doesn't even consider saving and restoring the PCI capability structures. However, this is a problem with EEH in general and that needs to be fixed for non-VF devices too. There's no real reason to keep around this around so delete it. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200725081231.39076-6-oohall@gmail.com
2020-07-26powerpc/eeh: Move vf_index out of pci_dn and into eeh_devOliver O'Halloran
Drivers that do not support the PCI error handling callbacks are handled by tearing down the device and re-probing them. If the device being removed is a virtual function then we need to know the VF index so it can be removed using the pci_iov_{add|remove}_virtfn() API. Currently this is handled by looking up the pci_dn, and using the vf_index that was stashed there when the pci_dn for the VF was created in pcibios_sriov_enable(). We would like to eliminate the use of pci_dn outside of pseries though so we need to provide the generic EEH code with some other way to find the vf_index. The easiest thing to do here is move the vf_index field out of pci_dn and into eeh_dev. Currently pci_dn and eeh_dev are allocated and initialized together so this is a fairly minimal change in preparation for splitting pci_dn and eeh_dev in the future. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200725081231.39076-3-oohall@gmail.com
2020-07-26powerpc/eeh: Remove eeh_dev.cOliver O'Halloran
The only thing in this file is eeh_dev_init() which is allocates and initialises an eeh_dev based on a pci_dn. This is only ever called from pci_dn.c so move it into there and remove the file. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200725081231.39076-2-oohall@gmail.com
2020-07-26powerpc/eeh: Remove eeh_dev_phb_init_dynamic()Oliver O'Halloran
This function is a one line wrapper around eeh_phb_pe_create() and despite the name it doesn't create any eeh_dev structures. Replace it with direct calls to eeh_phb_pe_create() since that does what it says on the tin and removes a layer of indirection. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200725081231.39076-1-oohall@gmail.com
2020-07-26powerpc/watchpoint: Remove 512 byte boundaryRavi Bangoria
Power10 has removed 512 bytes boundary from match criteria i.e. the watch range can cross 512 bytes boundary. Note: ISA 3.1 Book III 9.4 match criteria includes 512 byte limit but that is a documentation mistake and hopefully will be fixed in the next version of ISA. Though, ISA 3.1 change log mentions about removal of 512B boundary: Multiple DEAW: Added a second Data Address Watchpoint. [H]DAR is set to the first byte of overlap. 512B boundary is removed. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200723090813.303838-11-ravi.bangoria@linux.ibm.com
2020-07-26powerpc/watchpoint: Guest support for 2nd DAWR hcallRavi Bangoria
2nd DAWR can be set/unset using H_SET_MODE hcall with resource value 5. Enable powervm guest support with that. This has no effect on kvm guest because kvm will return error if guest does hcall with resource value 5. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200723090813.303838-9-ravi.bangoria@linux.ibm.com
2020-07-26powerpc/watchpoint: Set CPU_FTR_DAWR1 based on pa-features bitRavi Bangoria
As per the PAPR, bit 0 of byte 64 in pa-features property indicates availability of 2nd DAWR registers. i.e. If this bit is set, 2nd DAWR is present, otherwise not. Host generally uses "cpu-features", which masks "pa-features". But "cpu-features" are still not used for guests and thus this change is mostly applicable for guests only. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Tested-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200723090813.303838-7-ravi.bangoria@linux.ibm.com
2020-07-26powerpc/dt_cpu_ftrs: Add feature for 2nd DAWRRavi Bangoria
Add new device-tree feature for 2nd DAWR. If this feature is present, 2nd DAWR is supported, otherwise not. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200723090813.303838-6-ravi.bangoria@linux.ibm.com
2020-07-26powerpc/watchpoint: Fix DAWR exception for CACHEOPRavi Bangoria
'ea' returned by analyse_instr() needs to be aligned down to cache block size for CACHEOP instructions. analyse_instr() does not set size for CACHEOP, thus size also needs to be calculated manually. Fixes: 27985b2a640e ("powerpc/watchpoint: Don't ignore extraneous exceptions blindly") Fixes: 74c6881019b7 ("powerpc/watchpoint: Prepare handler to handle more than one watchpoint") Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200723090813.303838-4-ravi.bangoria@linux.ibm.com
2020-07-26powerpc/watchpoint: Fix DAWR exception constraintRavi Bangoria
Pedro Miraglia Franco de Carvalho noticed that on p8/p9, DAR value is inconsistent with different type of load/store. Like for byte,word etc. load/stores, DAR is set to the address of the first byte of overlap between watch range and real access. But for quadword load/ store it's sometime set to the address of the first byte of real access whereas sometime set to the address of the first byte of overlap. This issue has been fixed in p10. In p10(ISA 3.1), DAR is always set to the address of the first byte of overlap. Commit 27985b2a640e ("powerpc/watchpoint: Don't ignore extraneous exceptions blindly") wrongly assumes that DAR is set to the address of the first byte of overlap for all load/stores on p8/p9 as well. Fix that. With the fix, we now rely on 'ea' provided by analyse_instr(). If analyse_instr() fails, generate event unconditionally on p8/p9, and on p10 generate event only if DAR is within a DAWR range. Note: 8xx is not affected. Fixes: 27985b2a640e ("powerpc/watchpoint: Don't ignore extraneous exceptions blindly") Fixes: 74c6881019b7 ("powerpc/watchpoint: Prepare handler to handle more than one watchpoint") Reported-by: Pedro Miraglia Franco de Carvalho <pedromfc@br.ibm.com> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200723090813.303838-3-ravi.bangoria@linux.ibm.com
2020-07-26powerpc/watchpoint: Fix 512 byte boundary limitRavi Bangoria
Milton Miller reported that we are aligning start and end address to wrong size SZ_512M. It should be SZ_512. Fix that. While doing this change I also found a case where ALIGN() comparison fails. Within a given aligned range, ALIGN() of two addresses does not match when start address is pointing to the first byte and end address is pointing to any other byte except the first one. But that's not true for ALIGN_DOWN(). ALIGN_DOWN() of any two addresses within that range will always point to the first byte. So use ALIGN_DOWN() instead of ALIGN(). Fixes: e68ef121c1f4 ("powerpc/watchpoint: Use builtin ALIGN*() macros") Reported-by: Milton Miller <miltonm@us.ibm.com> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Tested-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200723090813.303838-2-ravi.bangoria@linux.ibm.com
2020-07-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
The UDP reuseport conflict was a little bit tricky. The net-next code, via bpf-next, extracted the reuseport handling into a helper so that the BPF sk lookup code could invoke it. At the same time, the logic for reuseport handling of unconnected sockets changed via commit efc6b6f6c3113e8b203b9debfb72d81e0f3dcace which changed the logic to carry on the reuseport result into the rest of the lookup loop if we do not return immediately. This requires moving the reuseport_has_conns() logic into the callers. While we are here, get rid of inline directives as they do not belong in foo.c files. The other changes were cases of more straightforward overlapping modifications. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-25Merge tag 'v5.8-rc6' into locking/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-07-23Merge branch 'scv' support into nextMichael Ellerman
From Nick's cover letter: Linux powerpc new system call instruction and ABI System Call Vectored (scv) ABI ============================== The scv instruction is introduced with POWER9 / ISA3, it comes with an rfscv counter-part. The benefit of these instructions is performance (trading slower SRR0/1 with faster LR/CTR registers, and entering the kernel with MSR[EE] and MSR[RI] left enabled, which can reduce MSR updates. The scv instruction has 128 levels (not enough to cover the Linux system call space). Assignment and advertisement ---------------------------- The proposal is to assign scv levels conservatively, and advertise them with HWCAP feature bits as we add support for more. Linux has not enabled FSCR[SCV] yet, so executing the scv instruction will cause the kernel to log a "SCV facility unavilable" message, and deliver a SIGILL with ILL_ILLOPC to the process. Linux has defined a HWCAP2 bit PPC_FEATURE2_SCV for SCV support, but does not set it. This change allocates the zero level ('scv 0'), advertised with PPC_FEATURE2_SCV, which will be used to provide normal Linux system calls (equivalent to 'sc'). Attempting to execute scv with other levels will cause a SIGILL to be delivered the same as before, but will not log a "SCV facility unavailable" message (because the processor facility is enabled). Calling convention ------------------ The proposal is for scv 0 to provide the standard Linux system call ABI with the following differences from sc convention[1]: - LR is to be volatile across scv calls. This is necessary because the scv instruction clobbers LR. From previous discussion, this should be possible to deal with in GCC clobbers and CFI. - cr1 and cr5-cr7 are volatile. This matches the C ABI and would allow the kernel system call exit to avoid restoring the volatile cr registers (although we probably still would anyway to avoid information leaks). - Error handling: The consensus among kernel, glibc, and musl is to move to using negative return values in r3 rather than CR0[SO]=1 to indicate error, which matches most other architectures, and is closer to a function call. Notes ----- - r0,r4-r8 are documented as volatile in the ABI, but the kernel patch as submitted currently preserves them. This is to leave room for deciding which way to go with these. Some small benefit was found by preserving them[1] but I'm not convinced it's worth deviating from the C function call ABI just for this. Release code should follow the ABI. Previous discussions: https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-April/208691.html https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-April/209268.html [1] https://github.com/torvalds/linux/blob/master/Documentation/powerpc/syscall64-abi.rst [2] https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-April/209263.html
2020-07-23powerpc/powernv: Machine check handler for POWER10Nicholas Piggin
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200702233343.1128026-1-npiggin@gmail.com
2020-07-23powerpc: Select ARCH_HAS_MEMBARRIER_SYNC_CORENicholas Piggin
powerpc return from interrupt and return from system call sequences are context synchronising. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200716013522.338318-1-npiggin@gmail.com
2020-07-23powerpc/64: Fix an out of date comment about MMIO orderingPalmer Dabbelt
This primitive has been renamed, but because it was spelled incorrectly in the first place it must have escaped the fixup patch. As far as I can tell this logic is still correct: smp_mb__after_spinlock() uses the default smp_mb() implementation, which is "sync" rather than "hwsync" but those are the same (though I'm not that familiar with PowerPC). Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200716193820.1141936-1-palmer@dabbelt.com
2020-07-23powerpc: Add a ppc_inst_as_str() helperJordan Niethe
There are quite a few places where instructions are printed, this is done using a '%x' format specifier. With the introduction of prefixed instructions, this does not work well. Currently in these places, ppc_inst_val() is used for the value for %x so only the first word of prefixed instructions are printed. When the instructions are word instructions, only a single word should be printed. For prefixed instructions both the prefix and suffix should be printed. To accommodate both of these situations, instead of a '%x' specifier use '%s' and introduce a helper, __ppc_inst_as_str() which returns a char *. The char * __ppc_inst_as_str() returns is buffer that is passed to it by the caller. It is cumbersome to require every caller of __ppc_inst_as_str() to now declare a buffer. To make it more convenient to use __ppc_inst_as_str(), wrap it in a macro that uses a compound statement to allocate a buffer on the caller's stack before calling it. Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Reviewed-by: Joel Stanley <joel@jms.id.au> Acked-by: Segher Boessenkool <segher@kernel.crashing.org> [mpe: Drop 0x prefix to match most existings uses, especially xmon] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200602052728.18227-1-jniethe5@gmail.com
2020-07-22powerpc/64s: system call support for scv/rfscv instructionsNicholas Piggin
Add support for the scv instruction on POWER9 and later CPUs. For now this implements the zeroth scv vector 'scv 0', as identical to 'sc' system calls, with the exception that LR is not preserved, nor are volatile CR registers, and error is not indicated with CR0[SO], but by returning a negative errno. rfscv is implemented to return from scv type system calls. It can not be used to return from sc system calls because those are defined to preserve LR. getpid syscall throughput on POWER9 is improved by 26% (428 to 318 cycles), largely due to reducing mtmsr and mtspr. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Fix ppc64e build] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200611081203.995112-3-npiggin@gmail.com
2020-07-22powerpc/64s/exception: treat NIA below __end_interrupts as soft-maskedNicholas Piggin
The scv instruction causes an interrupt which can enter the kernel with MSR[EE]=1, thus allowing interrupts to hit at any time. These must not be taken as normal interrupts, because they come from MSR[PR]=0 context, and yet the kernel stack is not yet set up and r13 is not set to the PACA). Treat this as a soft-masked interrupt regardless of the soft masked state. This does not affect behaviour yet, because currently all interrupts are taken with MSR[EE]=0. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200611081203.995112-2-npiggin@gmail.com
2020-07-22powerpc/perf: Add Power10 PMU feature to DT CPU featuresMadhavan Srinivasan
Add Power10 feature function to DT CPU features, along with a Power10 specific init() to initialize PMU SPRs, sets the oprofile_cpu_type and cpu_features. This will enable performance monitoring unit (PMU) for Power10 in CPU features with "performance-monitor-power10". For Power ISA v3.1, BHRB disable is controlled via Monitor Mode Control Register A (MMCRA) bit, namely "BHRB Recording Disable (BHRBRD)". This patch initializes MMCRA BHRBRD to disable BHRB feature at boot for Power10. Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> [mpe: Move MMCRA_BHRB_DISABLE as noted by jpn, drop CPU setup changes] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1594996707-3727-8-git-send-email-atrajeev@linux.vnet.ibm.com
2020-07-22KVM: PPC: Book3S HV: Save/restore new PMU registersAthira Rajeev
Power ISA v3.1 has added new performance monitoring unit (PMU) special purpose registers (SPRs). They are: Monitor Mode Control Register 3 (MMCR3) Sampled Instruction Event Register A (SIER2) Sampled Instruction Event Register B (SIER3) Add support to save/restore these new SPRs while entering/exiting guest. Also include changes to support KVM_REG_PPC_MMCR3/SIER2/SIER3. Add new SPRs to KVM API documentation. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1594996707-3727-6-git-send-email-atrajeev@linux.vnet.ibm.com
2020-07-22powerpc/perf: Add support for ISA3.1 PMU SPRsMadhavan Srinivasan
PowerISA v3.1 includes new performance monitoring unit(PMU) special purpose registers (SPRs). They are Monitor Mode Control Register 3 (MMCR3) Sampled Instruction Event Register 2 (SIER2) Sampled Instruction Event Register 3 (SIER3) MMCR3 is added for further sampling related configuration control. SIER2/SIER3 are added to provide additional information about the sampled instruction. Patch adds new PPMU flag called "PPMU_ARCH_31" to support handling of these new SPRs, updates the struct thread_struct to include these new SPRs, include MMCR3 in struct mmcr_regs. This is needed to support programming of MMCR3 SPR during event_enable/disable. Patch also adds the sysfs support for the MMCR3 SPR along with SPRN_ macros for these new pmu SPRs. Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> [mpe: Rename to PPMU_ARCH_31 as noted by jpn] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1594996707-3727-5-git-send-email-atrajeev@linux.vnet.ibm.com
2020-07-22KVM: PPC: Book3S HV: Cleanup updates for kvm vcpu MMCRAthira Rajeev
Currently `kvm_vcpu_arch` stores all Monitor Mode Control registers in a flat array in order: mmcr0, mmcr1, mmcra, mmcr2, mmcrs Split this to give mmcra and mmcrs its own entries in vcpu and use a flat array for mmcr0 to mmcr2. This patch implements this cleanup to make code easier to read. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> [mpe: Fix MMCRA/MMCR2 uapi breakage as noted by paulus] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1594996707-3727-3-git-send-email-atrajeev@linux.vnet.ibm.com
2020-07-22powerpc/64s: Remove PROT_SAO supportNicholas Piggin
ISA v3.1 does not support the SAO storage control attribute required to implement PROT_SAO. PROT_SAO was used by specialised system software (Lx86) that has been discontinued for about 7 years, and is not thought to be used elsewhere, so removal should not cause problems. We rather remove it than keep support for older processors, because live migrating guest partitions to newer processors may not be possible if SAO is in use (or worse allowed with silent races). - PROT_SAO stays in the uapi header so code using it would still build. - arch_validate_prot() is removed, the generic version rejects PROT_SAO so applications would get a failure at mmap() time. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Drop KVM change for the time being] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200703011958.1166620-3-npiggin@gmail.com
2020-07-20powerpc/book3s64/kuap: Move UAMOR setup to key init functionAneesh Kumar K.V
UAMOR values are not application-specific. The kernel initializes its value based on different reserved keys. Remove the thread-specific UAMOR value and don't switch the UAMOR on context switch. Move UAMOR initialization to key initialization code and remove thread_struct.uamor because it is not used anymore. Before commit: 4a4a5e5d2aad ("powerpc/pkeys: key allocation/deallocation must not change pkey registers") we used to update uamor based on key allocation and free. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200709032946.881753-20-aneesh.kumar@linux.ibm.com
2020-07-20powerpc/book3s64/keys/kuap: Reset AMR/IAMR values on kexecAneesh Kumar K.V
As we kexec across kernels that use AMR/IAMR for different purposes we need to ensure that new kernels get kexec'd with a reset value of AMR/IAMR. For ex: the new kernel can use key 0 for kernel mapping and the old AMR value prevents access to key 0. This patch also removes reset if IAMR and AMOR in kexec_sequence. Reset of AMOR is not needed and the IAMR reset is partial (it doesn't do the reset on secondary cpus) and is redundant with this patch. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200709032946.881753-19-aneesh.kumar@linux.ibm.com