summaryrefslogtreecommitdiff
path: root/arch/powerpc/mm
AgeCommit message (Collapse)Author
2017-03-31powerpc/mm/slice: Convert slice_mask high slice to a bitmapAneesh Kumar K.V
In followup patch we want to increase the va range which will result in us requiring high_slices to have more than 64 bits. To enable this convert high_slices to bitmap. We keep the number bits same in this patch and later change that to higher value Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> [mpe: Fold in fix to use bitmap_empty()] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-31powerpc/mm: Move hash specific pte bits to be top bits of RPNAneesh Kumar K.V
We don't support the full 57 bits of physical address and hence can overload the top bits of RPN as hash specific pte bits. Add a BUILD_BUG_ON() to enforce the relationship between H_PAGE_F_SECOND and H_PAGE_F_GIX. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Reviewed-by: Paul Mackerras <paulus@ozlabs.org> [mpe: Move the BUILD_BUG_ON() into hash_utils_64.c and comment it] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-31powerpc/mm/hugetlb: Filter out hugepage size not supported by page table layoutAneesh Kumar K.V
Without this if firmware reports 1MB page size support we will crash trying to use 1MB as hugetlb page size. echo 300 > /sys/kernel/mm/hugepages/hugepages-1024kB/nr_hugepages kernel BUG at ./arch/powerpc/include/asm/hugetlb.h:19! ..... .... [c0000000e2c27b30] c00000000029dae8 .hugetlb_fault+0x638/0xda0 [c0000000e2c27c30] c00000000026fb64 .handle_mm_fault+0x844/0x1d70 [c0000000e2c27d70] c00000000004805c .do_page_fault+0x3dc/0x7c0 [c0000000e2c27e30] c00000000000ac98 handle_page_fault+0x10/0x30 With fix, we don't enable 1MB as hugepage size. bash-4.2# cd /sys/kernel/mm/hugepages/ bash-4.2# ls hugepages-16384kB hugepages-16777216kB Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-31powerpc/mm/radix: rename _PAGE_LARGE to R_PAGE_LARGEAneesh Kumar K.V
This bit is only used by radix and it is nice to follow the naming style of having bit name start with H_/R_ depending on which translation mode they are used. No functional change in this patch. Reviewed-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-31powerpc/mm/slice: Fix off-by-1 error when computing slice maskAneesh Kumar K.V
For low slice, max addr should be less than 4G. Without limiting this correctly we will end up with a low slice mask which has 17th bit set. This is not a problem with the current code because our low slice mask is of type u16. But in later patch I am switching low slice mask to u64 type and having the 17bit set result in wrong slice mask which in turn results in mmap failures. Reviewed-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-31powerpc/mm/nohash: MM_SLICE is only used by book3s 64Aneesh Kumar K.V
BOOKE code is dead code as per the Kconfig details. So make it simpler by enabling MM_SLICE only for book3s_64. The changes w.r.t nohash is just removing deadcode. W.r.t ppc64, 4k without hugetlb will now enable MM_SLICE. But that is good, because we reduce one extra variant which probably is not getting tested much. Reviewed-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-27powerpc/mmu: Add real mode support for IOMMU preregistered memoryAlexey Kardashevskiy
This makes mm_iommu_lookup() able to work in realmode by replacing list_for_each_entry_rcu() (which can do debug stuff which can fail in real mode) with list_for_each_entry_lockless(). This adds realmode version of mm_iommu_ua_to_hpa() which adds explicit vmalloc'd-to-linear address conversion. Unlike mm_iommu_ua_to_hpa(), mm_iommu_ua_to_hpa_rm() can fail. This changes mm_iommu_preregistered() to receive @mm as in real mode @current does not always have a correct pointer. This adds realmode version of mm_iommu_lookup() which receives @mm (for the same reason as for mm_iommu_preregistered()) and uses lockless version of list_for_each_entry_rcu(). Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-21powerpc/32: Remove Mac-on-Linux/rtlinux hooksBen Hutchings
The symbols exported for use by MOL/rtlinux aren't getting CRCs and I was about to fix that. But MOL is dead upstream, and the latest work on it was to make it use KVM instead of its own kernel module. So remove them instead. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-21powerpc/mm: Move mmap_sem unlocking in do_page_fault()Laurent Dufour
Since the fault retry is now handled earlier, we can release the mmap_sem lock earlier too and remove later unlocking previously done in mm_fault_error(). Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-21powerpc/mm: Handle VM_FAULT_RETRY earlierLaurent Dufour
In do_page_fault() if handle_mm_fault() returns VM_FAULT_RETRY, retry the page fault handling before anything else. This would simplify the handling of the mmap_sem lock in this part of the code. Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-21powerpc/mm: Move mmap_sem unlock up from do_sigbusLaurent Dufour
Move mmap_sem releasing in the do_sigbus()'s unique caller : mm_fault_error() No functional changes. Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-21Revert "powerpc/64: Disable use of radix under a hypervisor"Paul Mackerras
This reverts commit 3f91a89d424a79f8082525db5a375e438887bb3e. Now that we do have the machinery for using the radix MMU under a hypervisor, the extra check and comment introduced in 3f91a89d424a are no longer correct. The result is that when booted under a hypervisor that only allows use of radix, we clear the MMU_FTR_TYPE_RADIX and then set it again, and print a warning about ignoring the disable_radix command line option, even though the command line does not include "disable_radix". Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-07Merge tag 'powerpc-4.11-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "Five fairly small fixes for things that went in this cycle. A fairly large patch to rework the CAS logic on Power9, necessitated by a late change to the firmware API, and we can't boot without it. Three fixes going to stable, allowing more instructions to be emulated on LE, fixing a boot crash on 32-bit Freescale BookE machines, and the OPAL XICS workaround. And a patch from me to sort the selects under CONFIG PPC. Annoying churn, but worth it in the long run, and best for it to go in now to avoid conflicts. Thanks to: Alexey Kardashevskiy, Anton Blanchard, Balbir Singh, Gautham R. Shenoy, Laurentiu Tudor, Nicholas Piggin, Paul Mackerras, Ravi Bangoria, Sachin Sant, Shile Zhang, Suraj Jitindar Singh" * tag 'powerpc-4.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc: Sort the selects under CONFIG_PPC powerpc/64: Fix L1D cache shape vector reporting L1I values powerpc/64: Avoid panic during boot due to divide by zero in init_cache_info() powerpc: Update to new option-vector-5 format for CAS powerpc: Parse the command line before calling CAS powerpc/xics: Work around limitations of OPAL XICS priority handling powerpc/64: Fix checksum folding in csum_add() powerpc/powernv: Fix opal tracepoints with JUMP_LABEL=n powerpc/booke: Fix boot crash due to null hugepd powerpc: Fix compiling a BE kernel with a powerpc64le toolchain selftest/powerpc: Fix false failures for skipped tests powerpc/powernv: Fix bug due to labeling ambiguity in power_enter_stop powerpc/64: Invalidate process table caching after setting process table powerpc: emulate_step() tests for load/store instructions powerpc: Emulation support for load/store instructions on LE
2017-03-06powerpc: Update to new option-vector-5 format for CASSuraj Jitindar Singh
On POWER9 the ibm,client-architecture-support (CAS) negotiation process has been updated to change how the host to guest negotiation is done for the new hash/radix mmu as well as the nest mmu, process tables and guest translation shootdown (GTSE). This is documented in the unreleased PAPR ACR "CAS option vector additions for P9". The host tells the guest which options it supports in ibm,arch-vec-5-platform-support. The guest then chooses a subset of these to request in the CAS call and these are agreed to in the ibm,architecture-vec-5 property of the chosen node. Thus we read ibm,arch-vec-5-platform-support and make our selection before calling CAS. We then parse the ibm,architecture-vec-5 property of the chosen node to check whether we should run as hash or radix. ibm,arch-vec-5-platform-support format: index value pairs: <index, val> ... <index, val> index: Option vector 5 byte number val: Some representation of supported values Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Acked-by: Paul Mackerras <paulus@ozlabs.org> [mpe: Don't print about unknown options, be consistent with OV5_FEAT] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-03powerpc/64: Invalidate process table caching after setting process tablePaul Mackerras
The POWER9 MMU reads and caches entries from the process table. When we kexec from one kernel to another, the second kernel sets its process table pointer but doesn't currently do anything to make the CPU invalidate any cached entries from the old process table. This adds a tlbie (TLB invalidate entry) instruction with parameters to invalidate caching of the process table after the new process table is installed. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-02sched/headers: Prepare to remove the <linux/mm_types.h> dependency from ↵Ingo Molnar
<linux/sched.h> Update code that relied on sched.h including various MM types for them. This will allow us to remove the <linux/mm_types.h> include from <linux/sched.h>. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02sched/headers: Prepare for new header dependencies before moving code to ↵Ingo Molnar
<linux/sched/task_stack.h> We are going to split <linux/sched/task_stack.h> out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. Create a trivial placeholder <linux/sched/task_stack.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02sched/headers: Prepare for new header dependencies before moving more code ↵Ingo Molnar
to <linux/sched/mm.h> We are going to split more MM APIs out of <linux/sched.h>, which will have to be picked up from a couple of .c files. The APIs that we are going to move are: arch_pick_mmap_layout() arch_get_unmapped_area() arch_get_unmapped_area_topdown() mm_update_next_owner() Include the header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02sched/headers: Prepare for new header dependencies before moving code to ↵Ingo Molnar
<linux/sched/signal.h> We are going to split <linux/sched/signal.h> out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. Create a trivial placeholder <linux/sched/signal.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-01Merge tag 'powerpc-4.11-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull more powerpc updates from Michael Ellerman: "Highlights include: - an update of the disassembly code used by xmon to the latest versions in binutils. We've received permission from all the authors of the relevant binutils changes to relicense their changes to the relevant files from GPLv3 to GPLv2, for inclusion in Linux. Thanks to Peter Bergner for doing the leg work to get permission from everyone. - addition of the "architected" Power9 CPU table entry, allowing us to boot in Power9 architected mode under a hypervisor. - updates to the Power9 PMU code. - implementation of clear_bit_unlock_is_negative_byte() to optimise unlock_page(). - Freescale updates from Scott: "Highlights include 8xx breakpoints and perf, t1042rdb display support, and board updates." Thanks to: Al Viro, Andrew Donnellan, Aneesh Kumar K.V, Balbir Singh, Douglas Miller, Frédéric Weisbecker, Gavin Shan, Madhavan Srinivasan, Michael Roth, Nathan Fontenot, Naveen N. Rao, Nicholas Piggin, Peter Bergner, Paul E. McKenney, Rashmica Gupta, Russell Currey, Sahil Mehta, Stewart Smith" * tag 'powerpc-4.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (48 commits) powerpc: Remove leftover cputime_to_nsecs call causing build error powerpc/mm/hash: Always clear UPRT and Host Radix bits when setting up CPU powerpc/optprobes: Fix TOC handling in optprobes trampoline powerpc/pseries: Advertise Hot Plug Event support to firmware cxl: fix nested locking hang during EEH hotplug powerpc/xmon: Dump memory in CPU endian format powerpc/pseries: Revert 'Auto-online hotplugged memory' powerpc/powernv: Make PCI non-optional powerpc/64: Implement clear_bit_unlock_is_negative_byte() powerpc/powernv: Remove unused variable in pnv_pci_sriov_disable() powerpc/kernel: Remove error message in pcibios_setup_phb_resources() powerpc/mm: Fix typo in set_pte_at() pci/hotplug/pnv-php: Disable MSI and PCI device properly pci/hotplug/pnv-php: Disable surprise hotplug capability on conflicts pci/hotplug/pnv-php: Remove WARN_ON() in pnv_php_put_slot() powerpc: Add POWER9 architected mode to cputable powerpc/perf: use is_kernel_addr macro in perf_get_misc_flags() powerpc/perf: Avoid FAB_*_MATCH checks for power9 powerpc/perf: Add restrictions to PMC5 in power9 DD1 powerpc/perf: Use Instruction Counter value ...
2017-02-22Merge tag 'powerpc-4.11-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: "Highlights include: - Support for direct mapped LPC on POWER9, giving Linux direct access to devices that may be on there such as a UART. - Memory hotplug support for the Power9 Radix MMU. - Add new AUX vectors describing the processor's cache geometry, to be used by glibc. - The ability for a guest to ask the hypervisor to resize the guest's hash table, and in addition support for doing so automatically when memory is hotplugged into/out-of the guest. This allows the hash table to be sized based on the current memory usage of the guest, rather than the maximum possible memory usage. - Implementation of optprobes (kprobe optimisation) for powerpc. In addition there's the topic branch shared with the KVM tree, which includes support for guests to use the Radix MMU on Power9. Thanks to: Alistair Popple, Andrew Donnellan, Aneesh Kumar K.V, Anju T, Anton Blanchard, Benjamin Herrenschmidt, Chris Packham, Daniel Axtens, Daniel Borkmann, David Gibson, Finn Thain, Gautham R. Shenoy, Gavin Shan, Greg Kurz, Joel Stanley, John Allen, Madhavan Srinivasan, Mahesh Salgaonkar, Markus Elfring, Michael Neuling, Nathan Fontenot, Naveen N. Rao, Nicholas Piggin, Paul Mackerras, Ravi Bangoria, Reza Arbab, Shailendra Singh, Vaibhav Jain, Wei Yongjun" * tag 'powerpc-4.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (129 commits) powerpc/mm/radix: Skip ptesync in pte update helpers powerpc/mm/radix: Use ptep_get_and_clear_full when clearing pte for full mm powerpc/mm/radix: Update pte update sequence for pte clear case powerpc/mm: Update PROTFAULT handling in the page fault path powerpc/xmon: Fix data-breakpoint powerpc/mm: Fix build break with BOOK3S_64=n and MEMORY_HOTPLUG=y powerpc/mm: Fix build break when CMA=n && SPAPR_TCE_IOMMU=y powerpc/mm: Fix build break with RADIX=y & HUGETLBFS=n powerpc/pseries: Fix typo in parameter description powerpc/kprobes: Remove kprobe_exceptions_notify() kprobes: Introduce weak variant of kprobe_exceptions_notify() powerpc/ftrace: Fix confusing help text for DISABLE_MPROFILE_KERNEL powerpc/powernv: Fix opal_exit tracepoint opcode powerpc: Add a prototype for mcount() so it can be versioned powerpc: Drop GPL from of_node_to_nid() export to match other arches powerpc/kprobes: Optimize kprobe in kretprobe_trampoline() powerpc/kprobes: Implement Optprobes powerpc/kprobes: Fixes for kprobe_lookup_name() on BE powerpc: Add helper to check if offset is within relative branch range powerpc/bpf: Introduce __PPC_SH64() ...
2017-02-17powerpc/mm: Fix typo in set_pte_at()Gavin Shan
This fixes the typo about the _PAGE_PTE in set_pte_at() by changing "tryint" to "trying to". Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-17powerpc/mm: Blacklist SLB symbols from kprobeMichael Ellerman
We can't sensibly take a trap at this point. So, blacklist these symbols. Reported-by: Anton Blanchard <anton@samba.org> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-17powerpc/mm: Convert slb_finish_load[_1T] to local symbolsMichael Ellerman
slb_finish_load and slb_finish_load_1T are both only used within slb_low.S, so make them local symbols. This makes the code a little clearer, as it's more obvious neither is intended to be an entry point from arbitrary other code, only the uses in this file. It also prevents them being used with kprobes and other tracing tools, which is good because we're not able to safely take traps at these locations, so making them local symbols avoids us needing to blacklist them. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-16powerpc/64: Disable use of radix under a hypervisorPaul Mackerras
Currently, if the kernel is running on a POWER9 processor under a hypervisor, it may try to use the radix MMU even though it doesn't have the necessary code to do so (it doesn't negotiate use of radix, and it doesn't do the H_REGISTER_PROC_TBL hcall). If the hypervisor supports both radix and HPT, then it will set up the guest to use HPT (since the guest doesn't request radix in the CAS call), but if the radix feature bit is set in the ibm,pa-features property (which is valid, since ibm,pa-features is defined to represent the capabilities of the processor) the guest will try to use radix, resulting in a crash when it turns the MMU on. This makes the minimal fix for the current code, which is to disable radix unless we are running in hypervisor mode. Fixes: 2bfd65e45e87 ("powerpc/mm/radix: Add radix callbacks for early init routines") Cc: stable@vger.kernel.org # v4.7+ Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-15powerpc/mm: Update PROTFAULT handling in the page fault pathAneesh Kumar K.V
With radix, we can get page fault with DSISR_PROTFAULT value set in case of PROT_NONE or autonuma mapping. The PROT_NONE case in handled by the vma check where we consider the access bad. For autonuma we should fall through and fixup the access mask correctly. Without this patch we trigger the WARN_ON() on radix. This code moves that WARN_ON() within a radix_enabled() check. I also moved the WARN_ON() outside the if condition making it apply for all type of faults (exec/write/read). It is also conditionalized for book3s, because BOOK3E can also get a PROTFAULT to handle the D/I cache sync. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-14Merge branch 'topic/ppc-kvm' into nextMichael Ellerman
Merge the topic branch we're sharing with the kvm-ppc tree.
2017-02-14powerpc/mm: Fix build break when CMA=n && SPAPR_TCE_IOMMU=yMichael Ellerman
Currently the build breaks if CMA=n and SPAPR_TCE_IOMMU=y: arch/powerpc/mm/mmu_context_iommu.c: In function ‘mm_iommu_get’: arch/powerpc/mm/mmu_context_iommu.c:193:42: error: ‘MIGRATE_CMA’ undeclared (first use in this function) if (get_pageblock_migratetype(page) == MIGRATE_CMA) { ^~~~~~~~~~~ Fix it by using the existing is_migrate_cma_page(), which evaulates to false when CMA=n. Fixes: 2e5bbb5461f1 ("KVM: PPC: Book3S HV: Migrate pinned pages out of CMA") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-10powerpc: Drop GPL from of_node_to_nid() export to match other archesShailendra Singh
The generic implementation of of_node_to_nid() is EXPORT_SYMBOL, added in commit 298535c00a2c ("of, numa: Add NUMA of binding implementation."). The powerpc implementation added in commit 953039c8df7b ("[PATCH] powerpc: Allow devices to register with numa topology") is EXPORT_SYMBOL_GPL. This creates an inconsistency for of_node_to_nid() callers across architectures. Update the powerpc implementation to be exported consistently with the generic implementation. Signed-off-by: Shailendra Singh <shailendras@nvidia.com> Reviewed-by: Andy Ritger <aritger@nvidia.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-10powerpc/pseries: Automatically resize HPT for memory hot add/removeDavid Gibson
We've now implemented code in the pseries platform to use the new PAPR interface to allow resizing the hash page table (HPT) at runtime. This patch uses that interface to automatically attempt to resize the HPT when memory is hot added or removed. This tries to always keep the HPT at a reasonable size for our current memory size. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-10powerpc/pseries: Add support for hash table resizingDavid Gibson
This adds support for using two hypercalls to change the size of the main hash page table while running as a PAPR guest. For now these hypercalls are only in experimental qemu versions. The interface is two part: first H_RESIZE_HPT_PREPARE is used to allocate and prepare the new hash table. This may be slow, but can be done asynchronously. Then, H_RESIZE_HPT_COMMIT is used to switch to the new hash table. This requires that no CPUs be concurrently updating the HPT, and so must be run under stop_machine(). This also adds a debugfs file which can be used to manually control HPT resizing or testing purposes. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Paul Mackerras <paulus@samba.org> [mpe: Rename the debugfs file to "hpt_order"] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-09powerpc/mm/radix: Update ERAT flushes when invalidating TLBBenjamin Herrenschmidt
Three tiny changes to the ERAT flushing logic: First don't make it depend on DD1. It hasn't been decided yet but we might run DD2 in a mode that also requires explicit flushes for performance reasons so make it unconditional. We also add a missing isync, and finally remove the flush from _tlbiel_va as it is only necessary for congruence-class invalidations (PID, LPID and full TLB), not targetted invalidations. Fixes: 96ed1fe511a8 ("powerpc/mm/radix: Invalidate ERAT on tlbiel for POWER9 DD1") Cc: stable@vger.kernel.org # v4.9+ Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-08powerpc/mm: Fix spurrious segfaults on radix with autonumaBenjamin Herrenschmidt
When autonuma (Automatic NUMA balancing) marks a PTE inaccessible it clears all the protection bits but leave the PTE valid. With the Radix MMU, an attempt at executing from such a PTE will take a fault with bit 35 of SRR1 set "SRR1_ISI_N_OR_G". It is thus incorrect to treat all such faults as errors. We should pass them to handle_mm_fault() for autonuma to deal with. The case of pages that are really not executable is handled by the existing test for VM_EXEC further down. That leaves us with catching the kernel attempts at executing user pages. We can catch that earlier, even before we do find_vma. It is never valid on powerpc for the kernel to take an exec fault to begin with. So fold that test with the existing test for the kernel faulting on kernel addresses to bail out early. Fixes: 1d18ad026844 ("powerpc/mm: Detect instruction fetch denied and report") Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-31powerpc/64: Make type of partition table flush depend on partition typePaul Mackerras
When changing a partition table entry on POWER9, we do a particular form of the tlbie instruction which flushes all TLBs and caches of the partition table for a given logical partition ID (LPID). This instruction has a field in the instruction word, labelled R (radix), which should be 1 if the partition was previously a radix partition and 0 if it was a HPT partition. This implements that logic. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-31powerpc/64: Export pgtable_cache and pgtable_cache_add for KVMPaul Mackerras
This exports the pgtable_cache array and the pgtable_cache_add function so that HV KVM can use them for allocating radix page tables for guests. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-31powerpc/64: Enable use of radix MMU under hypervisor on POWER9Paul Mackerras
To use radix as a guest, we first need to tell the hypervisor via the ibm,client-architecture call first that we support POWER9 and architecture v3.00, and that we can do either radix or hash and that we would like to choose later using an hcall (the H_REGISTER_PROC_TBL hcall). Then we need to check whether the hypervisor agreed to us using radix. We need to do this very early on in the kernel boot process before any of the MMU initialization is done. If the hypervisor doesn't agree, we can't use radix and therefore clear the radix MMU feature bit. Later, when we have set up our process table, which points to the radix tree for each process, we need to install that using the H_REGISTER_PROC_TBL hcall. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-31powerpc/64: Don't try to use radix MMU under a hypervisorPaul Mackerras
Currently, if the kernel is running on a POWER9 processor under a hypervisor, it will try to use the radix MMU even though it doesn't have the necessary code to use radix under a hypervisor (it doesn't negotiate use of radix, and it doesn't do the H_REGISTER_PROC_TBL hcall). The result is that the guest kernel will crash when it tries to turn on the MMU. This fixes it by looking for the /chosen/ibm,architecture-vec-5 property, and if it exists, clears the radix MMU feature bit, before we decide whether to initialize for radix or HPT. This property is created by the hypervisor as a result of the guest calling the ibm,client-architecture-support method to indicate its capabilities, so it will indicate whether the hypervisor agreed to us using radix. Systems without a hypervisor may have this property also (for example, skiboot creates it), so we check the HV bit in the MSR to see whether we are running as a guest or not. If we are in hypervisor mode, then we can do whatever we like including using the radix MMU. The reason for using this property is that in future, when we have support for using radix under a hypervisor, we will need to check this property to see whether the hypervisor agreed to us using radix. Fixes: 2bfd65e45e87 ("powerpc/mm/radix: Add radix callbacks for early init routines") Cc: stable@vger.kernel.org # v4.7+ Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-31powerpc/mm: unstub radix__vmemmap_remove_mapping()Reza Arbab
Use remove_pagetable() and friends for radix vmemmap removal. We do not require the special-case handling of vmemmap done in the x86 versions of these functions. This is because vmemmap_free() has already freed the mapped pages, and calls us with an aligned address range. So, add a few failsafe WARNs, but otherwise the code to remove physical mappings is already sufficient for vmemmap. Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com> Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-31powerpc/mm: add radix__remove_section_mapping()Reza Arbab
Tear down and free the four-level page tables of physical mappings during memory hotremove. Borrow the basic structure of remove_pagetable() and friends from the identically-named x86 functions. Reduce the frequency of tlb flushes and page_table_lock spinlocks by only doing them in the outermost function. There was some question as to whether the locking is needed at all. Leave it for now, but we could consider dropping it. Memory must be offline to be removed, thus not in use. So there shouldn't be the sort of concurrent page walking activity here that might prompt us to use RCU. Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-31powerpc/mm: add radix__create_section_mapping()Reza Arbab
Wire up memory hotplug page mapping for radix. Share the mapping function already used by radix_init_pgtable(). Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com> Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-31powerpc/mm: refactor radix physical page mappingReza Arbab
Move the page mapping code in radix_init_pgtable() into a separate function that will also be used for memory hotplug. The current goto loop progressively decreases its mapping size as it covers the tail of a range whose end is unaligned. Change this to a for loop which can do the same for both ends of the range. Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-30powerpc/powernv: Initialise nest mmuAlistair Popple
POWER9 contains an off core mmu called the nest mmu (NMMU). This is used by other hardware units on the chip to translate virtual addresses into real addresses. The unit attempting an address translation provides the majority of the context required for the translation request except for the base address of the partition table (ie. the PTCR) which needs to be programmed into the NMMU. This patch adds a call to OPAL to set the PTCR for the nest mmu in opal_init(). Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-30powerpc/mm: Allow memory hotplug into an offline nodeReza Arbab
Relax the check preventing us from hotplugging into an offline node. This limitation was added in commit 482ec7c403d2 ("[PATCH] powerpc numa: Support sparse online node map") to prevent adding resources to an uninitialized node. These days, there is no harm in doing so. The addition will actually cause the node to be initialized and onlined; add_memory_resource() calls hotadd_new_pgdat() (if necessary) and node_set_online(). Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-30powerpc/mm: Simplify loop control in parse_numa_properties()Reza Arbab
The flow of the main loop in parse_numa_properties() is overly complicated. Simplify it to be less confusing and easier to read. No functional change. The end of the main loop in parse_numa_properties() looks like this: for_each_node_by_type(...) { ... if (!condition) { if (--ranges) goto new_range; else continue; } statement(); if (--ranges) goto new_range; /* else * continue; <- implicit, this is the end of the loop */ } The only effect of !condition is to skip execution of statement(). This can be rewritten in a simpler way: for_each_node_by_type(...) { ... if (condition) statement(); if (--ranges) goto new_range; } Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-30powerpc/mm: Use the correct pointer when setting a 2MB pteReza Arbab
When setting a 2MB pte, radix__map_kernel_page() is using the address ptep = (pte_t *)pudp; Fix this conversion to use pmdp instead. Use pmdp_ptep() to do this instead of casting the pointer. Fixes: 2bfd65e45e87 ("powerpc/mm/radix: Add radix callbacks for early init routines") Cc: stable@vger.kernel.org # v4.7+ Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-25powerpc/mm: Return directly after a failed __copy_from_user() in ↵Markus Elfring
sys_subpage_prot() This function already has multiple exit points, so there's no harm adding another. Although it looks odd to return directly in a function which takes a lock, we've actually just dropped the mmap_sem in this code, so there's really no reason to go via a label. And it means we can drop the unhelpfully named out2 label. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> [mpe: Rewrite change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-23powerpc/mm: Remove the debug hugepd_ok checkAneesh Kumar K.V
We don't do this for other page table entries. So lets keep this simple and always return false for hugepd check on a 64K page size config. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-18powerpc/mm: Fix little-endian 4K hugetlbAneesh Kumar K.V
When we switched to big endian page table, we never updated the hugepd format such that it can work for both big endian and little endian config. This patch series update hugepd format such that it is looked at as __be64 value in big endian page table config. This patch also switch hugepd_t.pd from signed long to unsigned long. I did update the FSL hugepd_ok check to check for the top bit instead of checking > 0. Fixes: 5dc1ef858c12 ("powerpc/mm: Use big endian Linux page tables for book3s 64") Cc: stable@vger.kernel.org # v4.7+ Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-18powerpc/mm/hugetlb: Don't panic when we don't find the default huge page sizeAneesh Kumar K.V
The generic hugetlbfs code can handle not finding the default huge page size correctly. With HPAGE_SHIFT = 0 we see in dmesg: hugetlbfs: disabling because there are no supported hugepage sizes bash-4.2# echo 30 > /proc/sys/vm/nr_hugepages bash: echo: write error: Operation not supported Fixes: 03bb2d65900c ("powerpc: get hugetlbpage handling more generic") Reported-by: Chris Smart <chris@distroguy.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-18powerpc: Fix pgtable pmd cache initNicholas Piggin
Commit 9b081e10805cd ("powerpc: port 64 bits pgtable_cache to 32 bits") mixed up PMD_INDEX_SIZE and PMD_CACHE_INDEX a couple of times. This resulted in 64s/hash/4k configs to panic at boot with a false positive error check. Fix that and simplify error handling by moving the check to the caller. Fixes: 9b081e10805cd ("powerpc: port 64 bits pgtable_cache to 32 bits") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>