summaryrefslogtreecommitdiff
path: root/arch/powerpc/include/asm/book3s/64/pgtable.h
AgeCommit message (Collapse)Author
2019-08-27powerpc/mm: refactor ioremap vm area setup.Christophe Leroy
PPC32 and PPC64 are doing the same once SLAB is available. Create a do_ioremap() function that calls get_vm_area and do the mapping. For PPC64, we add the 4K PFN hack sanity check to __ioremap_caller() in order to avoid using __ioremap_at(). Other checks in __ioremap_at() are irrelevant for __ioremap_caller(). On PPC64, VM area is allocated in the range [ioremap_bot ; IOREMAP_END] On PPC32, VM area is allocated in the range [VMALLOC_START ; VMALLOC_END] Lets define IOREMAP_START is ioremap_bot for PPC64, and alias IOREMAP_START/END to VMALLOC_START/END on PPC32 Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/42e7e36ad32e0fdf76692426cc642799c9f689b8.1566309263.git.christophe.leroy@c-s.fr
2019-08-27powerpc/mm: make ioremap_bot common to allChristophe Leroy
Drop multiple definitions of ioremap_bot and make one common to all subarches. Only CONFIG_PPC_BOOK3E_64 had a global static init value for ioremap_bot. Now ioremap_bot is set in early_init_mmu_global(). Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/920eebfd9f36f14c79d1755847f5bf7c83703bdd.1566309262.git.christophe.leroy@c-s.fr
2019-08-20powerpc/64s/radix: Remove redundant pfn_pte bitop, add VM_BUG_ONNicholas Piggin
pfn_pte is never given a pte above the addressable physical memory limit, so the masking is redundant. In case of a software bug, it is not obviously better to silently truncate the pfn than to corrupt the pte (either one will result in memory corruption or crashes), so there is no reason to add this to the fast path. Add VM_BUG_ON to catch cases where the pfn is invalid. These would catch the create_section_mapping bug fixed by a previous commit. [16885.256466] ------------[ cut here ]------------ [16885.256492] kernel BUG at arch/powerpc/include/asm/book3s/64/pgtable.h:612! cpu 0x0: Vector: 700 (Program Check) at [c0000000ee0a36d0] pc: c000000000080738: __map_kernel_page+0x248/0x6f0 lr: c000000000080ac0: __map_kernel_page+0x5d0/0x6f0 sp: c0000000ee0a3960 msr: 9000000000029033 current = 0xc0000000ec63b400 paca = 0xc0000000017f0000 irqmask: 0x03 irq_happened: 0x01 pid = 85, comm = sh kernel BUG at arch/powerpc/include/asm/book3s/64/pgtable.h:612! Linux version 5.3.0-rc1-00001-g0fe93e5f3394 enter ? for help [c0000000ee0a3a00] c000000000d37378 create_physical_mapping+0x260/0x360 [c0000000ee0a3b10] c000000000d370bc create_section_mapping+0x1c/0x3c [c0000000ee0a3b30] c000000000071f54 arch_add_memory+0x74/0x130 Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190724084638.24982-5-npiggin@gmail.com
2019-07-16mm: introduce ARCH_HAS_PTE_DEVMAPRobin Murphy
ARCH_HAS_ZONE_DEVICE is somewhat meaningless in itself, and combined with the long-out-of-date comment can lead to the impression than an architecture may just enable it (since __add_pages() now "comprehends device memory" for itself) and expect things to work. In practice, however, ZONE_DEVICE users have little chance of functioning correctly without __HAVE_ARCH_PTE_DEVMAP, so let's clean that up the same way as ARCH_HAS_PTE_SPECIAL and make it the proper dependency so the real situation is clearer. Link: http://lkml.kernel.org/r/87554aa78478a02a63f2c4cf60a847279ae3eb3b.1558547956.git.robin.murphy@arm.com Signed-off-by: Robin Murphy <robin.murphy@arm.com> Acked-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Acked-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-05powerpc/book3s: Use config independent helpers for page table walkAneesh Kumar K.V
Even when we have HugeTLB and THP disabled, kernel linear map can still be mapped with hugepages. This is only an issue with radix translation because hash MMU doesn't map kernel linear range in linux page table and other kernel map areas are not mapped using hugepage. Add config independent helpers and put WARN_ON() when we don't expect things to be mapped via hugepages. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-05powerpc/mm: Remove unused variable declarationAneesh Kumar K.V
Since commit 0034d395f89d ("powerpc/mm/hash64: Map all the kernel regions in the same 0xc range") __kernel_virt_size is not used anymore. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-01Merge branch 'fixes' into nextMichael Ellerman
Merge our fixes branch into next, this brings in a number of commits that fix bugs we don't want to hit in next, in particular the fix for CVE-2019-12817.
2019-06-19powerpc/64s/radix: Enable HAVE_ARCH_HUGE_VMAPNicholas Piggin
This sets the HAVE_ARCH_HUGE_VMAP option, and defines the required page table functions. This enables huge (2MB and 1GB) ioremap mappings. I don't have a benchmark for this change, but huge vmap will be used by a later core kernel change to enable huge vmalloc memory mappings. This improves cached `git diff` performance by about 5% on a 2-node POWER9 with 32MB size dentry cache hash. Profiling git diff dTLB misses with a vanilla kernel: 81.75% git [kernel.vmlinux] [k] __d_lookup_rcu 7.21% git [kernel.vmlinux] [k] strncpy_from_user 1.77% git [kernel.vmlinux] [k] find_get_entry 1.59% git [kernel.vmlinux] [k] kmem_cache_free 40,168 dTLB-miss 0.100342754 seconds time elapsed With powerpc huge vmalloc: 2,987 dTLB-miss 0.095933138 seconds time elapsed Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-06-07powerpc/64s: Fix THP PMD collapse serialisationNicholas Piggin
Commit 1b2443a547f9 ("powerpc/book3s64: Avoid multiple endian conversion in pte helpers") changed the actual bitwise tests in pte_access_permitted by using pte_write() and pte_present() helpers rather than raw bitwise testing _PAGE_WRITE and _PAGE_PRESENT bits. The pte_present() change now returns true for PTEs which are !_PAGE_PRESENT and _PAGE_INVALID, which is the combination used by pmdp_invalidate() to synchronize access from lock-free lookups. pte_access_permitted() is used by pmd_access_permitted(), so allowing GUP lock free access to proceed with such PTEs breaks this synchronisation. This bug has been observed on a host using the hash page table MMU, with random crashes and corruption in guests, usually together with bad PMD messages in the host. Fix this by adding an explicit check in pmd_access_permitted(), and documenting the condition explicitly. The pte_write() change should be okay, and would prevent GUP from falling back to the slow path when encountering savedwrite PTEs, which matches what x86 (that does not implement savedwrite) does. Fixes: 1b2443a547f9 ("powerpc/book3s64: Avoid multiple endian conversion in pte helpers") Cc: stable@vger.kernel.org # v4.20+ Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-04-21powerpc/mm/hash64: Map all the kernel regions in the same 0xc rangeAneesh Kumar K.V
This patch maps vmalloc, IO and vmemap regions in the 0xc address range instead of the current 0xd and 0xf range. This brings the mapping closer to radix translation mode. With hash 64K page size each of this region is 512TB whereas with 4K config we are limited by the max page table range of 64TB and hence there regions are of 16TB size. The kernel mapping is now: On 4K hash kernel_region_map_size = 16TB kernel vmalloc start = 0xc000100000000000 kernel IO start = 0xc000200000000000 kernel vmemmap start = 0xc000300000000000 64K hash, 64K radix and 4k radix: kernel_region_map_size = 512TB kernel vmalloc start = 0xc008000000000000 kernel IO start = 0xc00a000000000000 kernel vmemmap start = 0xc00c000000000000 Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-04-21powerpc/mm/hash64: Add a variable to track the end of IO mappingAneesh Kumar K.V
This makes it easy to update the region mapping in the later patch Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-04-20powerpc/mm: Silence unused-but-set-variable warningsQian Cai
pte_unmap() compiles away on some powerpc platforms, so silence the warnings below by making it a static inline function. mm/memory.c: In function 'copy_pte_range': mm/memory.c:820:24: warning: variable 'orig_dst_pte' set but not used mm/memory.c:820:9: warning: variable 'orig_src_pte' set but not used mm/madvise.c: In function 'madvise_free_pte_range': mm/madvise.c:318:9: warning: variable 'orig_pte' set but not used mm/swap_state.c: In function 'swap_ra_info': mm/swap_state.c:634:15: warning: variable 'orig_pte' set but not used Suggested-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Qian Cai <cai@lca.pw> Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-03-07Merge tag 'powerpc-5.1-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: "Notable changes: - Enable THREAD_INFO_IN_TASK to move thread_info off the stack. - A big series from Christoph reworking our DMA code to use more of the generic infrastructure, as he said: "This series switches the powerpc port to use the generic swiotlb and noncoherent dma ops, and to use more generic code for the coherent direct mapping, as well as removing a lot of dead code." - Increase our vmalloc space to 512T with the Hash MMU on modern CPUs, allowing us to support machines with larger amounts of total RAM or distance between nodes. - Two series from Christophe, one to optimise TLB miss handlers on 6xx, and another to optimise the way STRICT_KERNEL_RWX is implemented on some 32-bit CPUs. - Support for KCOV coverage instrumentation which means we can run syzkaller and discover even more bugs in our code. And as always many clean-ups, reworks and minor fixes etc. Thanks to: Alan Modra, Alexey Kardashevskiy, Alistair Popple, Andrea Arcangeli, Andrew Donnellan, Aneesh Kumar K.V, Aravinda Prasad, Balbir Singh, Brajeswar Ghosh, Breno Leitao, Christian Lamparter, Christian Zigotzky, Christophe Leroy, Christoph Hellwig, Corentin Labbe, Daniel Axtens, David Gibson, Diana Craciun, Firoz Khan, Gustavo A. R. Silva, Igor Stoppa, Joe Lawrence, Joel Stanley, Jonathan Neuschäfer, Jordan Niethe, Laurent Dufour, Madhavan Srinivasan, Mahesh Salgaonkar, Mark Cave-Ayland, Masahiro Yamada, Mathieu Malaterre, Matteo Croce, Meelis Roos, Michael W. Bringmann, Nathan Chancellor, Nathan Fontenot, Nicholas Piggin, Nick Desaulniers, Nicolai Stange, Oliver O'Halloran, Paul Mackerras, Peter Xu, PrasannaKumar Muralidharan, Qian Cai, Rashmica Gupta, Reza Arbab, Robert P. J. Day, Russell Currey, Sabyasachi Gupta, Sam Bobroff, Sandipan Das, Sergey Senozhatsky, Souptick Joarder, Stewart Smith, Tyrel Datwyler, Vaibhav Jain, YueHaibing" * tag 'powerpc-5.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (200 commits) powerpc/32: Clear on-stack exception marker upon exception return powerpc: Remove export of save_stack_trace_tsk_reliable() powerpc/mm: fix "section_base" set but not used powerpc/mm: Fix "sz" set but not used warning powerpc/mm: Check secondary hash page table powerpc: remove nargs from __SYSCALL powerpc/64s: Fix unrelocated interrupt trampoline address test powerpc/powernv/ioda: Fix locked_vm counting for memory used by IOMMU tables powerpc/fsl: Fix the flush of branch predictor. powerpc/powernv: Make opal log only readable by root powerpc/xmon: Fix opcode being uninitialized in print_insn_powerpc powerpc/powernv: move OPAL call wrapper tracing and interrupt handling to C powerpc/64s: Fix data interrupts vs d-side MCE reentrancy powerpc/64s: Prepare to handle data interrupts vs d-side MCE reentrancy powerpc/64s: system reset interrupt preserve HSRRs powerpc/64s: Fix HV NMI vs HV interrupt recoverability test powerpc/mm/hash: Handle mmap_min_addr correctly in get_unmapped_area topdown search powerpc/hugetlb: Handle mmap_min_addr correctly in get_unmapped_area callback selftests/powerpc: Remove duplicate header powerpc sstep: Add support for modsd, modud instructions ...
2019-03-05arch/powerpc/mm: Nest MMU workaround for mprotect RW upgradeAneesh Kumar K.V
NestMMU requires us to mark the pte invalid and flush the tlb when we do a RW upgrade of pte. We fixed a variant of this in the fault path in bd5050e38aec ("powerpc/mm/radix: Change pte relax sequence to handle nest MMU hang"). Do the same for mprotect upgrades. Hugetlb is handled in the next patch. Link: http://lkml.kernel.org/r/20190116085035.29729-4-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-02-22powerpc/book3s: Remove pgd/pud/pmd_set() interfacesAneesh Kumar K.V
When updating page tables, we need to make sure we fill the page table entry valid bits. We do this by or'ing in one of PGD/PUD/PMD_VAL_BITS. The page table 'set' interfaces allow updating the raw value of page table entries without setting the valid bits, so remove those interfaces to avoid incorrect usage in future. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> [mpe: Reword commit message based on mailing list discussion] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-19Merge branch 'fixes' into nextMichael Ellerman
There's a few important fixes in our fixes branch, in particular the pgd/pud_present() one, so merge it now.
2019-02-17powerpc/64s: Fix possible corruption on big endian due to pgd/pud_present()Michael Ellerman
In v4.20 we changed our pgd/pud_present() to check for _PAGE_PRESENT rather than just checking that the value is non-zero, e.g.: static inline int pgd_present(pgd_t pgd) { - return !pgd_none(pgd); + return (pgd_raw(pgd) & cpu_to_be64(_PAGE_PRESENT)); } Unfortunately this is broken on big endian, as the result of the bitwise & is truncated to int, which is always zero because _PAGE_PRESENT is 0x8000000000000000ul. This means pgd_present() and pud_present() are always false at compile time, and the compiler elides the subsequent code. Remarkably with that bug present we are still able to boot and run with few noticeable effects. However under some work loads we are able to trigger a warning in the ext4 code: WARNING: CPU: 11 PID: 29593 at fs/ext4/inode.c:3927 .ext4_set_page_dirty+0x70/0xb0 CPU: 11 PID: 29593 Comm: debugedit Not tainted 4.20.0-rc1 #1 ... NIP .ext4_set_page_dirty+0x70/0xb0 LR .set_page_dirty+0xa0/0x150 Call Trace: .set_page_dirty+0xa0/0x150 .unmap_page_range+0xbf0/0xe10 .unmap_vmas+0x84/0x130 .unmap_region+0xe8/0x190 .__do_munmap+0x2f0/0x510 .__vm_munmap+0x80/0x110 .__se_sys_munmap+0x14/0x30 system_call+0x5c/0x70 The fix is simple, we need to convert the result of the bitwise & to an int before returning it. Thanks to Erhard, Jan Kara and Aneesh for help with debugging. Fixes: da7ad366b497 ("powerpc/mm/book3s: Update pmd_present to look at _PAGE_PRESENT bit") Cc: stable@vger.kernel.org # v4.20+ Reported-by: Erhard F. <erhard_f@mailbox.org> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-01-31powerpc/radix: Fix kernel crash with mremap()Aneesh Kumar K.V
With support for split pmd lock, we use pmd page pmd_huge_pte pointer to store the deposited page table. In those config when we move page tables we need to make sure we move the deposited page table to the correct pmd page. Otherwise this can result in crash when we withdraw of deposited page table because we can find the pmd_huge_pte NULL. eg: __split_huge_pmd+0x1070/0x1940 __split_huge_pmd+0xe34/0x1940 (unreliable) vma_adjust_trans_huge+0x110/0x1c0 __vma_adjust+0x2b4/0x9b0 __split_vma+0x1b8/0x280 __do_munmap+0x13c/0x550 sys_mremap+0x220/0x7e0 system_call+0x5c/0x70 Fixes: 675d995297d4 ("powerpc/book3s64: Enable split pmd ptlock.") Cc: stable@vger.kernel.org # v4.18+ Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-01-31powerpc/mm: Add _PAGE_SAO to _PAGE_CACHE_CTL maskReza Arbab
In htab_convert_pte_flags(), _PAGE_CACHE_CTL is used to check for the _PAGE_SAO flag: else if ((pteflags & _PAGE_CACHE_CTL) == _PAGE_SAO) rflags |= (HPTE_R_W | HPTE_R_I | HPTE_R_M); But, it isn't defined to include that flag: #define _PAGE_CACHE_CTL (_PAGE_NON_IDEMPOTENT | _PAGE_TOLERANT) This happens to work, but only because of the flag values: #define _PAGE_SAO 0x00010 /* Strong access order */ #define _PAGE_NON_IDEMPOTENT 0x00020 /* non idempotent memory */ #define _PAGE_TOLERANT 0x00030 /* tolerant memory, cache inhibited */ To prevent any issues if these particulars ever change, add _PAGE_SAO to the mask. Suggested-by: Charles Johns <crjohns@us.ibm.com> Signed-off-by: Reza Arbab <arbab@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-11-25powerpc/mm: remove const type qualifier from function ‘pud_pfn’Mathieu Malaterre
Type qualifier on return type is ignored. Remove warning in W=1: arch/powerpc/include/asm/book3s/64/pgtable.h:1268:25: error: type qualifiers ignored on function return type [-Werror=ignored-qualifiers] Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-28Merge branch 'xarray' of git://git.infradead.org/users/willy/linux-daxLinus Torvalds
Pull XArray conversion from Matthew Wilcox: "The XArray provides an improved interface to the radix tree data structure, providing locking as part of the API, specifying GFP flags at allocation time, eliminating preloading, less re-walking the tree, more efficient iterations and not exposing RCU-protected pointers to its users. This patch set 1. Introduces the XArray implementation 2. Converts the pagecache to use it 3. Converts memremap to use it The page cache is the most complex and important user of the radix tree, so converting it was most important. Converting the memremap code removes the only other user of the multiorder code, which allows us to remove the radix tree code that supported it. I have 40+ followup patches to convert many other users of the radix tree over to the XArray, but I'd like to get this part in first. The other conversions haven't been in linux-next and aren't suitable for applying yet, but you can see them in the xarray-conv branch if you're interested" * 'xarray' of git://git.infradead.org/users/willy/linux-dax: (90 commits) radix tree: Remove multiorder support radix tree test: Convert multiorder tests to XArray radix tree tests: Convert item_delete_rcu to XArray radix tree tests: Convert item_kill_tree to XArray radix tree tests: Move item_insert_order radix tree test suite: Remove multiorder benchmarking radix tree test suite: Remove __item_insert memremap: Convert to XArray xarray: Add range store functionality xarray: Move multiorder_check to in-kernel tests xarray: Move multiorder_shrink to kernel tests xarray: Move multiorder account test in-kernel radix tree test suite: Convert iteration test to XArray radix tree test suite: Convert tag_tagged_items to XArray radix tree: Remove radix_tree_clear_tags radix tree: Remove radix_tree_maybe_preload_order radix tree: Remove split/join code radix tree: Remove radix_tree_update_node_t page cache: Finish XArray conversion dax: Convert page fault handlers to XArray ...
2018-10-26Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge updates from Andrew Morton: - a few misc things - ocfs2 updates - most of MM * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (132 commits) hugetlbfs: dirty pages as they are added to pagecache mm: export add_swap_extent() mm: split SWP_FILE into SWP_ACTIVATED and SWP_FS tools/testing/selftests/vm/map_fixed_noreplace.c: add test for MAP_FIXED_NOREPLACE mm: thp: relocate flush_cache_range() in migrate_misplaced_transhuge_page() mm: thp: fix mmu_notifier in migrate_misplaced_transhuge_page() mm: thp: fix MADV_DONTNEED vs migrate_misplaced_transhuge_page race condition mm/kasan/quarantine.c: make quarantine_lock a raw_spinlock_t mm/gup: cache dev_pagemap while pinning pages Revert "x86/e820: put !E820_TYPE_RAM regions into memblock.reserved" mm: return zero_resv_unavail optimization mm: zero remaining unavailable struct pages tools/testing/selftests/vm/gup_benchmark.c: add MAP_HUGETLB option tools/testing/selftests/vm/gup_benchmark.c: add MAP_SHARED option tools/testing/selftests/vm/gup_benchmark.c: allow user specified file tools/testing/selftests/vm/gup_benchmark.c: fix 'write' flag usage mm/gup_benchmark.c: add additional pinning methods mm/gup_benchmark.c: time put_page() mm: don't raise MEMCG_OOM event due to failed high-order allocation mm/page-writeback.c: fix range_cyclic writeback vs writepages deadlock ...
2018-10-26hugetlb: introduce generic version of huge_ptep_set_wrprotect()Alexandre Ghiti
arm, ia64, mips, powerpc, sh, x86 architectures use the same version of huge_ptep_set_wrprotect, so move this generic implementation into asm-generic/hugetlb.h. Link: http://lkml.kernel.org/r/20180920060358.16606-10-alex@ghiti.fr Signed-off-by: Alexandre Ghiti <alex@ghiti.fr> Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Tested-by: Helge Deller <deller@gmx.de> [parisc] Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64] Acked-by: Paul Burton <paul.burton@mips.com> [MIPS] Acked-by: Ingo Molnar <mingo@kernel.org> [x86] Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David S. Miller <davem@davemloft.net> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James E.J. Bottomley <jejb@parisc-linux.org> Cc: James Hogan <jhogan@kernel.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-26Merge tag 'powerpc-4.20-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: "Notable changes: - A large series to rewrite our SLB miss handling, replacing a lot of fairly complicated asm with much fewer lines of C. - Following on from that, we now maintain a cache of SLB entries for each process and preload them on context switch. Leading to a 27% speedup for our context switch benchmark on Power9. - Improvements to our handling of SLB multi-hit errors. We now print more debug information when they occur, and try to continue running by flushing the SLB and reloading, rather than treating them as fatal. - Enable THP migration on 64-bit Book3S machines (eg. Power7/8/9). - Add support for physical memory up to 2PB in the linear mapping on 64-bit Book3S. We only support up to 512TB as regular system memory, otherwise the percpu allocator runs out of vmalloc space. - Add stack protector support for 32 and 64-bit, with a per-task canary. - Add support for PTRACE_SYSEMU and PTRACE_SYSEMU_SINGLESTEP. - Support recognising "big cores" on Power9, where two SMT4 cores are presented to us as a single SMT8 core. - A large series to cleanup some of our ioremap handling and PTE flags. - Add a driver for the PAPR SCM (storage class memory) interface, allowing guests to operate on SCM devices (acked by Dan). - Changes to our ftrace code to handle very large kernels, where we need to use a trampoline to get to ftrace_caller(). And many other smaller enhancements and cleanups. Thanks to: Alan Modra, Alistair Popple, Aneesh Kumar K.V, Anton Blanchard, Aravinda Prasad, Bartlomiej Zolnierkiewicz, Benjamin Herrenschmidt, Breno Leitao, Cédric Le Goater, Christophe Leroy, Christophe Lombard, Dan Carpenter, Daniel Axtens, Finn Thain, Gautham R. Shenoy, Gustavo Romero, Haren Myneni, Hari Bathini, Jia Hongtao, Joel Stanley, John Allen, Laurent Dufour, Madhavan Srinivasan, Mahesh Salgaonkar, Mark Hairgrove, Masahiro Yamada, Michael Bringmann, Michael Neuling, Michal Suchanek, Murilo Opsfelder Araujo, Nathan Fontenot, Naveen N. Rao, Nicholas Piggin, Nick Desaulniers, Oliver O'Halloran, Paul Mackerras, Petr Vorel, Rashmica Gupta, Reza Arbab, Rob Herring, Sam Bobroff, Samuel Mendoza-Jonas, Scott Wood, Stan Johnson, Stephen Rothwell, Stewart Smith, Suraj Jitindar Singh, Tyrel Datwyler, Vaibhav Jain, Vasant Hegde, YueHaibing, zhong jiang" * tag 'powerpc-4.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (221 commits) Revert "selftests/powerpc: Fix out-of-tree build errors" powerpc/msi: Fix compile error on mpc83xx powerpc: Fix stack protector crashes on CPU hotplug powerpc/traps: restore recoverability of machine_check interrupts powerpc/64/module: REL32 relocation range check powerpc/64s/radix: Fix radix__flush_tlb_collapsed_pmd double flushing pmd selftests/powerpc: Add a test of wild bctr powerpc/mm: Fix page table dump to work on Radix powerpc/mm/radix: Display if mappings are exec or not powerpc/mm/radix: Simplify split mapping logic powerpc/mm/radix: Remove the retry in the split mapping logic powerpc/mm/radix: Fix small page at boundary when splitting powerpc/mm/radix: Fix overuse of small pages in splitting logic powerpc/mm/radix: Fix off-by-one in split mapping logic powerpc/ftrace: Handle large kernel configs powerpc/mm: Fix WARN_ON with THP NUMA migration selftests/powerpc: Fix out-of-tree build errors powerpc/time: no steal_time when CONFIG_PPC_SPLPAR is not selected powerpc/time: Only set CONFIG_ARCH_HAS_SCALED_CPUTIME on PPC64 powerpc/time: isolate scaled cputime accounting in dedicated functions. ...
2018-10-19powerpc/mm: Make pte_pgprot return all pte bitsAneesh Kumar K.V
Other archs do the same and instead of adding required pte bits (which got masked out) in __ioremap_at(), make sure we filter only pfn bits out. Fixes: 26973fa5ac0e ("powerpc/mm: use pte helpers in generic code") Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14powerpc/book3s64: Avoid multiple endian conversion in pte helpersChristophe Leroy
In the same spirit as already done in pte query helpers, this patch changes pte setting helpers to perform endian conversions on the constants rather than on the pte value. In the meantime, it changes pte_access_permitted() to use pte helpers for the same reason. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14powerpc/mm: move __P and __S tables in the common pgtable.hChristophe Leroy
__P and __S flags are the same for all platform and should remain as is in the future, so avoid duplication. Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14powerpc/mm: drop unused page flagsChristophe Leroy
The following page flags in pte-common.h can be dropped: _PAGE_ENDIAN is only used in mm/fsl_booke_mmu.c and is defined in asm/nohash/32/pte-fsl-booke.h _PAGE_4K_PFN is nowhere defined nor used _PAGE_READ, _PAGE_WRITE and _PAGE_PTE are only defined and used in book3s/64 The following page flags in book3s/64/pgtable.h can be dropped as they are not used on this platform nor by common code. _PAGE_NA, _PAGE_RO, _PAGE_USER and _PAGE_PSIZE Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14powerpc/mm: add pte helpers to query and change pte flagsChristophe Leroy
In order to avoid using generic _PAGE_XXX flags in powerpc core functions, define helpers for all needed flags: - pte_mkuser() and pte_mkprivileged() to set/unset and/or unset/set _PAGE_USER and/or _PAGE_PRIVILEGED - pte_hashpte() to check if _PAGE_HASHPTE is set. - pte_ci() check if cache is inhibited (already existing on book3s/64) - pte_exprotect() to protect against execution - pte_exec() and pte_mkexec() to query and set page execution - pte_mkpte() to set _PAGE_PTE flag. - pte_hw_valid() to check _PAGE_PRESENT since pte_present does something different on book3s/64. On book3s/32 there is no exec protection, so pte_mkexec() and pte_exprotect() are nops and pte_exec() returns always true. Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14powerpc: handover page flags with a pgprot_t parameterChristophe Leroy
In order to avoid multiple conversions, handover directly a pgprot_t to map_kernel_page() as already done for radix. Do the same for __ioremap_caller() and __ioremap_at(). Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-09mm: Preserve _PAGE_DEVMAP across mprotect() callsJan Kara
Currently _PAGE_DEVMAP bit is not preserved in mprotect(2) calls. As a result we will see warnings such as: BUG: Bad page map in process JobWrk0013 pte:800001803875ea25 pmd:7624381067 addr:00007f0930720000 vm_flags:280000f9 anon_vma: (null) mapping:ffff97f2384056f0 index:0 file:457-000000fe00000030-00000009-000000ca-00000001_2001.fileblock fault:xfs_filemap_fault [xfs] mmap:xfs_file_mmap [xfs] readpage: (null) CPU: 3 PID: 15848 Comm: JobWrk0013 Tainted: G W 4.12.14-2.g7573215-default #1 SLE12-SP4 (unreleased) Hardware name: Intel Corporation S2600WFD/S2600WFD, BIOS SE5C620.86B.01.00.0833.051120182255 05/11/2018 Call Trace: dump_stack+0x5a/0x75 print_bad_pte+0x217/0x2c0 ? enqueue_task_fair+0x76/0x9f0 _vm_normal_page+0xe5/0x100 zap_pte_range+0x148/0x740 unmap_page_range+0x39a/0x4b0 unmap_vmas+0x42/0x90 unmap_region+0x99/0xf0 ? vma_gap_callbacks_rotate+0x1a/0x20 do_munmap+0x255/0x3a0 vm_munmap+0x54/0x80 SyS_munmap+0x1d/0x30 do_syscall_64+0x74/0x150 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 ... when mprotect(2) gets used on DAX mappings. Also there is a wide variety of other failures that can result from the missing _PAGE_DEVMAP flag when the area gets used by get_user_pages() later. Fix the problem by including _PAGE_DEVMAP in a set of flags that get preserved by mprotect(2). Fixes: 69660fd797c3 ("x86, mm: introduce _PAGE_DEVMAP") Fixes: ebd31197931d ("powerpc/mm: Add devmap support for ppc64") Cc: <stable@vger.kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-10-03powerpc/mm:book3s: Enable THP migration supportAneesh Kumar K.V
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-03powerpc/mm/thp: update pmd_trans_huge to check for pmd_presentAneesh Kumar K.V
We need to make sure pmd_trans_huge returns false for a pmd migration entry. We mark the migration entry by clearing the _PAGE_PRESENT bit. We keep the _PAGE_PTE bit set to indicate a leaf page table entry. Hence we need to make sure we check for pmd_present() so that pmd_trans_huge won't return true on pmd migration entry. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-03powerpc/mm/book3s: Update pmd_present to look at _PAGE_PRESENT bitAneesh Kumar K.V
With this patch we use 0x8000000000000000UL (_PAGE_PRESENT) to indicate a valid pgd/pud/pmd entry. We also switch the p**_present() to look at this bit. With pmd_present, we have a special case. We need to make sure we consider a pmd marked invalid during THP split as present. Right now we clear the _PAGE_PRESENT bit during a pmdp_invalidate. Inorder to consider this special case we add a new pte bit _PAGE_INVALID (mapped to _RPAGE_SW0). This bit is only used with _PAGE_PRESENT cleared. Hence we are not really losing a pte bit for this special case. pmd_present is also updated to look at _PAGE_INVALID. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-09-29xarray: Replace exceptional entriesMatthew Wilcox
Introduce xarray value entries and tagged pointers to replace radix tree exceptional entries. This is a slight change in encoding to allow the use of an extra bit (we can now store BITS_PER_LONG - 1 bits in a value entry). It is also a change in emphasis; exceptional entries are intimidating and different. As the comment explains, you can choose to store values or pointers in the xarray and they are both first-class citizens. Signed-off-by: Matthew Wilcox <willy@infradead.org> Reviewed-by: Josef Bacik <jbacik@fb.com>
2018-09-12KVM: PPC: Avoid marking DMA-mapped pages dirty in real modeAlexey Kardashevskiy
At the moment the real mode handler of H_PUT_TCE calls iommu_tce_xchg_rm() which in turn reads the old TCE and if it was a valid entry, marks the physical page dirty if it was mapped for writing. Since it is in real mode, realmode_pfn_to_page() is used instead of pfn_to_page() to get the page struct. However SetPageDirty() itself reads the compound page head and returns a virtual address for the head page struct and setting dirty bit for that kills the system. This adds additional dirty bit tracking into the MM/IOMMU API for use in the real mode. Note that this does not change how VFIO and KVM (in virtual mode) set this bit. The KVM (real mode) changes include: - use the lowest bit of the cached host phys address to carry the dirty bit; - mark pages dirty when they are unpinned which happens when the preregistered memory is released which always happens in virtual mode; - add mm_iommu_ua_mark_dirty_rm() helper to set delayed dirty bit; - change iommu_tce_xchg_rm() to take the kvm struct for the mm to use in the new mm_iommu_ua_mark_dirty_rm() helper; - move iommu_tce_xchg_rm() to book3s_64_vio_hv.c (which is the only caller anyway) to reduce the real mode KVM and IOMMU knowledge across different subsystems. This removes realmode_pfn_to_page() as it is not used anymore. While we at it, remove some EXPORT_SYMBOL_GPL() as that code is for the real mode only and modules cannot call it anyway. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-08-23powerpc/mm/books3s: Add new pte bit to mark pte temporarily invalid.Aneesh Kumar K.V
When splitting a huge pmd pte, we need to mark the pmd entry invalid. We can do that by clearing _PAGE_PRESENT bit. But then that will be taken as a swap pte. In order to differentiate between the two use a software pte bit when invalidating. For regular pte, due to bd5050e38aec ("powerpc/mm/radix: Change pte relax sequence to handle nest MMU hang") we need to mark the pte entry invalid when relaxing access permission. Instead of marking pte_none which can result in different page table walk routines possibly skipping this pte entry, invalidate it but still keep it marked present. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-19Merge branch 'topic/ppc-kvm' into nextMichael Ellerman
Merge in some commits we're sharing with the KVM tree. I manually propagated the change from commit d3d4ffaae439 ("powerpc/powernv/ioda2: Reduce upper limit for DMA window size") into pci-ioda-tce.c. Conflicts: arch/powerpc/include/asm/cputable.h arch/powerpc/platforms/powernv/pci-ioda.c arch/powerpc/platforms/powernv/pci.h
2018-07-16powerpc/64s: Remove POWER9 DD1 supportNicholas Piggin
POWER9 DD1 was never a product. It is no longer supported by upstream firmware, and it is not effectively supported in Linux due to lack of testing. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Michael Ellerman <mpe@ellerman.id.au> [mpe: Remove arch_make_huge_pte() entirely] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-20powerpc/mm/hash/4k: Free hugetlb page table caches correctly.Aneesh Kumar K.V
With 4k page size for hugetlb we allocate hugepage directories from its on slab cache. With patch 0c4d26802 ("powerpc/book3s64/mm: Simplify the rcu callback for page table free") we missed to free these allocated hugepd tables. Update pgtable_free to handle hugetlb hugepd directory table. Fixes: 0c4d268029bf ("powerpc/book3s64/mm: Simplify the rcu callback for page table free") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> [mpe: Add CONFIG_HUGETLB_PAGE guard to fix build break] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-07mm: introduce ARCH_HAS_PTE_SPECIALLaurent Dufour
Currently the PTE special supports is turned on in per architecture header files. Most of the time, it is defined in arch/*/include/asm/pgtable.h depending or not on some other per architecture static definition. This patch introduce a new configuration variable to manage this directly in the Kconfig files. It would later replace __HAVE_ARCH_PTE_SPECIAL. Here notes for some architecture where the definition of __HAVE_ARCH_PTE_SPECIAL is not obvious: arm __HAVE_ARCH_PTE_SPECIAL which is currently defined in arch/arm/include/asm/pgtable-3level.h which is included by arch/arm/include/asm/pgtable.h when CONFIG_ARM_LPAE is set. So select ARCH_HAS_PTE_SPECIAL if ARM_LPAE. powerpc __HAVE_ARCH_PTE_SPECIAL is defined in 2 files: - arch/powerpc/include/asm/book3s/64/pgtable.h - arch/powerpc/include/asm/pte-common.h The first one is included if (PPC_BOOK3S & PPC64) while the second is included in all the other cases. So select ARCH_HAS_PTE_SPECIAL all the time. sparc: __HAVE_ARCH_PTE_SPECIAL is defined if defined(__sparc__) && defined(__arch64__) which are defined through the compiler in sparc/Makefile if !SPARC32 which I assume to be if SPARC64. So select ARCH_HAS_PTE_SPECIAL if SPARC64 There is no functional change introduced by this patch. Link: http://lkml.kernel.org/r/1523433816-14460-2-git-send-email-ldufour@linux.vnet.ibm.com Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Suggested-by: Jerome Glisse <jglisse@redhat.com> Reviewed-by: Jerome Glisse <jglisse@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Cc: David S. Miller <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Palmer Dabbelt <palmer@sifive.com> Cc: Albert Ou <albert@sifive.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: David Rientjes <rientjes@google.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Christophe LEROY <christophe.leroy@c-s.fr> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-03powerpc/mm: Change function prototypeAneesh Kumar K.V
In later patch, we use the vma and psize to do tlb flush. Do the prototype update in separate patch to make the review easy. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-15powerpc/mm: Use page fragments for allocation page table at PMD levelAneesh Kumar K.V
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-15powerpc/mm: Implement helpers for pagetable fragment support at PMD levelAneesh Kumar K.V
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-15powerpc/book3s64/mm: Simplify the rcu callback for page table freeAneesh Kumar K.V
Instead of encoding shift in the table address, use an enumerated index value. This allow us to do different things in the callback for pte and pmd. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-31powerpc/mm/keys: Move pte bits to correct headersAneesh Kumar K.V
Memory keys are supported only with hash translation mode. Instead of using #ifdef in generic code move the key related pte bits to respective headers Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-02-13powerpc/mm/hash64: Store the slot information at the right offset for hugetlbAneesh Kumar K.V
The hugetlb pte entries are at the PMD and PUD level, so we can't use PTRS_PER_PTE to find the second half of the page table. Use the right offset for PUD/PMD to get to the second half of the table. Fixes: bf9a95f9a648 ("powerpc: Free up four 64K PTE bits in 64K backed HPTE pages") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Reviewed-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-02-13powerpc/mm: Fix crashes with 16G huge pagesAneesh Kumar K.V
To support memory keys, we moved the hash pte slot information to the second half of the page table. This was ok with PTE entries at level 4 (PTE page) and level 3 (PMD). We already allocate larger page table pages at those levels to accomodate extra details. For level 4 we already have the extra space which was used to track 4k hash page table entry details and at level 3 the extra space was allocated to track the THP details. With hugetlbfs PTE, we used this extra space at the PMD level to store the slot details. But we also support hugetlbfs PTE at PUD level for 16GB pages and PUD level page didn't allocate extra space. This resulted in memory corruption. Fix this by allocating extra space at PUD level when HUGETLB is enabled. Fixes: bf9a95f9a648 ("powerpc: Free up four 64K PTE bits in 64K backed HPTE pages") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Reviewed-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-02-02Merge tag 'powerpc-4.16-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: "Highlights: - Enable support for memory protection keys aka "pkeys" on Power7/8/9 when using the hash table MMU. - Extend our interrupt soft masking to support masking PMU interrupts as well as "normal" interrupts, and then use that to implement local_t for a ~4x speedup vs the current atomics-based implementation. - A new driver "ocxl" for "Open Coherent Accelerator Processor Interface (OpenCAPI)" devices. - Support for new device tree properties on PowerVM to describe hotpluggable memory and devices. - Add support for CLOCK_{REALTIME/MONOTONIC}_COARSE to the 64-bit VDSO. - Freescale updates from Scott: fixes for CPM GPIO and an FSL PCI erratum workaround, plus a minor cleanup patch. As well as quite a lot of other changes all over the place, and small fixes and cleanups as always. Thanks to: Alan Modra, Alastair D'Silva, Alexey Kardashevskiy, Alistair Popple, Andreas Schwab, Andrew Donnellan, Aneesh Kumar K.V, Anju T Sudhakar, Anshuman Khandual, Anton Blanchard, Arnd Bergmann, Balbir Singh, Benjamin Herrenschmidt, Bhaktipriya Shridhar, Bryant G. Ly, Cédric Le Goater, Christophe Leroy, Christophe Lombard, Cyril Bur, David Gibson, Desnes A. Nunes do Rosario, Dmitry Torokhov, Frederic Barrat, Geert Uytterhoeven, Guilherme G. Piccoli, Gustavo A. R. Silva, Gustavo Romero, Ivan Mikhaylov, Joakim Tjernlund, Joe Perches, Josh Poimboeuf, Juan J. Alvarez, Julia Cartwright, Kamalesh Babulal, Madhavan Srinivasan, Mahesh Salgaonkar, Mathieu Malaterre, Michael Bringmann, Michael Hanselmann, Michael Neuling, Nathan Fontenot, Naveen N. Rao, Nicholas Piggin, Paul Mackerras, Philippe Bergheaud, Ram Pai, Russell Currey, Santosh Sivaraj, Scott Wood, Seth Forshee, Simon Guo, Stewart Smith, Sukadev Bhattiprolu, Thiago Jung Bauermann, Vaibhav Jain, Vasyl Gomonovych" * tag 'powerpc-4.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (199 commits) powerpc/mm/radix: Fix build error when RADIX_MMU=n macintosh/ams-input: Use true and false for boolean values macintosh: change some data types from int to bool powerpc/watchdog: Print the NIP in soft_nmi_interrupt() powerpc/watchdog: regs can't be null in soft_nmi_interrupt() powerpc/watchdog: Tweak watchdog printks powerpc/cell: Remove axonram driver rtc-opal: Fix handling of firmware error codes, prevent busy loops powerpc/mpc52xx_gpt: make use of raw_spinlock variants macintosh/adb: Properly mark continued kernel messages powerpc/pseries: Fix cpu hotplug crash with memoryless nodes powerpc/numa: Ensure nodes initialized for hotplug powerpc/numa: Use ibm,max-associativity-domains to discover possible nodes powerpc/kernel: Block interrupts when updating TIDR powerpc/powernv/idoa: Remove unnecessary pcidev from pci_dn powerpc/mm/nohash: do not flush the entire mm when range is a single page powerpc/pseries: Add Initialization of VF Bars powerpc/pseries/pci: Associate PEs to VFs in configure SR-IOV powerpc/eeh: Add EEH notify resume sysfs powerpc/eeh: Add EEH operations to notify resume ...
2018-01-31mm/thp: remove pmd_huge_split_prepare()Aneesh Kumar K.V
Instead of marking the pmd ready for split, invalidate the pmd. This should take care of powerpc requirement. Only side effect is that we mark the pmd invalid early. This can result in us blocking access to the page a bit longer if we race against a thp split. [kirill.shutemov@linux.intel.com: rebased, dirty THP once] Link: http://lkml.kernel.org/r/20171213105756.69879-13-kirill.shutemov@linux.intel.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Daney <david.daney@cavium.com> Cc: David Miller <davem@davemloft.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Nitin Gupta <nitin.m.gupta@oracle.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>