summaryrefslogtreecommitdiff
path: root/arch/s390/mm/vmem.c
AgeCommit message (Collapse)Author
2018-01-08mm: pass the vmem_altmap to vmemmap_freeChristoph Hellwig
We can just pass this on instead of having to do a radix tree lookup without proper locking a few levels into the callchain. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-01-08mm: pass the vmem_altmap to vmemmap_populateChristoph Hellwig
We can just pass this on instead of having to do a radix tree lookup without proper locking a few levels into the callchain. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-11-13Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Heiko Carstens: "Since Martin is on vacation you get the s390 pull request for the v4.15 merge window this time from me. Besides a lot of cleanups and bug fixes these are the most important changes: - a new regset for runtime instrumentation registers - hardware accelerated AES-GCM support for the aes_s390 module - support for the new CEX6S crypto cards - support for FORTIFY_SOURCE - addition of missing z13 and new z14 instructions to the in-kernel disassembler - generate opcode tables for the in-kernel disassembler out of a simple text file instead of having to manually maintain those tables - fast memset16, memset32 and memset64 implementations - removal of named saved segment support - hardware counter support for z14 - queued spinlocks and queued rwlocks implementations for s390 - use the stack_depth tracking feature for s390 BPF JIT - a new s390_sthyi system call which emulates the sthyi (store hypervisor information) instruction - removal of the old KVM virtio transport - an s390 specific CPU alternatives implementation which is used in the new spinlock code" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (88 commits) MAINTAINERS: add virtio-ccw.h to virtio/s390 section s390/noexec: execute kexec datamover without DAT s390: fix transactional execution control register handling s390/bpf: take advantage of stack_depth tracking s390: simplify transactional execution elf hwcap handling s390/zcrypt: Rework struct ap_qact_ap_info. s390/virtio: remove unused header file kvm_virtio.h s390: avoid undefined behaviour s390/disassembler: generate opcode tables from text file s390/disassembler: remove insn_to_mnemonic() s390/dasd: avoid calling do_gettimeofday() s390: vfio-ccw: Do not attempt to free no-op, test and tic cda. s390: remove named saved segment support s390/archrandom: Reconsider s390 arch random implementation s390/pci: do not require AIS facility s390/qdio: sanitize put_indicator s390/qdio: use atomic_cmpxchg s390/nmi: avoid using long-displacement facility s390: pass endianness info to sparse s390/decompressor: remove informational messages ...
2017-11-08s390: avoid undefined behaviourHeiko Carstens
At a couple of places smatch emits warnings like this: arch/s390/mm/vmem.c:409 vmem_map_init() warn: right shifting more than type allows In fact shifting a signed type right is undefined. Avoid this and add an unsigned long cast. The shifted values are always positive. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2017-11-08s390: remove named saved segment supportHeiko Carstens
Remove the support to create a z/VM named saved segment (NSS). This feature is not supported since quite a while in favour of jump labels, function tracing and (now) CPU alternatives. All of these features require to write to the kernel text section which is not possible if the kernel is contained within an NSS. Given that memory savings are minimal if kernel images are shared and in addition updates of shared images are painful, the NSS feature can be removed. Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2017-11-02License cleanup: add SPDX GPL-2.0 license identifier to files with no licenseGreg Kroah-Hartman
Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-09s390/mm: use memset64 instead of clear_tableHeiko Carstens
Use memset64 instead of the (now) open-coded variant clear_table. Performance wise there is no difference. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-07-26s390/mm,vmem: simplify region and segment table allocation codeHeiko Carstens
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.vnet.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-06-12s390/mm: implement 5 level pages tablesMartin Schwidefsky
Add the logic to upgrade the page table for a 64-bit process to five levels. This increases the TASK_SIZE from 8PB to 16EB-4K. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-05-08s390: use set_memory.h headerLaura Abbott
set_memory_* functions have moved to set_memory.h. Switch to this explicitly Link: http://lkml.kernel.org/r/1488920133-27229-5-git-send-email-labbott@redhat.com Signed-off-by: Laura Abbott <labbott@redhat.com> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-17s390: mm: Audit and remove any unnecessary uses of module.hPaul Gortmaker
Historically a lot of these existed because we did not have a distinction between what was modular code and what was providing support to modules via EXPORT_SYMBOL and friends. That changed when we forked out support for the latter into the export.h file. This means we should be able to reduce the usage of module.h in code that is obj-y Makefile or bool Kconfig. The advantage in doing so is that module.h itself sources about 15 other headers; adding significantly to what we feed cpp, and it can obscure what headers we are effectively using. Since module.h was the source for init.h (for __init) and for export.h (for EXPORT_SYMBOL) we consider each change instance for the presence of either and replace as needed. An instance where module_param was used without moduleparam.h was also fixed, as well as an implict use of asm/elf.h header. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-02-08s390: add no-execute supportMartin Schwidefsky
Bit 0x100 of a page table, segment table of region table entry can be used to disallow code execution for the virtual addresses associated with the entry. There is one tricky bit, the system call to return from a signal is part of the signal frame written to the user stack. With a non-executable stack this would stop working. To avoid breaking things the protection fault handler checks the opcode that caused the fault for 0x0a77 (sys_sigreturn) and 0x0aad (sys_rt_sigreturn) and injects a system call. This is preferable to the alternative solution with a stub function in the vdso because it works for vdso=off and statically linked binaries as well. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-11-29s390: convert remaining bootmem allocations to memblockHeiko Carstens
Get rid of all remaining alloc_bootmem calls and use memblock_alloc instead everywhere. This way we get rid of the inconsistent mixture of alloc_bootmem and memblock_alloc usages. Two of the alloc_bootmem_low calls within arch/s390/kernel/setup.c are replaced with memblock_alloc calls that don't enforce that the allocated memory is below 2GB. This restriction was never necessary. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13s390: add proper __ro_after_init supportHeiko Carstens
On s390 __ro_after_init is currently mapped to __read_mostly which means that data marked as __ro_after_init will not be protected. Reason for this is that the common code __ro_after_init implementation is x86 centric: the ro_after_init data section was added to rodata, since x86 enables write protection to kernel text and rodata very late. On s390 we have write protection for these sections enabled with the initial page tables. So adding the ro_after_init data section to rodata does not work on s390. In order to make __ro_after_init work properly on s390 move the ro_after_init data, right behind rodata. Unlike the rodata section it will be marked read-only later after all init calls happened. This s390 specific implementation adds new __start_ro_after_init and __end_ro_after_init labels. Everything in between will be marked read-only after the init calls happened. In addition to the __ro_after_init data move also the exception table there, since from a practical point of view it fits the __ro_after_init requirements. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13s390/pgtable: add mapping statisticsHeiko Carstens
Add statistics that show how memory is mapped within the kernel identity mapping. This is more or less the same like git commit ce0c0e50f94e ("x86, generic: CPA add statistics about state of direct mapping v4") for x86. I also intentionally copied the lower case "k" within DirectMap4k vs the upper case "M" and "G" within the two other lines. Let's have consistent inconsistencies across architectures. The output of /proc/meminfo now contains these additional lines: DirectMap4k: 2048 kB DirectMap1M: 3991552 kB DirectMap2G: 4194304 kB The implementation on s390 is lockless unlike the x86 version, since I assume changes to the kernel mapping are a very rare event. Therefore it really doesn't matter if these statistics could potentially be inconsistent if read while kernel pages tables are being changed. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13s390/vmem: simplify vmem code for read-only mappingsHeiko Carstens
For the kernel identity mapping map everything read-writeable and subsequently call set_memory_ro() to make the ro section read-only. This simplifies the code a lot. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13s390/pageattr: allow kernel page table splittingHeiko Carstens
set_memory_ro() and set_memory_rw() currently only work on 4k mappings, which is good enough for module code aka the vmalloc area. However we stumbled already twice into the need to make this also work on larger mappings: - the ro after init patch set - the crash kernel resize code Therefore this patch implements automatic kernel page table splitting if e.g. set_memory_ro() would be called on parts of a 2G mapping. This works quite the same as the x86 code, but is much simpler. In order to make this work and to be architecturally compliant we now always use the csp, cspg or crdte instructions to replace valid page table entries. This means that set_memory_ro() and set_memory_rw() will be much more expensive than before. In order to avoid huge latencies the code contains a couple of cond_resched() calls. The current code only splits page tables, but does not merge them if it would be possible. The reason for this is that currently there is no real life scenarion where this would really happen. All current use cases that I know of only change access rights once during the life time. If that should change we can still implement kernel page table merging at a later time. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13s390/vmem: make use of pte_clear()Heiko Carstens
Use pte_clear() instead of open-coding it. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13s390/vmem: introduce and use SEGMENT_KERNEL and REGION3_KERNELHeiko Carstens
Instead of open-coded SEGMENT_KERNEL and REGION3_KERNEL assignments use defines. Also to make e.g. pmd_wrprotect() work on the kernel mapping a couple more flags must be set. Therefore add the missing flags also. In order to make everything symmetrical this patch also adds software dirty, young, read and write bits for region 3 table entries. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13s390/vmem: align segment and region tables to 16kHeiko Carstens
Usually segment and region tables are 16k aligned due to the way the buddy allocator works. This is not true for the vmem code which only asks for a 4k alignment. In order to be consistent enforce a 16k alignment here as well. This alignment will be assumed and therefore is required by the pageattr code. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-05-11s390/vmem: remove unused function parameterHeiko Carstens
vmem_pte_alloc() has an unused function parameter. Let's remove it. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-05-11s390/vmem: fix identity mappingHeiko Carstens
The identity mapping is suboptimal for the last 2GB frame. The mapping will be established with a mix of 4KB and 1MB mappings instead of a single 2GB mapping. This happens because of a off-by-one bug introduced with commit 50be63450728 ("s390/mm: Convert bootmem to memblock"). Currently the identity mapping looks like this: 0x0000000080000000-0x0000000180000000 4G PUD RW 0x0000000180000000-0x00000001fff00000 2047M PMD RW 0x00000001fff00000-0x0000000200000000 1M PTE RW With the bug fixed it looks like this: 0x0000000080000000-0x0000000200000000 6G PUD RW Fixes: 50be63450728 ("s390/mm: Convert bootmem to memblock") Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-03-15s390: query dynamic DEBUG_PAGEALLOC settingChristian Borntraeger
We can use debug_pagealloc_enabled() to check if we can map the identity mapping with 1MB/2GB pages as well as to print the current setting in dump_stack. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Acked-by: David Rientjes <rientjes@google.com> Cc: Laura Abbott <labbott@fedoraproject.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-03-25s390: remove 31 bit supportHeiko Carstens
Remove the 31 bit support in order to reduce maintenance cost and effectively remove dead code. Since a couple of years there is no distribution left that comes with a 31 bit kernel. The 31 bit kernel also has been broken since more than a year before anybody noticed. In addition I added a removal warning to the kernel shown at ipl for 5 minutes: a960062e5826 ("s390: add 31 bit warning message") which let everybody know about the plan to remove 31 bit code. We didn't get any response. Given that the last 31 bit only machine was introduced in 1999 let's remove the code. Anybody with 31 bit user space code can still use the compat mode. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-10-14Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Martin Schwidefsky: "This patch set contains the main portion of the changes for 3.18 in regard to the s390 architecture. It is a bit bigger than usual, mainly because of a new driver and the vector extension patches. The interesting bits are: - Quite a bit of work on the tracing front. Uprobes is enabled and the ftrace code is reworked to get some of the lost performance back if CONFIG_FTRACE is enabled. - To improve boot time with CONFIG_DEBIG_PAGEALLOC, support for the IPTE range facility is added. - The rwlock code is re-factored to improve writer fairness and to be able to use the interlocked-access instructions. - The kernel part for the support of the vector extension is added. - The device driver to access the CD/DVD on the HMC is added, this will hopefully come in handy to improve the installation process. - Add support for control-unit initiated reconfiguration. - The crypto device driver is enhanced to enable the additional AP domains and to allow the new crypto hardware to be used. - Bug fixes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (39 commits) s390/ftrace: simplify enabling/disabling of ftrace_graph_caller s390/ftrace: remove 31 bit ftrace support s390/kdump: add support for vector extension s390/disassembler: add vector instructions s390: add support for vector extension s390/zcrypt: Toleration of new crypto hardware s390/idle: consolidate idle functions and definitions s390/nohz: use a per-cpu flag for arch_needs_cpu s390/vtime: do not reset idle data on CPU hotplug s390/dasd: add support for control unit initiated reconfiguration s390/dasd: fix infinite loop during format s390/mm: make use of ipte range facility s390/setup: correct 4-level kernel page table detection s390/topology: call set_sched_topology early s390/uprobes: architecture backend for uprobes s390/uprobes: common library for kprobes and uprobes s390/rwlock: use the interlocked-access facility 1 instructions s390/rwlock: improve writer fairness s390/rwlock: remove interrupt-enabling rwlock variant. s390/mm: remove change bit override support ...
2014-09-25s390/mm: remove change bit override supportHeiko Carstens
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-09-25s390/vmemmap: remove memset call from vmemmap_populate()Heiko Carstens
If the vmemmap array gets filled with large pages we allocate those pages with vmemmap_alloc_block(), which returns cleared pages. Only for single 4k pages we call our own vmem_alloc_pages() which does not return cleared pages. However we can also call vmemmap_alloc_block() to allocate the 4k pages. This way we can also make sure the vmemmap array is cleared after its population. Therefore we can remove the memset at the end of the function which would clear the vmmemmap array a second time on machines which do support EDAT1. On very large configurations this can save us several seconds. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-08-26KVM: s390/mm: use radix trees for guest to host mappingsMartin Schwidefsky
Store the target address for the gmap segments in a radix tree instead of using invalid segment table entries. gmap_translate becomes a simple radix_tree_lookup, gmap_fault is split into the address translation with gmap_translate and the part that does the linking of the gmap shadow page table with the process page table. A second radix tree is used to keep the pointers to the segment table entries for segments that are mapped in the guest address space. On unmap of a segment the pointer is retrieved from the radix tree and is used to carry out the segment invalidation in the gmap shadow page table. As the radix tree can only store one pointer, each host segment may only be mapped to exactly one guest location. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-05-20s390/mm: Convert bootmem to memblockPhilipp Hachtmann
The original bootmem allocator is getting replaced by memblock. To cover the needs of the s390 kdump implementation the physical memory list is used. With this patch the bootmem allocator and its bitmaps are completely removed from s390. Signed-off-by: Philipp Hachtmann <phacht@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-04-03s390/mm,tlb: optimize TLB flushing for zEC12Martin Schwidefsky
The zEC12 machines introduced the local-clearing control for the IDTE and IPTE instruction. If the control is set only the TLB of the local CPU is cleared of entries, either all entries of a single address space for IDTE, or the entry for a single page-table entry for IPTE. Without the local-clearing control the TLB flush is broadcasted to all CPUs in the configuration, which is expensive. The reset of the bit mask of the CPUs that need flushing after a non-local IDTE is tricky. As TLB entries for an address space remain in the TLB even if the address space is detached a new bit field is required to keep track of attached CPUs vs. CPUs in the need of a flush. After a non-local flush with IDTE the bit-field of attached CPUs is copied to the bit-field of CPUs in need of a flush. The ordering of operations on cpu_attach_mask, attach_count and mm_cpumask(mm) is such that an underindication in mm_cpumask(mm) is prevented but an overindication in mm_cpumask(mm) is possible. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-08-29s390/mm: implement software referenced bitsMartin Schwidefsky
The last remaining use for the storage key of the s390 architecture is reference counting. The alternative is to make page table entries invalid while they are old. On access the fault handler marks the pte/pmd as young which makes the pte/pmd valid if the access rights allow read access. The pte/pmd invalidations required for software managed reference bits cost a bit of performance, on the other hand the RRBE/RRBM instructions to read and reset the referenced bits are quite expensive as well. Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-08-22s390/mm: cleanup page table definitionsMartin Schwidefsky
Improve the encoding of the different pte types and the naming of the page, segment table and region table bits. Due to the different pte encoding the hugetlbfs primitives need to be adapted as well. To improve compatability with common code make the huge ptes use the encoding of normal ptes. The conversion between the pte and pmd encoding for a huge pte is done with set_huge_pte_at and huge_ptep_get. Overall the code is now easier to understand. Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-05-03s390/mem_detect: remove artificial kdump memory typesHeiko Carstens
Simplify the memory detection code a bit by removing the CHUNK_OLDMEM and CHUNK_CRASHK memory types. They are not needed. Everything that is needed is a mechanism to insert holes into the detected memory. Reviewed-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-04-29sparse-vmemmap: specify vmemmap population range in bytesJohannes Weiner
The sparse code, when asking the architecture to populate the vmemmap, specifies the section range as a starting page and a number of pages. This is an awkward interface, because none of the arch-specific code actually thinks of the range in terms of 'struct page' units and always translates it to bytes first. In addition, later patches mix huge page and regular page backing for the vmemmap. For this, they need to call vmemmap_populate_basepages() on sub-section ranges with PAGE_SIZE and PMD_SIZE in mind. But these are not necessarily multiples of the 'struct page' size and so this unit is too coarse. Just translate the section range into bytes once in the generic sparse code, then pass byte ranges down the stack. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Bernhard Schmidt <Bernhard.Schmidt@lrz.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: David S. Miller <davem@davemloft.net> Tested-by: David S. Miller <davem@davemloft.net> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-28s390/mm: ignore change bit for vmemmapHeiko Carstens
Add hint to the page tables that we don't care about the change bit in storage keys that belong to vmemmap pages. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-02-23memory-hotplug: remove memmap of sparse-vmemmapTang Chen
Introduce a new API vmemmap_free() to free and remove vmemmap pagetables. Since pagetable implements are different, each architecture has to provide its own version of vmemmap_free(), just like vmemmap_populate(). Note: vmemmap_free() is not implemented for ia64, ppc, s390, and sparc. [mhocko@suse.cz: fix implicit declaration of remove_pagetable] Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: Jianguo Wu <wujianguo@huawei.com> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Wu Jianguo <wujianguo@huawei.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-14s390/mm: implement software dirty bitsMartin Schwidefsky
The s390 architecture is unique in respect to dirty page detection, it uses the change bit in the per-page storage key to track page modifications. All other architectures track dirty bits by means of page table entries. This property of s390 has caused numerous problems in the past, e.g. see git commit ef5d437f71afdf4a "mm: fix XFS oops due to dirty pages without buffers on s390". To avoid future issues in regard to per-page dirty bits convert s390 to a fault based software dirty bit detection mechanism. All user page table entries which are marked as clean will be hardware read-only, even if the pte is supposed to be writable. A write by the user process will trigger a protection fault which will cause the user pte to be marked as dirty and the hardware read-only bit is removed. With this change the dirty bit in the storage key is irrelevant for Linux as a host, but the storage key is still required for KVM guests. The effect is that page_test_and_clear_dirty and the related code can be removed. The referenced bit in the storage key is still used by the page_test_and_clear_young primitive to provide page age information. For page cache pages of mappings with mapping_cap_account_dirty there will not be any change in behavior as the dirty bit tracking already uses read-only ptes to control the amount of dirty pages. Only for swap cache pages and pages of mappings without mapping_cap_account_dirty there can be additional protection faults. To avoid an excessive number of additional faults the mk_pte primitive checks for PageDirty if the pgprot value allows for writes and pre-dirties the pte. That avoids all additional faults for tmpfs and shmem pages until these pages are added to the swap cache. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-11-23s390/mm,vmemmap: use 1MB frames for vmemmapHeiko Carstens
Use 1MB frames for vmemmap if EDAT1 is available in order to reduce TLB pressure Always use a 1MB frame even if its only partially needed for struct pages. Otherwise we would end up with a mix of large frame and page mappings, because vmemmap_populate gets called for each section (256MB -> 3.5MB memmap) separately. Worst case is that we would waste 512KB. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-11-23s390/mm,vmem: use 2GB frames for identity mappingHeiko Carstens
Use 2GB frames for indentity mapping if EDAT2 is available to reduce TLB pressure. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-10-09s390/mm,vmem: fix vmem_add_mem()/vmem_remove_range()Heiko Carstens
vmem_add_mem() should only then insert a large page if pmd_none() is true for the specific entry. We might have a leftover from a previous mapping. In addition make vmem_remove_range()'s page table walk code more complete and fix a couple of potential endless loops (which can never happen :). Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-10-09s390/mm: fix mapping of read-only kernel text sectionHeiko Carstens
Within the identity mapping the kernel text section is mapped read-only. However when mapping the first and last page of the text section we must round upwards and downwards respectively, if only parts of a page belong to the section. Otherwise potential rw data can be mapped read-only. So the rounding must be done just the other way we have it right now. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-10-09s390/mm: fix pmd_huge() usage for kernel mappingHeiko Carstens
pmd_huge() will always return 0 on !HUGETLBFS, however we use that helper function when walking the kernel page tables to decide if we have a 1MB page frame or not. Since we create 1MB frames for the kernel 1:1 mapping independently of HUGETLBFS this can lead to incorrect storage accesses since the code can assume that we have a pointer to a page table instead of a pointer to a 1MB frame. Fix this by adding a pmd_large() primitive like other architectures have it already and remove all references to HUGETLBFS/HUGETLBPAGE from the code that walks kernel page tables. Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-09-26s390: enable large page support with CONFIG_DEBUG_PAGEALLOCGerald Schaefer
So far, large page support was completely disabled with CONFIG_DEBUG_PAGEALLOC, although it would be sufficient if only the large page kernel mapping was disabled. This patch enables large page support with CONFIG_DEBUG_PAGEALLOC, while it prevents the large kernel mapping in that case. Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-07-20s390/comments: unify copyright messages and remove file namesHeiko Carstens
Remove the file name from the comment at top of many files. In most cases the file name was wrong anyway, so it's rather pointless. Also unify the IBM copyright statement. We did have a lot of sightly different statements and wanted to change them one after another whenever a file gets touched. However that never happened. Instead people start to take the old/"wrong" statements to use as a template for new files. So unify all of them in one go. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2012-05-24s390/headers: replace __s390x__ with CONFIG_64BIT where possibleHeiko Carstens
Replace __s390x__ with CONFIG_64BIT in all places that are not exported to userspace or guarded with #ifdef __KERNEL__. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-10-30[S390] kdump backend codeMichael Holzheu
This patch provides the architecture specific part of the s390 kdump support. Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-07-24[S390] kvm guest address space mappingMartin Schwidefsky
Add code that allows KVM to control the virtual memory layout that is seen by a guest. The guest address space uses a second page table that shares the last level pte-tables with the process page table. If a page is unmapped from the process page table it is automatically unmapped from the guest page table as well. The guest address space mapping starts out empty, KVM can map any individual 1MB segments from the process virtual memory to any 1MB aligned location in the guest virtual memory. If a target segment in the process virtual memory does not exist or is unmapped while a guest mapping exists the desired target address is stored as an invalid segment table entry in the guest page table. The population of the guest page table is fault driven. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-05-23[S390] refactor page table functions for better pgste supportMartin Schwidefsky
Rework the architecture page table functions to access the bits in the page table extension array (pgste). There are a number of changes: 1) Fix missing pgste update if the attach_count for the mm is <= 1. 2) For every operation that affects the invalid bit in the pte or the rcp byte in the pgste the pcl lock needs to be acquired. The function pgste_get_lock gets the pcl lock and returns the current pgste value for a pte pointer. The function pgste_set_unlock stores the pgste and releases the lock. Between these two calls the bits in the pgste can be shuffled. 3) Define two software bits in the pte _PAGE_SWR and _PAGE_SWC to avoid calling SetPageDirty and SetPageReferenced from pgtable.h. If the host reference backup bit or the host change backup bit has been set the dirty/referenced state is transfered to the pte. The common code will pick up the state from the pte. 4) Add ptep_modify_prot_start and ptep_modify_prot_commit for mprotect. 5) Remove pgd_populate_kernel, pud_populate_kernel, pmd_populate_kernel pgd_clear_kernel, pud_clear_kernel, pmd_clear_kernel and ptep_invalidate. 6) Rename kvm_s390_test_and_clear_page_dirty to ptep_test_and_clear_user_dirty and add ptep_test_and_clear_user_young. 7) Define mm_exclusive() and mm_has_pgste() helper to improve readability. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-08-09mm: provide init_mm mm_context initializerHeiko Carstens
Provide an INIT_MM_CONTEXT intializer macro which can be used to statically initialize mm_struct:mm_context of init_mm. This way we can get rid of code which will do the initialization at run time (on s390). In addition the current code can be found at a place where it is not expected. So let's have a common initializer which architectures can use if needed. This is based on a patch from Suzuki Poulose. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Suzuki Poulose <suzuki@in.ibm.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-04-09[S390] s390: disable change bit overrideChristian Borntraeger
commit 6a985c6194017de2c062916ad1cd00dee0302c40 ([S390] s390: use change recording override for kernel mapping) deactivated the change bit recording for the kernel mapping to improve the performance. This works most of the time, but there are cases (e.g. kernel runs in home space, futex atomic compare xcmg) where we modify user memory with the kernel mapping instead of the user mapping. Instead of fixing these cases, this patch just deactivates change bit override to avoid future problems with other kernel code that might use the kernel mapping for user memory. CC: stable@kernel.org Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>