summaryrefslogtreecommitdiff
path: root/arch/s390/mm/fault.c
AgeCommit message (Collapse)Author
2024-11-29Merge tag 's390-6.13-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull more s390 updates from Heiko Carstens: - Add swap entry for hugetlbfs support - Add PTE_MARKER support for hugetlbs mappings; this fixes a regression (possible page fault loop) which was introduced when support for UFFDIO_POISON for hugetlbfs was added - Add ARCH_HAS_PREEMPT_LAZY and PREEMPT_DYNAMIC support - Mark IRQ entries in entry code, so that stack tracers can filter out the non-IRQ parts of stack traces. This fixes stack depot capacity limit warnings, since without filtering the number of unique stack traces is huge - In PCI code fix leak of struct zpci_dev object, and fix potential double remove of hotplug slot - Fix pagefault_disable() / pagefault_enable() unbalance in arch_stack_user_walk_common() - A couple of inline assembly optimizations, more cmpxchg() to try_cmpxchg() conversions, and removal of usages of xchg() and cmpxchg() on one and two byte memory areas - Various other small improvements and cleanups * tag 's390-6.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (27 commits) Revert "s390/mm: Allow large pages for KASAN shadow mapping" s390/spinlock: Use flag output constraint for arch_cmpxchg_niai8() s390/spinlock: Use R constraint for arch_load_niai4() s390/spinlock: Generate shorter code for arch_spin_unlock() s390/spinlock: Remove condition code clobber from arch_spin_unlock() s390/spinlock: Use symbolic names in inline assemblies s390: Support PREEMPT_DYNAMIC s390/pci: Fix potential double remove of hotplug slot s390/pci: Fix leak of struct zpci_dev when zpci_add_device() fails s390/mm/hugetlbfs: Add missing includes s390/mm: Add PTE_MARKER support for hugetlbfs mappings s390/mm: Introduce region-third and segment table swap entries s390/mm: Introduce region-third and segment table entry present bits s390/mm: Rearrange region-third and segment table entry SW bits KVM: s390: Increase size of union sca_utility to four bytes KVM: s390: Remove one byte cmpxchg() usage KVM: s390: Use try_cmpxchg() instead of cmpxchg() loops s390/ap: Replace xchg() with WRITE_ONCE() s390/mm: Allow large pages for KASAN shadow mapping s390: Add ARCH_HAS_PREEMPT_LAZY support ...
2024-11-27s390/mm: Add PTE_MARKER support for hugetlbfs mappingsGerald Schaefer
Commit 8a13897fb0daa ("mm: userfaultfd: support UFFDIO_POISON for hugetlbfs") added support for PTE_MARKER_POISONED for hugetlbfs, but PTE_MARKER also needs support for swap entries. For s390, swap entries were only supported on PTE level, not on the PMD/PUD levels that are used for large hugetlbfs mappings. Therefore, when writing a PTE_MARKER_POISONED entry, the resulting entry on PMD/PUD level would be an invalid / empty entry. Further access would then generate a pagefault loop, instead of the expected SIGBUS. It is a loop inside the kernel, but interruptible and uffd fault handling also calls schedule() in between, so at least it won't completely block the system. Previous commits prepared support for swap entries on PMD/PUD levels. PTE_MARKER support for hugetlbfs can now be enabled by simply adding an extra is_pte_marker() check to huge_pte_none_mostly(). Fault handling code also needs to be adjusted to expect the VM_FAULT_HWPOISON_LARGE fault flag, which was not possible on s390 before. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-11-19Merge tag 'timers-vdso-2024-11-18' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull vdso data page handling updates from Thomas Gleixner: "First steps of consolidating the VDSO data page handling. The VDSO data page handling is architecture specific for historical reasons, but there is no real technical reason to do so. Aside of that VDSO data has become a dump ground for various mechanisms and fail to provide a clear separation of the functionalities. Clean this up by: - consolidating the VDSO page data by getting rid of architecture specific warts especially in x86 and PowerPC. - removing the last includes of header files which are pulling in other headers outside of the VDSO namespace. - seperating timekeeping and other VDSO data accordingly. Further consolidation of the VDSO page handling is done in subsequent changes scheduled for the next merge window. This also lays the ground for expanding the VDSO time getters for independent PTP clocks in a generic way without making every architecture add support seperately" * tag 'timers-vdso-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (42 commits) x86/vdso: Add missing brackets in switch case vdso: Rename struct arch_vdso_data to arch_vdso_time_data powerpc: Split systemcfg struct definitions out from vdso powerpc: Split systemcfg data out of vdso data page powerpc: Add kconfig option for the systemcfg page powerpc/pseries/lparcfg: Use num_possible_cpus() for potential processors powerpc/pseries/lparcfg: Fix printing of system_active_processors powerpc/procfs: Propagate error of remap_pfn_range() powerpc/vdso: Remove offset comment from 32bit vdso_arch_data x86/vdso: Split virtual clock pages into dedicated mapping x86/vdso: Delete vvar.h x86/vdso: Access vdso data without vvar.h x86/vdso: Move the rng offset to vsyscall.h x86/vdso: Access rng vdso data without vvar.h x86/vdso: Access timens vdso data without vvar.h x86/vdso: Allocate vvar page from C code x86/vdso: Access rng data from kernel without vvar x86/vdso: Place vdso_data at beginning of vvar page x86/vdso: Use __arch_get_vdso_data() to access vdso data x86/mm/mmap: Remove arch_vma_name() ...
2024-10-29s390/mm: Cleanup fault error handlingHeiko Carstens
Combine the two VM_FAULT_ERROR checks in do_exception() and move them to the exit path, similar to x86. Also remove a random blank line. Suggested-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-10-29s390/mm: Convert to LOCK_MM_AND_FIND_VMAHeiko Carstens
With the gmap code gone s390 can be easily converted to LOCK_MM_AND_FIND_VMA like it has been done for most other architectures. Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Link: https://lore.kernel.org/r/20241022120601.167009-12-imbrenda@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-10-29s390/mm: Get rid of fault type switch statementsHeiko Carstens
With GMAP_FAULT fault type gone, there are only KERNEL_FAULT and USER_FAULT fault types left. Therefore there is no need for any fault type switch statements left. Rename get_fault_type() into is_kernel_fault() and let it return a boolean value. Change all switch statements to if statements. This removes quite a bit of code. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Link: https://lore.kernel.org/r/20241022120601.167009-11-imbrenda@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-10-29s390/mm: Simplify get_fault_type()Heiko Carstens
With the gmap code gone get_fault_type() can be simplified: - every fault with user_mode(regs) == true must be a fault in user address space - every fault with user_mode(regs) == false is only a fault in user address space if the used address space is the secondary address space - every other fault is within the kernel address space Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Link: https://lore.kernel.org/r/20241022120601.167009-10-imbrenda@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-10-29s390/mm/fault: Handle guest-related program interrupts in KVMClaudio Imbrenda
Any program interrupt that happens in the host during the execution of a KVM guest will now short circuit the fault handler and return to KVM immediately. Guest fault handling (including pfault) will happen entirely inside KVM. When sie64a() returns zero, current->thread.gmap_int_code will contain the program interrupt number that caused the exit, or zero if the exit was not caused by a host program interrupt. KVM will now take care of handling all guest faults in vcpu_post_run(). Since gmap faults will not be visible by the rest of the kernel, remove GMAP_FAULT, the linux fault handlers for secure execution faults, the exception table entries for the sie instruction, the nop padding after the sie instruction, and all other references to guest faults from the s390 code. Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Co-developed-by: Heiko Carstens <hca@linux.ibm.com> Link: https://lore.kernel.org/r/20241022120601.167009-6-imbrenda@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-10-29s390/entry: Remove __GMAP_ASCE and use _PIF_GUEST_FAULT againClaudio Imbrenda
Now that the guest ASCE is passed as a parameter to __sie64a(), _PIF_GUEST_FAULT can be used again to determine whether the fault was a guest or host fault. Since the guest ASCE will not be taken from the gmap pointer in lowcore anymore, __GMAP_ASCE can be removed. For the same reason the guest ASCE needs now to be saved into the cr1 save area unconditionally. Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Acked-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Link: https://lore.kernel.org/r/20241022120601.167009-2-imbrenda@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-10-16s390: Remove remaining _PAGE_* macrosVincenzo Frascino
The introduction of vdso/page.h made the definition of _PAGE_SHIFT, _PAGE_SIZE, _PAGE_MASK redundant. Refactor the code to remove the macros. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20241014151340.1639555-4-vincenzo.frascino@arm.com Closes: https://lore.kernel.org/oe-kbuild-all/202410112106.mvc2U2p0-lkp@intel.com/
2024-09-01s390/mm/fault: convert do_secure_storage_access() from follow_page() to ↵David Hildenbrand
folio_walk Let's get rid of another follow_page() user and perform the conversion under PTL: Note that this is also what follow_page_pte() ends up doing. Unfortunately we cannot currently optimize out the additional reference, because arch_make_folio_accessible() must be called with a raised refcount to protect against concurrent conversion to secure. We can just move the arch_make_folio_accessible() under the PTL, like follow_page_pte() would. We'll effectively drop the "writable" check implied by FOLL_WRITE: follow_page_pte() would also not check that when calling arch_make_folio_accessible(), so there is no good reason for doing that here. We'll lose the secretmem check from follow_page() as well, about which we shouldn't really care. Link: https://lkml.kernel.org/r/20240802155524.517137-10-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Matthew Wilcox <willy@infradead.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-17s390/mm: Fix VM_FAULT_HWPOISON handling in do_exception()Gerald Schaefer
There is no support for HWPOISON, MEMORY_FAILURE, or ARCH_HAS_COPY_MC on s390. Therefore we do not expect to see VM_FAULT_HWPOISON in do_exception(). However, since commit af19487f00f3 ("mm: make PTE_MARKER_SWAPIN_ERROR more general"), it is possible to see VM_FAULT_HWPOISON in combination with PTE_MARKER_POISONED, even on architectures that do not support HWPOISON otherwise. In this case, we will end up on the BUG() in do_exception(). Fix this by treating VM_FAULT_HWPOISON the same as VM_FAULT_SIGBUS, similar to x86 when MEMORY_FAILURE is not configured. Also print unexpected fault flags, for easier debugging. Note that VM_FAULT_HWPOISON_LARGE is not expected, because s390 cannot support swap entries on other levels than PTE level. Cc: stable@vger.kernel.org # 6.6+ Fixes: af19487f00f3 ("mm: make PTE_MARKER_SWAPIN_ERROR more general") Reported-by: Yunseong Kim <yskelg@gmail.com> Tested-by: Yunseong Kim <yskelg@gmail.com> Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Message-ID: <20240715180416.3632453-1-gerald.schaefer@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-06-18s390: Replace S390_lowcore by get_lowcore()Sven Schnelle
Replace all S390_lowcore usages in arch/s390/ by get_lowcore(). Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-06-05s390/uv: Implement HAVE_ARCH_MAKE_FOLIO_ACCESSIBLEDavid Hildenbrand
Let's also implement HAVE_ARCH_MAKE_FOLIO_ACCESSIBLE, so we can convert arch_make_page_accessible() to be a simple wrapper around arch_make_folio_accessible(). Unfortunately, we cannot do that in the header. There are only two arch_make_page_accessible() calls remaining in gup.c. We can now drop HAVE_ARCH_MAKE_PAGE_ACCESSIBLE completely form core-MM. We'll handle that separately, once the s390x part landed. Suggested-by: Matthew Wilcox <willy@infradead.org> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20240508182955.358628-10-david@redhat.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-04-25s390: mm: accelerate pagefault when badaccessKefeng Wang
The vm_flags of vma already checked under per-VMA lock, if it is a bad access, directly handle error, no need to retry with mmap_lock again. Since the page faut is handled under per-VMA lock, count it as a vma lock event with VMA_LOCK_SUCCESS. Link: https://lkml.kernel.org/r/20240403083805.1818160-7-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-03s390/mm: fix NULL pointer dereferenceHeiko Carstens
The recently added check to figure out if a fault happened on gmap ASCE dereferences the gmap pointer in lowcore without checking that it is not NULL. For all non-KVM processes the pointer is NULL, so that some value from lowcore will be read. With the current layouts of struct gmap and struct lowcore the read value (aka ASCE) is zero, so that this doesn't lead to any observable bug; at least currently. Fix this by adding the missing NULL pointer check. Fixes: 64c3431808bd ("s390/entry: compare gmap asce to determine guest/host fault") Acked-by: Sven Schnelle <svens@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-03-17s390/entry: compare gmap asce to determine guest/host faultSven Schnelle
With the current implementation, there are some cornercases where a host fault would be treated as a guest fault, for example when the sie instruction causes a program check. Therefore store the gmap asce in ptregs, and use that to compare the primary asce from the fault instead of matching instruction addresses. Suggested-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-01-11s390/mm,fault: remove not needed tsk variableHeiko Carstens
tsk is only used as an intermediate variable for current. Remove tsk and use current directly instead at the only place where it is used. Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-12-29arch/mm/fault: fix major fault accounting when retrying under per-VMA lockSuren Baghdasaryan
A test [1] in Android test suite started failing after [2] was merged. It turns out that after handling a major fault under per-VMA lock, the process major fault counter does not register that fault as major. Before [2] read faults would be done under mmap_lock, in which case FAULT_FLAG_TRIED flag is set before retrying. That in turn causes mm_account_fault() to account the fault as major once retry completes. With per-VMA locks we often retry because a fault can't be handled without locking the whole mm using mmap_lock. Therefore such retries do not set FAULT_FLAG_TRIED flag. This logic does not work after [2] because we can now handle read major faults under per-VMA lock and upon retry the fact there was a major fault gets lost. Fix this by setting FAULT_FLAG_TRIED after retrying under per-VMA lock if VM_FAULT_MAJOR was returned. Ideally we would use an additional VM_FAULT bit to indicate the reason for the retry (could not handle under per-VMA lock vs other reason) but this simpler solution seems to work, so keeping it simple. [1] https://cs.android.com/android/platform/superproject/+/master:test/vts-testcase/kernel/api/drop_caches_prop/drop_caches_test.cpp [2] https://lore.kernel.org/all/20231006195318.4087158-6-willy@infradead.org/ Link: https://lkml.kernel.org/r/20231226214610.109282-1-surenb@google.com Fixes: 12214eba1992 ("mm: handle read faults under the VMA lock") Signed-off-by: Suren Baghdasaryan <surenb@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-23s390/mm,fault: move VM_FAULT_ERROR handling to do_exception()Heiko Carstens
Get rid of do_fault_error() and move its contents to do_exception(), which makes do_exception(). With removing do_fault_error() it is also possible to get rid of the handle_fault_error_nolock() wrapper. Instead rename do_no_context() to handle_fault_error_nolock(). In result the whole fault handling looks much more like on other architectures. Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-10-23s390/mm,fault: remove VM_FAULT_BADMAP and VM_FAULT_BADACCESSHeiko Carstens
Remove the last two private vm_fault reasons: VM_FAULT_BADMAP and VM_FAULT_BADACCESS. In order to achieve this add an si_code parameter to do_no_context() and it's wrappers and directly call the wrappers instead of relying on do_fault_error() handling. Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-10-23s390/mm,fault: remove VM_FAULT_SIGNALHeiko Carstens
Remove VM_FAULT_SIGNAL and open-code it at the only two locations where it is used. Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-10-23s390/mm,fault: remove VM_FAULT_BADCONTEXTHeiko Carstens
Remove VM_FAULT_BADCONTEXT and instead call do_no_context() via wrappers. This adds two new wrappers similar to what x86 has: handle_fault_error() and handle_fault_error_nolock(). Both of them simply call do_no_context(), while handle_fault_error() also unlocks mmap lock, which avoids adding lots of mmap_read_unlock() calls with this and subsequent patches. Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-10-23s390/mm,fault: simplify kfence fault handlingHeiko Carstens
do_no_context() can be simplified by removing its fault parameter, which is only used to decide if kfence_handle_page_fault() should be called. If the fault happened within the kernel space it is ok to always check if this happened on a page which was unmapped because of the kfence feature. Limiting the check to the VM_FAULT_BADCONTEXT case doesn't add any value. Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-10-23s390/mm,fault: call do_fault_error() only from do_exception()Heiko Carstens
Remove duplicated fault error handling and handle it only once within do_exception(). Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-10-23s390/mm,fault: get rid of do_low_address()Heiko Carstens
There is only one caller of do_low_address(). Given that this code is quite special just get rid of do_low_address, and add it to do_protection_exception() in order to make the code a bit more readable. Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-10-23s390/mm,fault: remove VM_FAULT_PFAULTHeiko Carstens
Handling of VM_FAULT_PFAULT and VM_FAULT_BADCONTEXT is nearly identical; the only difference is within do_no_context() where however the fault_type (KERNEL_FAULT vs GMAP_FAULT) makes sure that both types will be handled differently. Therefore it is possible to get rid of VM_FAULT_PFAULT and use VM_FAULT_BADCONTEXT instead. Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-10-23s390/mm,fault: use get_kernel_nofault() to dereference in dump_pagetable()Heiko Carstens
The page table dumper uses get_kernel_nofault() to test if dereferencing page table entries is possible. Use the result, which is the required page table entry, instead of throwing it away and dereferencing a second time without any safe guard. Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-10-23s390/mm,fault: improve readability by using teid unionHeiko Carstens
Get rid of some magic numbers, and use the teid union and also some ptrace PSW defines to improve readability. Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-10-23s390/mm,fault: use static key for store indicationHeiko Carstens
Generate slightly better code by using a static key to implement store indication. This allows to get rid of a memory access on the hot path. Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-10-23s390/mm,fault: use get_fault_address() everywhereHeiko Carstens
Use the get_fault_address() helper function instead of open-coding it at many locations. Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-10-23s390/mm,fault: replace WARN_ON_ONCE() with unreachable()Heiko Carstens
do_secure_storage_access() contains a switch statements which handles all possible return values from get_fault_type(). Therefore remove the pointless default case error handling and replace it with unreachable(). Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-10-23s390/mm,fault: remove noinline attribute from all functionsHeiko Carstens
Remove all noinline attribute from all functions and leave the inlining decisions up to the compiler. Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-10-23s390/mm,fault: remove line breakHeiko Carstens
chechpatch reports: CHECK: Alignment should match open parenthesis + if (IS_ENABLED(CONFIG_PGSTE) && gmap && + (flags & FAULT_FLAG_RETRY_NOWAIT)) { Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-10-23s390/mm,fault: include linux/mmu_context.hHeiko Carstens
Include linux/mmu_context.h instead asm/mmu_context.h. checkpatch reports: CHECK: Consider using #include <linux/mmu_context.h> instead of <asm/mmu_context.h> +#include <asm/mmu_context.h> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-10-23s390/mm,fault: have balanced braces, remove unnecessary blanksHeiko Carstens
Remove unnecessary braces and also blanks after casts. Add braces to have balanced braces where missing. Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-10-23s390/mm,fault: use pr_warn(), pr_cont(), ... instead of open-codingHeiko Carstens
Use pr_warn() and friends instead of open-coding with printk(). Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-10-23s390/mm,fault: use pr_warn_ratelimited()Heiko Carstens
Use pr_warn_ratelimited() instead of printk_ratelimited(). checkpatch reports: WARNING: Prefer ... pr_warn_ratelimited(... to printk_ratelimited(KERN_WARNING ... + printk_ratelimited(KERN_WARNING Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-10-23s390/mm,fault: use __ratelimit() instead of printk_ratelimit()Heiko Carstens
Just like other architectures make use __ratelimit() instead of printk_ratelimit(). Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-10-23s390/mm,fault: reverse x-mas tree coding styleHeiko Carstens
Have reverse x-mas tree coding style for variables everywhere. Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-10-23s390/mm,fault: remove and improve comments, adjust whitespaceHeiko Carstens
Remove wrong, outdated, and pointless comments. Adjust wording for some comments, and adjust whitespace at some places. Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-09-19s390/ctlreg: add struct ctlregHeiko Carstens
Add struct ctlreg to enforce strict type checking / usage for control register functions. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-09-07Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "ARM: - Clean up vCPU targets, always returning generic v8 as the preferred target - Trap forwarding infrastructure for nested virtualization (used for traps that are taken from an L2 guest and are needed by the L1 hypervisor) - FEAT_TLBIRANGE support to only invalidate specific ranges of addresses when collapsing a table PTE to a block PTE. This avoids that the guest refills the TLBs again for addresses that aren't covered by the table PTE. - Fix vPMU issues related to handling of PMUver. - Don't unnecessary align non-stack allocations in the EL2 VA space - Drop HCR_VIRT_EXCP_MASK, which was never used... - Don't use smp_processor_id() in kvm_arch_vcpu_load(), but the cpu parameter instead - Drop redundant call to kvm_set_pfn_accessed() in user_mem_abort() - Remove prototypes without implementations RISC-V: - Zba, Zbs, Zicntr, Zicsr, Zifencei, and Zihpm support for guest - Added ONE_REG interface for SATP mode - Added ONE_REG interface to enable/disable multiple ISA extensions - Improved error codes returned by ONE_REG interfaces - Added KVM_GET_REG_LIST ioctl() implementation for KVM RISC-V - Added get-reg-list selftest for KVM RISC-V s390: - PV crypto passthrough enablement (Tony, Steffen, Viktor, Janosch) Allows a PV guest to use crypto cards. Card access is governed by the firmware and once a crypto queue is "bound" to a PV VM every other entity (PV or not) looses access until it is not bound anymore. Enablement is done via flags when creating the PV VM. - Guest debug fixes (Ilya) x86: - Clean up KVM's handling of Intel architectural events - Intel bugfixes - Add support for SEV-ES DebugSwap, allowing SEV-ES guests to use debug registers and generate/handle #DBs - Clean up LBR virtualization code - Fix a bug where KVM fails to set the target pCPU during an IRTE update - Fix fatal bugs in SEV-ES intrahost migration - Fix a bug where the recent (architecturally correct) change to reinject #BP and skip INT3 broke SEV guests (can't decode INT3 to skip it) - Retry APIC map recalculation if a vCPU is added/enabled - Overhaul emergency reboot code to bring SVM up to par with VMX, tie the "emergency disabling" behavior to KVM actually being loaded, and move all of the logic within KVM - Fix user triggerable WARNs in SVM where KVM incorrectly assumes the TSC ratio MSR cannot diverge from the default when TSC scaling is disabled up related code - Add a framework to allow "caching" feature flags so that KVM can check if the guest can use a feature without needing to search guest CPUID - Rip out the ancient MMU_DEBUG crud and replace the useful bits with CONFIG_KVM_PROVE_MMU - Fix KVM's handling of !visible guest roots to avoid premature triple fault injection - Overhaul KVM's page-track APIs, and KVMGT's usage, to reduce the API surface that is needed by external users (currently only KVMGT), and fix a variety of issues in the process Generic: - Wrap kvm_{gfn,hva}_range.pte in a union to allow mmu_notifier events to pass action specific data without needing to constantly update the main handlers. - Drop unused function declarations Selftests: - Add testcases to x86's sync_regs_test for detecting KVM TOCTOU bugs - Add support for printf() in guest code and covert all guest asserts to use printf-based reporting - Clean up the PMU event filter test and add new testcases - Include x86 selftests in the KVM x86 MAINTAINERS entry" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (279 commits) KVM: x86/mmu: Include mmu.h in spte.h KVM: x86/mmu: Use dummy root, backed by zero page, for !visible guest roots KVM: x86/mmu: Disallow guest from using !visible slots for page tables KVM: x86/mmu: Harden TDP MMU iteration against root w/o shadow page KVM: x86/mmu: Harden new PGD against roots without shadow pages KVM: x86/mmu: Add helper to convert root hpa to shadow page drm/i915/gvt: Drop final dependencies on KVM internal details KVM: x86/mmu: Handle KVM bookkeeping in page-track APIs, not callers KVM: x86/mmu: Drop @slot param from exported/external page-track APIs KVM: x86/mmu: Bug the VM if write-tracking is used but not enabled KVM: x86/mmu: Assert that correct locks are held for page write-tracking KVM: x86/mmu: Rename page-track APIs to reflect the new reality KVM: x86/mmu: Drop infrastructure for multiple page-track modes KVM: x86/mmu: Use page-track notifiers iff there are external users KVM: x86/mmu: Move KVM-only page-track declarations to internal header KVM: x86: Remove the unused page-track hook track_flush_slot() drm/i915/gvt: switch from ->track_flush_slot() to ->track_remove_region() KVM: x86: Add a new page-track hook to handle memslot deletion drm/i915/gvt: Don't bother removing write-protection on to-be-deleted slot KVM: x86: Reject memslot MOVE operations if KVMGT is attached ...
2023-08-31Merge tag 'kvm-s390-next-6.6-1' of ↵Paolo Bonzini
https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD - PV crypto passthrough enablement (Tony, Steffen, Viktor, Janosch) Allows a PV guest to use crypto cards. Card access is governed by the firmware and once a crypto queue is "bound" to a PV VM every other entity (PV or not) looses access until it is not bound anymore. Enablement is done via flags when creating the PV VM. - Guest debug fixes (Ilya)
2023-08-29Merge tag 'mm-stable-2023-08-28-18-26' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Some swap cleanups from Ma Wupeng ("fix WARN_ON in add_to_avail_list") - Peter Xu has a series (mm/gup: Unify hugetlb, speed up thp") which reduces the special-case code for handling hugetlb pages in GUP. It also speeds up GUP handling of transparent hugepages. - Peng Zhang provides some maple tree speedups ("Optimize the fast path of mas_store()"). - Sergey Senozhatsky has improved te performance of zsmalloc during compaction (zsmalloc: small compaction improvements"). - Domenico Cerasuolo has developed additional selftest code for zswap ("selftests: cgroup: add zswap test program"). - xu xin has doe some work on KSM's handling of zero pages. These changes are mainly to enable the user to better understand the effectiveness of KSM's treatment of zero pages ("ksm: support tracking KSM-placed zero-pages"). - Jeff Xu has fixes the behaviour of memfd's MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED sysctl ("mm/memfd: fix sysctl MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED"). - David Howells has fixed an fscache optimization ("mm, netfs, fscache: Stop read optimisation when folio removed from pagecache"). - Axel Rasmussen has given userfaultfd the ability to simulate memory poisoning ("add UFFDIO_POISON to simulate memory poisoning with UFFD"). - Miaohe Lin has contributed some routine maintenance work on the memory-failure code ("mm: memory-failure: remove unneeded PageHuge() check"). - Peng Zhang has contributed some maintenance work on the maple tree code ("Improve the validation for maple tree and some cleanup"). - Hugh Dickins has optimized the collapsing of shmem or file pages into THPs ("mm: free retracted page table by RCU"). - Jiaqi Yan has a patch series which permits us to use the healthy subpages within a hardware poisoned huge page for general purposes ("Improve hugetlbfs read on HWPOISON hugepages"). - Kemeng Shi has done some maintenance work on the pagetable-check code ("Remove unused parameters in page_table_check"). - More folioification work from Matthew Wilcox ("More filesystem folio conversions for 6.6"), ("Followup folio conversions for zswap"). And from ZhangPeng ("Convert several functions in page_io.c to use a folio"). - page_ext cleanups from Kemeng Shi ("minor cleanups for page_ext"). - Baoquan He has converted some architectures to use the GENERIC_IOREMAP ioremap()/iounmap() code ("mm: ioremap: Convert architectures to take GENERIC_IOREMAP way"). - Anshuman Khandual has optimized arm64 tlb shootdown ("arm64: support batched/deferred tlb shootdown during page reclamation/migration"). - Better maple tree lockdep checking from Liam Howlett ("More strict maple tree lockdep"). Liam also developed some efficiency improvements ("Reduce preallocations for maple tree"). - Cleanup and optimization to the secondary IOMMU TLB invalidation, from Alistair Popple ("Invalidate secondary IOMMU TLB on permission upgrade"). - Ryan Roberts fixes some arm64 MM selftest issues ("selftests/mm fixes for arm64"). - Kemeng Shi provides some maintenance work on the compaction code ("Two minor cleanups for compaction"). - Some reduction in mmap_lock pressure from Matthew Wilcox ("Handle most file-backed faults under the VMA lock"). - Aneesh Kumar contributes code to use the vmemmap optimization for DAX on ppc64, under some circumstances ("Add support for DAX vmemmap optimization for ppc64"). - page-ext cleanups from Kemeng Shi ("add page_ext_data to get client data in page_ext"), ("minor cleanups to page_ext header"). - Some zswap cleanups from Johannes Weiner ("mm: zswap: three cleanups"). - kmsan cleanups from ZhangPeng ("minor cleanups for kmsan"). - VMA handling cleanups from Kefeng Wang ("mm: convert to vma_is_initial_heap/stack()"). - DAMON feature work from SeongJae Park ("mm/damon/sysfs-schemes: implement DAMOS tried total bytes file"), ("Extend DAMOS filters for address ranges and DAMON monitoring targets"). - Compaction work from Kemeng Shi ("Fixes and cleanups to compaction"). - Liam Howlett has improved the maple tree node replacement code ("maple_tree: Change replacement strategy"). - ZhangPeng has a general code cleanup - use the K() macro more widely ("cleanup with helper macro K()"). - Aneesh Kumar brings memmap-on-memory to ppc64 ("Add support for memmap on memory feature on ppc64"). - pagealloc cleanups from Kemeng Shi ("Two minor cleanups for pcp list in page_alloc"), ("Two minor cleanups for get pageblock migratetype"). - Vishal Moola introduces a memory descriptor for page table tracking, "struct ptdesc" ("Split ptdesc from struct page"). - memfd selftest maintenance work from Aleksa Sarai ("memfd: cleanups for vm.memfd_noexec"). - MM include file rationalization from Hugh Dickins ("arch: include asm/cacheflush.h in asm/hugetlb.h"). - THP debug output fixes from Hugh Dickins ("mm,thp: fix sloppy text output"). - kmemleak improvements from Xiaolei Wang ("mm/kmemleak: use object_cache instead of kmemleak_initialized"). - More folio-related cleanups from Matthew Wilcox ("Remove _folio_dtor and _folio_order"). - A VMA locking scalability improvement from Suren Baghdasaryan ("Per-VMA lock support for swap and userfaults"). - pagetable handling cleanups from Matthew Wilcox ("New page table range API"). - A batch of swap/thp cleanups from David Hildenbrand ("mm/swap: stop using page->private on tail pages for THP_SWAP + cleanups"). - Cleanups and speedups to the hugetlb fault handling from Matthew Wilcox ("Change calling convention for ->huge_fault"). - Matthew Wilcox has also done some maintenance work on the MM subsystem documentation ("Improve mm documentation"). * tag 'mm-stable-2023-08-28-18-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (489 commits) maple_tree: shrink struct maple_tree maple_tree: clean up mas_wr_append() secretmem: convert page_is_secretmem() to folio_is_secretmem() nios2: fix flush_dcache_page() for usage from irq context hugetlb: add documentation for vma_kernel_pagesize() mm: add orphaned kernel-doc to the rst files. mm: fix clean_record_shared_mapping_range kernel-doc mm: fix get_mctgt_type() kernel-doc mm: fix kernel-doc warning from tlb_flush_rmaps() mm: remove enum page_entry_size mm: allow ->huge_fault() to be called without the mmap_lock held mm: move PMD_ORDER to pgtable.h mm: remove checks for pte_index memcg: remove duplication detection for mem_cgroup_uncharge_swap mm/huge_memory: work on folio->swap instead of page->private when splitting folio mm/swap: inline folio_set_swap_entry() and folio_swap_entry() mm/swap: use dedicated entry for swap in folio mm/swap: stop using page->private on tail pages for THP_SWAP selftests/mm: fix WARNING comparing pointer to 0 selftests: cgroup: fix test_kmem_memcg_deletion kernel mem check ...
2023-08-28s390/uv: UV feature check utilitySteffen Eiden
Introduces a function to check the existence of an UV feature. Refactor feature bit checks to use the new function. Signed-off-by: Steffen Eiden <seiden@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Michael Mueller <mimu@linux.ibm.com> Link: https://lore.kernel.org/r/20230815151415.379760-3-seiden@linux.ibm.com Message-Id: <20230815151415.379760-3-seiden@linux.ibm.com>
2023-08-24mm: drop per-VMA lock when returning VM_FAULT_RETRY or VM_FAULT_COMPLETEDSuren Baghdasaryan
handle_mm_fault returning VM_FAULT_RETRY or VM_FAULT_COMPLETED means mmap_lock has been released. However with per-VMA locks behavior is different and the caller should still release it. To make the rules consistent for the caller, drop the per-VMA lock when returning VM_FAULT_RETRY or VM_FAULT_COMPLETED. Currently the only path returning VM_FAULT_RETRY under per-VMA locks is do_swap_page and no path returns VM_FAULT_COMPLETED for now. [willy@infradead.org: fix riscv] Link: https://lkml.kernel.org/r/CAJuCfpE6GWEx1rPBmNpUfoD5o-gNFz9-UFywzCE2PbEGBiVz7g@mail.gmail.com Link: https://lkml.kernel.org/r/20230630211957.1341547-4-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Acked-by: Peter Xu <peterx@redhat.com> Tested-by: Conor Dooley <conor.dooley@microchip.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Hillf Danton <hdanton@sina.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jan Kara <jack@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Michel Lespinasse <michel@lespinasse.org> Cc: Minchan Kim <minchan@google.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Punit Agrawal <punit.agrawal@bytedance.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18mm: remove CONFIG_PER_VMA_LOCK ifdefsMatthew Wilcox (Oracle)
Patch series "Handle most file-backed faults under the VMA lock", v3. This patchset adds the ability to handle page faults on parts of files which are already in the page cache without taking the mmap lock. This patch (of 10): Provide lock_vma_under_rcu() when CONFIG_PER_VMA_LOCK is not defined to eliminate ifdefs in the users. Link: https://lkml.kernel.org/r/20230724185410.1124082-1-willy@infradead.org Link: https://lkml.kernel.org/r/20230724185410.1124082-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Cc: Punit Agrawal <punit.agrawal@bytedance.com> Cc: Arjun Roy <arjunroy@google.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-07-29s390/mm: move pfault code to own C fileHeiko Carstens
The pfault code has nothing to do with regular fault handling. Therefore move it to an own C file. Also add an own pfault header file. This way changes to setup.h don't cause a recompile of the pfault code and vice versa. Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-07-18s390/mm: fix per vma lock fault handlingSven Schnelle
With per-vma locks, handle_mm_fault() may return non-fatal error flags. In this case the code should reset the fault flags before returning. Fixes: e06f47a16573 ("s390/mm: try VMA lock-based page fault handling first") Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>