summaryrefslogtreecommitdiff
path: root/arch/s390/include
AgeCommit message (Collapse)Author
2023-02-14s390/mem_detect: do not truncate online memory ranges infoVasily Gorbik
Commit bf64f0517e5d ("s390/mem_detect: handle online memory limit just once") introduced truncation of mem_detect online ranges based on identity mapping size. For kdump case however the full set of online memory ranges has to be feed into memblock_physmem_add so that crashed system memory could be extracted. Instead of truncating introduce a "usable limit" which is respected by mem_detect api. Also add extra online memory ranges iterator which still provides full set of online memory ranges disregarding the "usable limit". Fixes: bf64f0517e5d ("s390/mem_detect: handle online memory limit just once") Reported-by: Alexander Egorenkov <egorenar@linux.ibm.com> Tested-by: Alexander Egorenkov <egorenar@linux.ibm.com> Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-02-14s390/vx: remove __uint128_t type from __vector128 struct againHeiko Carstens
The __uint128_t member was only added for future convenience to the __vector128 struct. However this is a uapi header file, 31/32 bit (aka compat layer) is still supported, but doesn't know anything about this type: /usr/include/asm/types.h:27:17: error: unknown type name __uint128_t 27 | __uint128_t v; Therefore remove it again. Fixes: b0b7b43fcc46 ("s390/vx: add 64 and 128 bit members to __vector128 struct") Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-02-14s390/mm: add support for RDP (Reset DAT-Protection)Gerald Schaefer
RDP instruction allows to reset DAT-protection bit in a PTE, with less CPU synchronization overhead than IPTE instruction. In particular, IPTE can cause machine-wide synchronization overhead, and excessive IPTE usage can negatively impact machine performance. RDP can be used instead of IPTE, if the new PTE only differs in SW bits and _PAGE_PROTECT HW bit, for PTE protection changes from RO to RW. SW PTE bit changes are allowed, e.g. for dirty and young tracking, but none of the other HW-defined part of the PTE must change. This is because the architecture forbids such changes to an active and valid PTE, which is why invalidation with IPTE is always used first, before writing a new entry. The RDP optimization helps mainly for fault-driven SW dirty-bit tracking. Writable PTEs are initially always mapped with HW _PAGE_PROTECT bit set, to allow SW dirty-bit accounting on first write protection fault, where the DAT-protection would then be reset. The reset is now done with RDP instead of IPTE, if RDP instruction is available. RDP cannot always guarantee that the DAT-protection reset is propagated to all CPUs immediately. This means that spurious TLB protection faults on other CPUs can now occur. For this, common code provides a flush_tlb_fix_spurious_fault() handler, which will now be used to do a CPU-local TLB flush. However, this will clear the whole TLB of a CPU, and not just the affected entry. For more fine-grained flushing, by simply doing a (local) RDP again, flush_tlb_fix_spurious_fault() would need to also provide the PTE pointer. Note that spurious TLB protection faults cannot really be distinguished from racing pagetable updates, where another thread already installed the correct PTE. In such a case, the local TLB flush would be unnecessary overhead, but overall reduction of CPU synchronization overhead by not using IPTE is still expected to be beneficial. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-02-10s390/ap: fix status returned by ap_qact()Halil Pasic
Since commit 159491f3b509 ("s390/ap: rework assembler functions to use unions for in/out register variables") the function ap_qact() tries to grab the status from the wrong part of the register. Thus we always end up with zeros. Which is wrong, among others, because we detect failures via status.response_code. Signed-off-by: Halil Pasic <pasic@linux.ibm.com> Reported-by: Harald Freudenberger <freude@linux.ibm.com> Fixes: 159491f3b509 ("s390/ap: rework assembler functions to use unions for in/out register variables") Reviewed-by: Harald Freudenberger <freude@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-02-10s390/ap: fix status returned by ap_aqic()Halil Pasic
There function ap_aqic() tries to grab the status from the wrong part of the register. Thus we always end up with zeros. Which is wrong, among others, because we detect failures via status.response_code. Signed-off-by: Halil Pasic <pasic@linux.ibm.com> Reported-by: Janosch Frank <frankja@linux.ibm.com> Fixes: 159491f3b509 ("s390/ap: rework assembler functions to use unions for in/out register variables") Reviewed-by: Harald Freudenberger <freude@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-02-09Revert "s390/mem_detect: do not update output parameters on failure"Heiko Carstens
This reverts commit cbc29f107e51b1cc7d1e7b0bbe0691a1224205f1. Get rid of the following smatch warnings: arch/s390/include/asm/mem_detect.h:86 get_mem_detect_end() error: uninitialized symbol 'end'. arch/s390/include/asm/mem_detect.h:86 get_mem_detect_end() error: uninitialized symbol 'end'. arch/s390/boot/vmem.c:256 setup_vmem() error: uninitialized symbol 'start'. arch/s390/boot/vmem.c:258 setup_vmem() error: uninitialized symbol 'end'. Note that there is no bug in the code. This is purely to silence smatch. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-02-09s390/idle: remove arch_cpu_idle_time() and corresponding codeHeiko Carstens
arch_cpu_idle_time() returns the idle time of any given cpu if it is in idle, or zero if not. All if this is racy and partially incorrect. Time stamps taken with store clock extended and store clock fast from different cpus are compared, while the architecture states that this is nothing which can be relied on (see Principles of Operation; Chapter 4, "Setting and Inspecting the Clock"). A more fundamental problem is that the timestamp when a cpu is leaving idle is taken early in the assembler part of the interrupt handler, and this value is only transferred many cycles later to the cpu's per-cpu idle data structure. This per cpu data structure is read by arch_cpu_idle() to tell for which period of time a remote cpu is idle: if only an idle_enter value is present, the assumed idle time of the cpu is calculated by taking a local timestamp and returning the difference of the local timestamp and the idle_enter value. This is potentially incorrect, since the remote cpu may have already left idle, but the taken timestamp may not have been transferred to the per-cpu data structure. This in turn means that too much idle time may be reported for a cpu, and a subsequent calculation of system idle time may result in a smaller value. Instead of coming up with even more complex code trying to fix this, just remove this code, and only account idle time of a cpu, after idle state is left. Another minor bug is that it is assumed that timestamps are non-zero, which is not necessarily the case for timestamps taken with store clock fast. This however is just a very minor problem, since this can only happen when the epoch increases. Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-02-09s390/vx: use simple assignments to access __vector128 membersHeiko Carstens
Use simple assignments to access __vector128 members instead of hard to read casts. Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-02-09s390/vx: add 64 and 128 bit members to __vector128 structHeiko Carstens
Add 64 and 128 bit members to __vector128 struct in order to allow reading of the complete value, or the higher or lower part of vector register contents instead of having to use casts. Add an explicit __aligned(4) statement to avoid that the alignment of the structure changes from 4 to 8. This should make sure that no breakage happens because of this change. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-02-09s390/processor: always inline cpu flag helper functionsHeiko Carstens
arch_cpu_idle() is marked noinstr and therefore must only call functions which are also not instrumented. Make sure that cpu flag helper functions are always inlined to avoid that the compiler generates an out-of-line function for e.g. the call within arch_cpu_idle(). Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-02-09Merge branch 'cmpxchg_user_key' into featuresHeiko Carstens
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-02-07Merge remote-tracking branch 'l390-korg/cmpxchg_user_key' into kvm-nextJanosch Frank
2023-02-06s390/boot: avoid page tables memory in kaslrVasily Gorbik
If kernel is build without KASAN support there is a chance that kernel image is going to be positioned by KASLR code to overlap with identity mapping page tables. When kernel is build with KASAN support enabled memory which is potentially going to be used for page tables and KASAN shadow mapping is accounted for in KASLR with the use of kasan_estimate_memory_needs(). Split this function and introduce vmem_estimate_memory_needs() to cover decompressor's vmem identity mapping page tables. Fixes: bb1520d581a3 ("s390/mm: start kernel with DAT enabled") Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-02-06s390/mem_detect: add get_mem_detect_online_total()Vasily Gorbik
Add a function to get online memory in total. It is supposed to be used in the decompressor as well as during early kernel startup. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-02-06s390/rethook: add local rethook header fileHeiko Carstens
Compiling the kernel with CONFIG_KPROBES disabled, but CONFIG_RETHOOK enabled, results in this sparse warning: arch/s390/kernel/rethook.c:26:15: warning: no previous prototype for 'arch_rethook_trampoline_callback' [-Wmissing-prototypes] 26 | unsigned long arch_rethook_trampoline_callback(struct pt_regs *regs) | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Add a local rethook header file similar to riscv to address this. Reported-by: kernel test robot <lkp@intel.com> Fixes: 1a280f48c0e4 ("s390/kprobes: replace kretprobe with rethook") Link: https://lore.kernel.org/all/202302030102.69dZIuJk-lkp@intel.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-02-02mm: add vma_alloc_zeroed_movable_folio()Matthew Wilcox (Oracle)
Replace alloc_zeroed_user_highpage_movable(). The main difference is returning a folio containing a single page instead of returning the page, but take the opportunity to rename the function to match other allocation functions a little better and rewrite the documentation to place more emphasis on the zeroing rather than the highmem aspect. Link: https://lkml.kernel.org/r/20230116191813.2145215-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02mm: remove __HAVE_ARCH_PTE_SWP_EXCLUSIVEDavid Hildenbrand
__HAVE_ARCH_PTE_SWP_EXCLUSIVE is now supported by all architectures that support swp PTEs, so let's drop it. Link: https://lkml.kernel.org/r/20230113171026.582290-27-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-01-31s390/mem_detect: do not update output parameters on failureAlexander Gordeev
Function __get_mem_detect_block() resets start and end output parameters in case of invalid mem_detect array index is provided. That violates the rule of sparing the output on fail path and leads e.g to a below anomaly: for_each_mem_detect_block(i, &start, &end) continue; One would expect start and end contain addresses of the last memory block (if available), but in fact the two will be reset to zeroes. That is not how an iterator is expected to work. Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-01-31s390/cio: introduce locking for register/unregister functionsVineeth Vijayan
Unbinding an I/O subchannel with a child-CCW device in disconnected state sometimes causes a kernel-panic. The race condition was seen mostly during testing, when setting all the CHPIDs of a device to offline and at the same time, the unbinding the I/O subchannel driver. The kernel-panic occurs because of double delete, the I/O subchannel driver calls device_del on the CCW device while another device_del invocation for the same device is in-flight. For instance, disabling all the CHPIDs will trigger the ccw_device_remove function, which will call a ccw_device_unregister(), which ends up calling the device_del() which is asynchronous via cdev's todo workqueue. And unbinding the I/O subchannel driver calls io_subchannel_remove() function which calls the ccw_device_unregister() and device_del(). This double delete can be prevented by serializing all CCW device registration/unregistration calls into the driver core. This patch introduces a mutex which will be used for this purpose. Signed-off-by: Vineeth Vijayan <vneethv@linux.ibm.com> Reported-by: Boris Fiuczynski <fiuczy@linux.ibm.com> Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-01-30Merge branch 'iommu-memory-accounting' of ↵Jason Gunthorpe
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/joro/iommu intoiommufd/for-next Jason Gunthorpe says: ==================== iommufd follows the same design as KVM and uses memory cgroups to limit the amount of kernel memory a iommufd file descriptor can pin down. The various internal data structures already use GFP_KERNEL_ACCOUNT to charge its own memory. However, one of the biggest consumers of kernel memory is the IOPTEs stored under the iommu_domain and these allocations are not tracked. This series is the first step in fixing it. The iommu driver contract already includes a 'gfp' argument to the map_pages op, allowing iommufd to specify GFP_KERNEL_ACCOUNT and then having the driver allocate the IOPTE tables with that flag will capture a significant amount of the allocations. Update the iommu_map() API to pass in the GFP argument, and fix all call sites. Replace iommu_map_atomic(). Audit the "enterprise" iommu drivers to make sure they do the right thing. Intel and S390 ignore the GFP argument and always use GFP_ATOMIC. This is problematic for iommufd anyhow, so fix it. AMD and ARM SMMUv2/3 are already correct. A follow up series will be needed to capture the allocations made when the iommu_domain itself is allocated, which will complete the job. ==================== * 'iommu-memory-accounting' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/joro/iommu: iommu/s390: Use GFP_KERNEL in sleepable contexts iommu/s390: Push the gfp parameter to the kmem_cache_alloc()'s iommu/intel: Use GFP_KERNEL in sleepable contexts iommu/intel: Support the gfp argument to the map_pages op iommu/intel: Add a gfp parameter to alloc_pgtable_page() iommufd: Use GFP_KERNEL_ACCOUNT for iommu_map() iommu/dma: Use the gfp parameter in __iommu_dma_alloc_noncontiguous() iommu: Add a gfp parameter to iommu_map_sg() iommu: Remove iommu_map_atomic() iommu: Add a gfp parameter to iommu_map() Link: https://lore.kernel.org/linux-iommu/0-v3-76b587fe28df+6e3-iommu_map_gfp_jgg@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-01-25s390/syscalls: get rid of system call alias functionsHeiko Carstens
bpftrace and friends only consider functions present in /sys/kernel/tracing/available_filter_functions. For system calls there is the s390 specific problem that the system call function itself is present via __se_sys##name() while the system call itself is wired up via an __s390x_sys##name() alias. The required DWARF debug information however is only available for the original function, not the alias, but within available_filter_functions only the functions with __s390x_ prefix are available. Which means the required DWARF debug information cannot be found. While this could be solved via tooling, it is easier to change the s390 specific system call wrapper handling. Therefore get rid of this alias handling and implement system call wrappers like most other architectures are doing. In result the implementation generates the following functions: long __s390x_sys##name(struct pt_regs *regs) static inline long __se_sys##name(...) static inline long __do_sys##name(...) __s390x_sys##name() is the visible system call function which is also wired up in the system call table. Its only parameter is a pt_regs variable. This function calls the corresponding __se_sys##name() function, which has as many parameters like the system call definition. This function in turn performs all zero and sign extensions of all system call parameters, taken from the pt_regs structure, and finally calls __do_sys##name(). __do_sys##name() is the actual inlined system call function implementation. For all 64 bit system calls there is a 31/32 bit system call function __s390_sys##name() generated, which handles all system call parameters correctly as required by compat handling. This function may be wired up within the compat system call table, unless there exists an explicit compat system call function, which is then used instead. Reported-by: Ilya Leoshkevich <iii@linux.ibm.com> Tested-by: Ilya Leoshkevich <iii@linux.ibm.com> Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-01-25s390/syscalls: remove trailing semicolonHeiko Carstens
Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-01-25s390/syscalls: move __S390_SYS_STUBx() macroHeiko Carstens
Move __S390_SYS_STUBx() the end of the CONFIG_COMPAT section, so both variants (compat and non-compat) are close together and can be easily compared. Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-01-25s390/syscalls: remove __SC_COMPAT_TYPE defineHeiko Carstens
Remove __SC_COMPAT_TYPE define which is an unused leftover. Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-01-25s390/syscalls: remove SYSCALL_METADATA() from compat syscallsHeiko Carstens
SYSCALL_METADATA() is only supposed to be used for non-compat system calls. Otherwise there would be a name clash. This also removes the inconsistency that s390 is the only architecture which uses SYSCALL_METADATA() for compat system calls, and even that only for compat system calls without parameters. Only two such compat system calls exist. Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-01-25s390/cpum_cf: merge source files for CPU Measurement counter facilityThomas Richter
With no in-kernel user, the source files can be merged. Move all functions and the variable definitions to file perf_cpum_cf.c This file now contains all the necessary functions and definitions for the CPU Measurement counter facility device driver. The files cpu_mcf.h and perf_cpum_cf_common.c are deleted. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Acked-by: Hendrik Brueckner <brueckner@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-01-25s390/cpum_cf: remove in-kernel counting facility interfaceThomas Richter
Commit 17bebcc68eee ("s390/cpum_cf: Add minimal in-kernel interface for counter measurements") introduced a small in-kernel interface for CPU Measurement counter facility. There are no users of this interface, therefore remove it. The following functions are removed: kernel_cpumcf_alert(), kernel_cpumcf_begin(), kernel_cpumcf_end(), kernel_cpumcf_avail() there is no need for them anymore. With the removal of function kernel_cpumcf_alert(), also remove member alert in struct cpu_cf_events. Its purpose was to counter measurement alert interrupts for the in-kernel interface. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-01-25s390/cpum_cf: move stccm_avail()Thomas Richter
Function stccm_avail() is defined in a header file and the only user is one single source file. Move this function to the source file where it is also used and remove it from the header file. No functional change. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Acked-by: Hendrik Brueckner <brueckner@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-01-25s390/cpum_cf: move cpum_cf_ctrset_size()Thomas Richter
Function cpum_cf_ctrset_size() is defined in one source file and the only user is in another source file. Move this function to the source file where it is used and remove its prototype from the header file. No functional change. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Acked-by: Hendrik Brueckner <brueckner@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-01-25iommu/s390: Push the gfp parameter to the kmem_cache_alloc()'sJason Gunthorpe
dma_alloc_cpu_table() and dma_alloc_page_table() are eventually called by iommufd through s390_iommu_map_pages() and it should not be forced to atomic. Thread the gfp parameter through the call chain starting from s390_iommu_map_pages(). Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/9-v3-76b587fe28df+6e3-iommu_map_gfp_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-01-22s390/kprobes: replace kretprobe with rethookVasily Gorbik
That's an adaptation of commit f3a112c0c40d ("x86,rethook,kprobes: Replace kretprobe with rethook on x86") to s390. Replaces the kretprobe code with rethook on s390. With this patch, kretprobe on s390 uses the rethook instead of kretprobe specific trampoline code. Tested-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-01-22s390/cpum_sf: move functions from header file to source fileThomas Richter
Some inline helper functions are defined in a header file but used in only one source file. Move these functions to the source file. No functional change. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-01-17Merge branch 'fixes' into featuresHeiko Carstens
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-01-13s390: move __amode31_base declaration to proper header fileHeiko Carstens
Move __amode31_base declaration to proper header file to get rid of arch/s390/boot/startup.c:24:15: warning: symbol '__amode31_base' was not declared. Should it be static? Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-01-13s390/mm: allocate Absolute Lowcore Area in decompressorAlexander Gordeev
Move Absolute Lowcore Area allocation to the decompressor. As result, get_abs_lowcore() and put_abs_lowcore() access brackets become really straight and do not require complex execution context analysis and LAP and interrupts tackling. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-01-13s390/mm: allocate Real Memory Copy Area in decompressorAlexander Gordeev
Move Real Memory Copy Area allocation to the decompressor. As result, memcpy_real() and memcpy_real_iter() movers become usable since the very moment the kernel starts. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-01-13s390/mm: start kernel with DAT enabledAlexander Gordeev
The setup of the kernel virtual address space is spread throughout the sources, boot stages and config options like this: 1. The available physical memory regions are queried and stored as mem_detect information for later use in the decompressor. 2. Based on the physical memory availability the virtual memory layout is established in the decompressor; 3. If CONFIG_KASAN is disabled the kernel paging setup code populates kernel pgtables and turns DAT mode on. It uses the information stored at step [1]. 4. If CONFIG_KASAN is enabled the kernel early boot kasan setup populates kernel pgtables and turns DAT mode on. It uses the information stored at step [1]. The kasan setup creates early_pg_dir directory and directly overwrites swapper_pg_dir entries to make shadow memory pages available. Move the kernel virtual memory setup to the decompressor and start the kernel with DAT turned on right from the very first istruction. That completely eliminates the boot phase when the kernel runs in DAT-off mode, simplies the overall design and consolidates pgtables setup. The identity mapping is created in the decompressor, while kasan shadow mappings are still created by the early boot kernel code. Share with decompressor the existing kasan memory allocator. It decreases the size of a newly requested memory block from pgalloc_pos and ensures that kernel image is not overwritten. pgalloc_low and pgalloc_pos pointers are made preserved boot variables for that. Use the bootdata infrastructure to setup swapper_pg_dir and invalid_pg_dir directories used by the kernel later. The interim early_pg_dir directory established by the kasan initialization code gets eliminated as result. As the kernel runs in DAT-on mode only the PSW_KERNEL_BITS define gets PSW_MASK_DAT bit by default. Additionally, the setup_lowcore_dat_off() and setup_lowcore_dat_on() routines get merged, since there is no DAT-off mode stage anymore. The memory mappings are created with RW+X protection that allows the early boot code setting up all necessary data and services for the kernel being booted. Just before the paging is enabled the memory protection is changed to RO+X for text, RO+NX for read-only data and RW+NX for kernel data and the identity mapping. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-01-13s390/pgtable: add REGION3_KERNEL_EXEC protectionAlexander Gordeev
Similar to existing PAGE_KERNEL_EXEC and SEGMENT_KERNEL_EXEC memory protection add REGION3_KERNEL_EXEC attribute that could be set on PUD pgtable entries. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-01-11irq/s390: Add arch_is_isolated_msi() for s390Jason Gunthorpe
s390 doesn't use irq_domains, so it has no place to set IRQ_DOMAIN_FLAG_ISOLATED_MSI. Instead of continuing to abuse the iommu subsystem to convey this information add a simple define which s390 can make statically true. The define will cause msi_device_has_isolated() to return true. Remove IOMMU_CAP_INTR_REMAP from the s390 iommu driver. Link: https://lore.kernel.org/r/8-v3-3313bb5dd3a3+10f11-secure_msi_jgg@nvidia.com Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-01-11s390/percpu: add READ_ONCE() to arch_this_cpu_to_op_simple()Heiko Carstens
Make sure that *ptr__ within arch_this_cpu_to_op_simple() is only dereferenced once by using READ_ONCE(). Otherwise the compiler could generate incorrect code. Cc: <stable@vger.kernel.org> Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-01-11s390/cpum_sf: add READ_ONCE() semantics to compare and swap loopsHeiko Carstens
The current cmpxchg_double() loops within the perf hw sampling code do not have READ_ONCE() semantics to read the old value from memory. This allows the compiler to generate code which reads the "old" value several times from memory, which again allows for inconsistencies. For example: /* Reset trailer (using compare-double-and-swap) */ do { te_flags = te->flags & ~SDB_TE_BUFFER_FULL_MASK; te_flags |= SDB_TE_ALERT_REQ_MASK; } while (!cmpxchg_double(&te->flags, &te->overflow, te->flags, te->overflow, te_flags, 0ULL)); The compiler could generate code where te->flags used within the cmpxchg_double() call may be refetched from memory and which is not necessarily identical to the previous read version which was used to generate te_flags. Which in turn means that an incorrect update could happen. Fix this by adding READ_ONCE() semantics to all cmpxchg_double() loops. Given that READ_ONCE() cannot generate code on s390 which atomically reads 16 bytes, use a private compare-and-swap-double implementation to achieve that. Also replace cmpxchg_double() with the private implementation to be able to re-use the old value within the loops. As a side effect this converts the whole code to only use bit fields to read and modify bits within the hws trailer header. Reported-by: Alexander Gordeev <agordeev@linux.ibm.com> Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Acked-by: Hendrik Brueckner <brueckner@linux.ibm.com> Reviewed-by: Thomas Richter <tmricht@linux.ibm.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/linux-s390/Y71QJBhNTIatvxUT@osiris/T/#ma14e2a5f7aa8ed4b94b6f9576799b3ad9c60f333 Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-01-09vfio/ccw: calculate number of IDAWs regardless of formatEric Farman
The idal_nr_words() routine works well for 4K IDAWs, but lost its ability to handle the old 2K formats with the removal of 31-bit builds in commit 5a79859ae0f3 ("s390: remove 31 bit support"). Since there's nothing preventing a guest from generating this IDAW format, let's re-introduce the math for them and use both when calculating the number of IDAWs based on the bits specified in the ORB. Signed-off-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-01-09s390/diag: use __packed __alignedSven Schnelle
Use __packed __aligned instead of __attribute__((packed, aligned(X))); to match the rest of the file. Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-01-09s390/fs3270: split header filesSven Schnelle
In order to use the fs3270 one would need at least the ioctl definitions in uapi. Add two new include files in uapi, which contain: fs3270: ioctl number declarations + returned struct for TUBGETMOD. raw3270: all the orders, attributes and similar stuff used with 3270 terminals. Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-01-09s390/tty3270: add support for diag 8cSven Schnelle
The current code uses diag210 to infer the 3270 geometry from the model number when running on z/VM. This doesn't work well as almost all 3270 software clients report as 3279-2 with a custom resolution. tty3270 assumes it has a 80x24 terminal connected because of the -2 suffix. Use diag 8c to fetch the realy geometry from z/VM. Note that this doesn't allow dynamic resizing, i.e. reconnecting to a z/VM session with a different geometry. Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Tested-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-01-09s390: remove the last remnants of cputime_tNicholas Piggin
cputime_t was a core kernel type, removed by commits ed5c8c854f2b..b672592f0221. As explained in commit b672592f0221 ("sched/cputime: Remove generic asm headers"), the final cleanup is for the arch to provide cputime_to_nsec[s](). Commit e53051e757d6 ("s390/cputime: provide archicture specific cputime_to_nsecs") did that, but just didn't remove the then-unused cputime_to_usecs() and associated remnants. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Link: https://lore.kernel.org/r/20221006105635.115775-1-npiggin@gmail.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-01-06s390/debug: add _ASM_S390_ prefix to header guardNiklas Schnelle
Using DEBUG_H without a prefix is very generic and inconsistent with other header guards in arch/s390/include/asm. In fact it collides with the same name in the ath9k wireless driver though that depends on !S390 via disabled wireless support. Let's just use a consistent header guard name and prevent possible future trouble. Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-01-04s390/uaccess: avoid __ashlti3() callHeiko Carstens
__cmpxchg_user_key() uses 128 bit types which, depending on compiler and config options, may lead to an __ashlti3() library call. Get rid of that by simply casting the 128 bit values to 32 bit values. Reported-by: kernel test robot <lkp@intel.com> Suggested-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com> Fixes: 51098f0eb22e ("s390/cmpxchg: make loop condition for 1,2 byte cases precise") Link: https://lore.kernel.org/all/4b96b112d5415d08a81d30657feec2c8c3000f7c.camel@linux.ibm.com/ Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2022-12-29KVM: Opt out of generic hardware enabling on s390 and PPCSean Christopherson
Allow architectures to opt out of the generic hardware enabling logic, and opt out on both s390 and PPC, which don't need to manually enable virtualization as it's always on (when available). In addition to letting s390 and PPC drop a bit of dead code, this will hopefully also allow ARM to clean up its related code, e.g. ARM has its own per-CPU flag to track which CPUs have enable hardware due to the need to keep hardware enabled indefinitely when pKVM is enabled. Signed-off-by: Sean Christopherson <seanjc@google.com> Acked-by: Anup Patel <anup@brainfault.org> Message-Id: <20221130230934.1014142-50-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-19Merge tag 'iommu-updates-v6.2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull iommu updates from Joerg Roedel: "Core code: - map/unmap_pages() cleanup - SVA and IOPF refactoring - Clean up and document return codes from device/domain attachment AMD driver: - Rework and extend parsing code for ivrs_ioapic, ivrs_hpet and ivrs_acpihid command line options - Some smaller cleanups Intel driver: - Blocking domain support - Cleanups S390 driver: - Fixes and improvements for attach and aperture handling PAMU driver: - Resource leak fix and cleanup Rockchip driver: - Page table permission bit fix Mediatek driver: - Improve safety from invalid dts input - Smaller fixes and improvements Exynos driver: - Fix driver initialization sequence Sun50i driver: - Remove IOMMU_DOMAIN_IDENTITY as it has not been working forever - Various other fixes" * tag 'iommu-updates-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (74 commits) iommu/mediatek: Fix forever loop in error handling iommu/mediatek: Fix crash on isr after kexec() iommu/sun50i: Remove IOMMU_DOMAIN_IDENTITY iommu/amd: Fix typo in macro parameter name iommu/mediatek: Remove unused "mapping" member from mtk_iommu_data iommu/mediatek: Improve safety for mediatek,smi property in larb nodes iommu/mediatek: Validate number of phandles associated with "mediatek,larbs" iommu/mediatek: Add error path for loop of mm_dts_parse iommu/mediatek: Use component_match_add iommu/mediatek: Add platform_device_put for recovering the device refcnt iommu/fsl_pamu: Fix resource leak in fsl_pamu_probe() iommu/vt-d: Use real field for indication of first level iommu/vt-d: Remove unnecessary domain_context_mapped() iommu/vt-d: Rename domain_add_dev_info() iommu/vt-d: Rename iommu_disable_dev_iotlb() iommu/vt-d: Add blocking domain support iommu/vt-d: Add device_block_translation() helper iommu/vt-d: Allocate pasid table in device probe path iommu/amd: Check return value of mmu_notifier_register() iommu/amd: Fix pci device refcount leak in ppr_notifier() ...