path: root/arch/s390/include
Age  Commit message  (Author)
2023-06-05  arch: Remove cmpxchg_double  (Peter Zijlstra)
No moar users, remove the monster. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20230531132323.991907085@infradead.org
2023-06-05  percpu: Wire up cmpxchg128  (Peter Zijlstra)
In order to replace cmpxchg_double() with the newly minted cmpxchg128() family of functions, wire it up in this_cpu_cmpxchg(). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20230531132323.654945124@infradead.org
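A rough usage sketch of what this wiring enables - a 16-byte this_cpu_cmpxchg() on a suitably aligned per-CPU value; the variable and helper names below are illustrative, not taken from the patch:

    /* per-CPU 16-byte slot; cmpxchg128 requires 16-byte alignment */
    static DEFINE_PER_CPU_ALIGNED(u128, pcpu_slot);

    static bool replace_pcpu_slot(u128 old, u128 new)
    {
        /* this_cpu_cmpxchg() returns the previous value; the exchange
         * succeeded if that previous value matches 'old' */
        return this_cpu_cmpxchg(pcpu_slot, old, new) == old;
    }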
2023-06-05  arch: Introduce arch_{,try_}_cmpxchg128{,_local}()  (Peter Zijlstra)
For all architectures that currently support cmpxchg_double() implement the cmpxchg128() family of functions that is basically the same but with a saner interface. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20230531132323.452120708@infradead.org
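As a hedged illustration of the saner interface: the whole 16-byte quantity is exchanged as a single u128 value instead of as a pair of 64-bit words (the union and helper below are made up for illustration):

    union pair128 {
        u128 full;
        struct {
            u64 lo;
            u64 hi;
        };
    } __aligned(16);

    static bool set_pair128(union pair128 *p, u64 new_lo, u64 new_hi)
    {
        union pair128 new = { .lo = new_lo, .hi = new_hi };
        u128 old = p->full;

        /* try_cmpxchg128() returns true on success; on failure it
         * updates 'old' with the value found in memory */
        return try_cmpxchg128(&p->full, &old, new.full);
    }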
2023-06-01  s390/pkey: add support for ecc clear key  (Harald Freudenberger)
Add support for a new 'non CCA clear key token' with these ECC clear keys supported:
- ECC P256
- ECC P384
- ECC P521
- ECC ED25519
- ECC ED448
This makes it possible to derive a protected key from this ECC clear key input via the PKEY_KBLOB2PROTK3 ioctl. As of now the only way to derive protected keys from these clear key tokens is via the PCKMO instruction. For AES keys an alternate path exists via creating a secure key from the clear key and then deriving a protected key from the secure key. This alternate path is not implemented for ECC keys as it would require rearranging and maybe recalculating the clear key material for input to derive a CCA or EP11 ECC secure key. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-06-01  s390/pkey: do not use struct pkey_protkey  (Harald Freudenberger)
This is an internal rework of the pkey code to no longer use struct pkey_protkey internally. This struct has a hard-coded protected key buffer with MAXPROTKEYSIZE = 64 bytes. However, with support for ECC protected keys, this limit is too small, and thus this patch reworks all the internal code to use the triple u8 *protkey, u32 protkeylen, u32 protkeytype instead. The ioctl code, which still has to deal with this struct coming from and/or provided to userspace, now invokes all the internal functions with the triple instead of passing a pointer to struct pkey_protkey. The struct pkey_clrkey has been internally replaced in a similar way; it also has a hard-coded clear key buffer of MAXCLRKEYSIZE = 32 bytes and thus is not usable with e.g. ECC clear key material. This is a transparent rework for userspace applications using the pkey API. The internal kernel API used by the PAES crypto ciphers has been adapted to this change to make it possible to provide ECC protected keys via this interface in the future. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
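The shape of that internal change can be sketched with a pair of illustrative prototypes (the function names are placeholders; only the parameter triple reflects the rework described above):

    /* before: output constrained to the fixed 64-byte MAXPROTKEYSIZE buffer */
    int xxx_clr2protkey_old(u32 keytype, const u8 *clrkey, u32 clrkeylen,
                            struct pkey_protkey *protkey);

    /* after: buffer, length and type are passed separately, so larger
     * ECC protected keys fit as well */
    int xxx_clr2protkey(u32 keytype, const u8 *clrkey, u32 clrkeylen,
                        u8 *protkey, u32 *protkeylen, u32 *protkeytype);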
2023-06-01  s390/ipl: add REIPL_CLEAR flag to os_info  (Mikhail Zaslonko)
Introduce a new OS_INFO_FLAGS_ENTRY in os_info pointing to a field with bit flags. Add the OS_INFO_FLAGS_ENTRY upon dump_reipl shutdown action processing and set the OS_INFO_FLAG_REIPL_CLEAR flag to indicate that the 'clear' sysfs attribute has been set on the panicked system for the specified ipl type. This flag can be used to inform the dumper whether the LOAD_CLEAR or LOAD_NORMAL diag308 subcode is to be used for ipl after dumping the memory. Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com> Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-05-17  s390/uapi: cover statfs padding by growing f_spare  (Ilya Leoshkevich)
pahole says:

  struct compat_statfs64 {
          ...
          u32 f_spare[4];            /*    68    16 */
          /* size: 88, cachelines: 1, members: 12 */
          /* padding: 4 */

  struct statfs {
          ...
          unsigned int f_spare[4];   /*    68    16 */
          /* size: 88, cachelines: 1, members: 12 */
          /* padding: 4 */

  struct statfs64 {
          ...
          unsigned int f_spare[4];   /*    68    16 */
          /* size: 88, cachelines: 1, members: 12 */
          /* padding: 4 */

One has to keep the existence of padding in mind when working with these structs. Grow f_spare arrays to 5 in order to simplify things. Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20230504144021.808932-3-iii@linux.ibm.com Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
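A small stand-alone illustration of the padding point (the struct layouts are schematic, not the real uapi definitions): both variants are 88 bytes, but with f_spare[5] the former tail padding becomes an explicit, zeroable member.

    #include <stdio.h>

    struct spare4 { unsigned long long hdr[8]; unsigned int x; unsigned int f_spare[4]; };
    struct spare5 { unsigned long long hdr[8]; unsigned int x; unsigned int f_spare[5]; };

    int main(void)
    {
        /* both print 88: spare4 ends with 4 bytes of invisible tail padding,
         * spare5 covers those bytes with the array itself */
        printf("%zu %zu\n", sizeof(struct spare4), sizeof(struct spare5));
        return 0;
    }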
2023-05-17  procfs: consolidate arch_report_meminfo declaration  (Arnd Bergmann)
The arch_report_meminfo() function is provided by four architectures, with a __weak fallback in procfs itself. On architectures that don't have a custom version, the __weak version causes a warning because of the missing prototype. Remove the architecture specific prototypes and instead add one in linux/proc_fs.h. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> # for arch/x86 Acked-by: Helge Deller <deller@gmx.de> # parisc Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Message-Id: <20230516195834.551901-1-arnd@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-05-15  s390: select ARCH_SUPPORTS_INT128  (Heiko Carstens)
s390 has instructions to support 128 bit arithmetic, e.g. a 64 bit multiply instruction with a 128 bit result. Also, 128 bit integer arithmetic is already used in s390 specific architecture code (see e.g. read_persistent_clock64()). Therefore select ARCH_SUPPORTS_INT128. However, limit this to clang for now, since gcc generates inefficient code, which may lead to stack overflows, when compiling lib/crypto/curve25519-hacl64.c, which depends on ARCH_SUPPORTS_INT128. The gcc generated functions have 6kb stack frames, compared to only 1kb for the code generated with clang. If the kernel is compiled with -Os, library calls for __ashlti3(), __ashrti3(), and __lshrti3() may be generated. Similar to arm64 and riscv, provide assembler implementations for these functions. Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
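For illustration, the kind of operation that benefits (plain C, not kernel code): with ARCH_SUPPORTS_INT128 the compiler can use the native 64 x 64 -> 128 bit multiply instead of a library helper.

    static inline unsigned __int128 mul64x64(unsigned long long a,
                                             unsigned long long b)
    {
        /* maps to the 64 bit multiply with 128 bit result mentioned above */
        return (unsigned __int128)a * b;
    }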
2023-05-05  Merge tag 'locking-core-2023-05-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds)
Pull locking updates from Ingo Molnar:
- Introduce local{,64}_try_cmpxchg() - a slightly more optimal primitive, which will be used in perf events ring-buffer code
- Simplify/modify rwsems on PREEMPT_RT, to address writer starvation
- Misc cleanups/fixes

* tag 'locking-core-2023-05-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/atomic: Correct (cmp)xchg() instrumentation
  locking/x86: Define arch_try_cmpxchg_local()
  locking/arch: Wire up local_try_cmpxchg()
  locking/generic: Wire up local{,64}_try_cmpxchg()
  locking/atomic: Add generic try_cmpxchg{,64}_local() support
  locking/rwbase: Mitigate indefinite writer starvation
  locking/arch: Rename all internal __xchg() names to __arch_xchg()
2023-04-30  Merge tag 's390-6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux  (Linus Torvalds)
Pull s390 updates from Vasily Gorbik:
- Add support for stackleak feature. Also allow specifying architecture-specific stackleak poison function to enable faster implementation. On s390, the mvc-based implementation helps decrease typical overhead from a factor of 3 to just 25%
- Convert all assembler files to use SYM* style macros, deprecating the ENTRY() macro and other annotations. Select ARCH_USE_SYM_ANNOTATIONS
- Improve KASLR to also randomize module and special amode31 code base load addresses
- Rework decompressor memory tracking to support memory holes and improve error handling
- Add support for protected virtualization AP binding
- Add support for set_direct_map() calls
- Implement set_memory_rox() and noexec module_alloc()
- Remove obsolete overriding of mem*() functions for KASAN
- Rework kexec/kdump to avoid using nodat_stack to call purgatory
- Convert the rest of the s390 code to use flexible-array member instead of a zero-length array
- Clean up uaccess inline asm
- Enable ARCH_HAS_MEMBARRIER_SYNC_CORE
- Convert to using CONFIG_FUNCTION_ALIGNMENT and enable DEBUG_FORCE_FUNCTION_ALIGN_64B
- Resolve last_break in userspace fault reports
- Simplify one-level sysctl registration
- Clean up branch prediction handling
- Rework CPU counter facility to retrieve available counter sets just once
- Other various small fixes and improvements all over the code

* tag 's390-6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (118 commits)
  s390/stackleak: provide fast __stackleak_poison() implementation
  stackleak: allow to specify arch specific stackleak poison function
  s390: select ARCH_USE_SYM_ANNOTATIONS
  s390/mm: use VM_FLUSH_RESET_PERMS in module_alloc()
  s390: wire up memfd_secret system call
  s390/mm: enable ARCH_HAS_SET_DIRECT_MAP
  s390/mm: use BIT macro to generate SET_MEMORY bit masks
  s390/relocate_kernel: adjust indentation
  s390/relocate_kernel: use SYM* macros instead of ENTRY(), etc.
  s390/entry: use SYM* macros instead of ENTRY(), etc.
  s390/purgatory: use SYM* macros instead of ENTRY(), etc.
  s390/kprobes: use SYM* macros instead of ENTRY(), etc.
  s390/reipl: use SYM* macros instead of ENTRY(), etc.
  s390/head64: use SYM* macros instead of ENTRY(), etc.
  s390/earlypgm: use SYM* macros instead of ENTRY(), etc.
  s390/mcount: use SYM* macros instead of ENTRY(), etc.
  s390/crc32le: use SYM* macros instead of ENTRY(), etc.
  s390/crc32be: use SYM* macros instead of ENTRY(), etc.
  s390/crypto,chacha: use SYM* macros instead of ENTRY(), etc.
  s390/amode31: use SYM* macros instead of ENTRY(), etc.
  ...
2023-04-29  locking/arch: Rename all internal __xchg() names to __arch_xchg()  (Andrzej Hajda)
Decrease the probability of this internal facility to be used by driver code. Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k] Acked-by: Palmer Dabbelt <palmer@rivosinc.com> [riscv] Link: https://lore.kernel.org/r/20230118154450.73842-1-andrzej.hajda@intel.com Cc: Linus Torvalds <torvalds@linux-foundation.org>
2023-04-27  Merge tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm  (Linus Torvalds)
Pull MM updates from Andrew Morton:
- Nick Piggin's "shoot lazy tlbs" series, to improve the performance of switching from a user process to a kernel thread.
- More folio conversions from Kefeng Wang, Zhang Peng and Pankaj Raghav.
- zsmalloc performance improvements from Sergey Senozhatsky.
- Yue Zhao has found and fixed some data race issues around the alteration of memcg userspace tunables.
- VFS rationalizations from Christoph Hellwig:
  - removal of most of the callers of write_one_page()
  - make __filemap_get_folio()'s return value more useful
- Luis Chamberlain has changed tmpfs so it no longer requires swap backing. Use `mount -o noswap'.
- Qi Zheng has made the slab shrinkers operate locklessly, providing some scalability benefits.
- Keith Busch has improved dmapool's performance, making part of its operations O(1) rather than O(n).
- Peter Xu adds the UFFD_FEATURE_WP_UNPOPULATED feature to userfaultfd, permitting userspace to wr-protect anon memory unpopulated ptes.
- Kirill Shutemov has changed MAX_ORDER's meaning to be inclusive rather than exclusive, and has fixed a bunch of errors which were caused by its unintuitive meaning.
- Axel Rasmussen gives userfaultfd the UFFDIO_CONTINUE_MODE_WP feature, which causes minor faults to install a write-protected pte.
- Vlastimil Babka has done some maintenance work on vma_merge(): cleanups to the kernel code and improvements to our userspace test harness.
- Cleanups to do_fault_around() by Lorenzo Stoakes.
- Mike Rapoport has moved a lot of initialization code out of various mm/ files and into mm/mm_init.c.
- Lorenzo Stoakes removed vmf_insert_mixed_prot(), which was added for DRM, but DRM doesn't use it any more.
- Lorenzo has also converted read_kcore() and vread() to use iterators and has thereby removed the use of bounce buffers in some cases.
- Lorenzo has also contributed further cleanups of vma_merge().
- Chaitanya Prakash provides some fixes to the mmap selftesting code.
- Matthew Wilcox changes xfs and afs so they no longer take sleeping locks in ->map_page(), a step towards RCUification of pagefaults.
- Suren Baghdasaryan has improved mmap_lock scalability by switching to per-VMA locking.
- Frederic Weisbecker has reworked the percpu cache draining so that it no longer causes latency glitches on cpu isolated workloads.
- Mike Rapoport cleans up and corrects the ARCH_FORCE_MAX_ORDER Kconfig logic.
- Liu Shixin has changed zswap's initialization so we no longer waste a chunk of memory if zswap is not being used.
- Yosry Ahmed has improved the performance of memcg statistics flushing.
- David Stevens has fixed several issues involving khugepaged, userfaultfd and shmem.
- Christoph Hellwig has provided some cleanup work to zram's IO-related code paths.
- David Hildenbrand has fixed up some issues in the selftest code's testing of our pte state changing.
- Pankaj Raghav has made page_endio() unneeded and has removed it.
- Peter Xu contributed some rationalizations of the userfaultfd selftests.
- Yosry Ahmed has fixed an issue around memcg's page reclaim accounting.
- Chaitanya Prakash has fixed some arm-related issues in the selftests/mm code.
- Longlong Xia has improved the way in which KSM handles hwpoisoned pages.
- Peter Xu fixes a few issues with uffd-wp at fork() time.
- Stefan Roesch has changed KSM so that it may now be used on a per-process and per-cgroup basis.

* tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits)
  mm,unmap: avoid flushing TLB in batch if PTE is inaccessible
  shmem: restrict noswap option to initial user namespace
  mm/khugepaged: fix conflicting mods to collapse_file()
  sparse: remove unnecessary 0 values from rc
  mm: move 'mmap_min_addr' logic from callers into vm_unmapped_area()
  hugetlb: pte_alloc_huge() to replace huge pte_alloc_map()
  maple_tree: fix allocation in mas_sparse_area()
  mm: do not increment pgfault stats when page fault handler retries
  zsmalloc: allow only one active pool compaction context
  selftests/mm: add new selftests for KSM
  mm: add new KSM process and sysfs knobs
  mm: add new api to enable ksm per process
  mm: shrinkers: fix debugfs file permissions
  mm: don't check VMA write permissions if the PTE/PMD indicates write permissions
  migrate_pages_batch: fix statistics for longterm pin retry
  userfaultfd: use helper function range_in_vma()
  lib/show_mem.c: use for_each_populated_zone() simplify code
  mm: correct arg in reclaim_pages()/reclaim_clean_pages_from_list()
  fs/buffer: convert create_page_buffers to folio_create_buffers
  fs/buffer: add folio_create_empty_buffers helper
  ...
2023-04-20  s390/stackleak: provide fast __stackleak_poison() implementation  (Heiko Carstens)
Provide an s390 specific __stackleak_poison() implementation which is faster than the generic variant. For the original implementation with an enforced 4kb stackframe for the getpid() system call the system call overhead increases by a factor of 3 if the stackleak feature is enabled. Using the s390 mvc based variant this is reduced to an increase of 25% instead. This is within the expected area, since the mvc based implementation is more or less a memset64() variant which comes with similar results. See commit 0b77d6701cf8 ("s390: implement memset16, memset32 & memset64"). Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Link: https://lore.kernel.org/r/20230405130841.1350565-3-hca@linux.ibm.com Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-20  s390/mm: enable ARCH_HAS_SET_DIRECT_MAP  (Heiko Carstens)
Implement the set_direct_map_*() API, which allows to invalidate and set default permissions to pages within the direct mapping. Note that kernel_page_present(), which is also supposed to be part of this API, is intentionally not implemented. The reason for this is that kernel_page_present() is only used (and currently only makes sense) for suspend/resume, which isn't supported on s390. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-20  s390/mm: use BIT macro to generate SET_MEMORY bit masks  (Heiko Carstens)
Use BIT macro to generate SET_MEMORY bit masks, which is easier to maintain if bits get added, or removed. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
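The pattern being described, sketched with illustrative flag names and bit positions (the real header may differ):

    #define SET_MEMORY_RO   BIT(0)
    #define SET_MEMORY_RW   BIT(1)
    #define SET_MEMORY_NX   BIT(2)
    #define SET_MEMORY_X    BIT(3)
    /* adding a new flag is just the next BIT(n), no manual mask arithmetic */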
2023-04-19  s390/kasan: remove override of mem*() functions  (Heiko Carstens)
The kasan mem*() functions are not used anymore since s390 has switched to GENERIC_ENTRY and commit 69d4c0d32186 ("entry, kasan, x86: Disallow overriding mem*() functions"). Therefore remove the now dead code, similar to x86. While at it also use the SYM* macros in mem.S. Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-19  s390/kdump: remove nodat stack restriction for calling nodat functions  (Alexander Gordeev)
To allow calling DAT-off code from the kernel, the stack needs to be switched to nodat_stack (or another stack mapped 1:1). Before the call_nodat() macro was introduced, that was necessary to provide the very same memory address for the STNSM and STOSM instructions. If the kernel stayed on a random stack (e.g. a virtually mapped one), then the virtual address provided for the STNSM instruction could differ from the physical address needed for the corresponding STOSM instruction. Now that the call_nodat() macro is introduced, the kernel stack does not need to be mapped 1:1 anymore, since the macro stores the physical memory address of the return PSW in a register before entering DAT-off mode. This way the return LPSWE instruction is able to pick the correct memory location and restore DAT-on mode. That however might fail in case the 16-byte return PSW happens to cross a page boundary: the PSW mask and PSW address could end up in two separate, non-contiguous physical pages. Align the return PSW on a 16-byte boundary so it always fits into a single physical page. As a result any stack (including a virtually mapped one) can be used for calling DAT-off code, and switching to nodat_stack beforehand becomes unnecessary. Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-19  s390/kdump: rework invocation of DAT-off code  (Alexander Gordeev)
Calling the kdump kernel is a two-step process that involves invoking the purgatory code: the first time to verify the new kernel checksum and the second time to call the new kernel itself. The purgatory code operates on real addresses and does not expect any memory protection. Therefore, before the purgatory code is entered, DAT mode is always turned off. However, it is only restored upon return from the new kernel checksum verification. In case the purgatory was called to start the new kernel and failed, control is returned to the old kernel, but DAT mode stays off. A new kernel start failure is unlikely and leads to the disabled wait state anyway. Still, that poses a risk, since the kernel code in general is not DAT-off safe, and even calling the disabled_wait() function might crash. Introduce the call_nodat() macro that allows entering DAT-off mode, calling an arbitrary function and restoring DAT mode back on. Switch all invocations of DAT-off code to that macro and avoid the above described scenario altogether. Name the call_nodat() macro in lower case after the already existing call_on_stack() and put it into the same header file. Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> [hca@linux.ibm.com: some small modifications to call_nodat() macro] Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-13  s390/fcx: replace zero-length array with flexible-array member  (Gustavo A. R. Silva)
Zero-length arrays are deprecated [1] and have to be replaced by C99 flexible-array members. This helps with the ongoing efforts to tighten the FORTIFY_SOURCE routines on memcpy() and helps to make progress towards globally enabling -fstrict-flex-arrays=3 [2]. Link: https://github.com/KSPP/linux/issues/78 [1] Link: https://gcc.gnu.org/pipermail/gcc-patches/2022-October/602902.html [2] Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://lore.kernel.org/r/ZC7XT5prvoE4Yunm@work Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-13  s390/diag: replace zero-length array with flexible-array member  (Gustavo A. R. Silva)
Zero-length arrays are deprecated [1] and have to be replaced by C99 flexible-array members. This helps with the ongoing efforts to tighten the FORTIFY_SOURCE routines on memcpy() and helps to make progress towards globally enabling -fstrict-flex-arrays=3 [2]. Link: https://github.com/KSPP/linux/issues/78 [1] Link: https://gcc.gnu.org/pipermail/gcc-patches/2022-October/602902.html [2] Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://lore.kernel.org/r/ZC7XGpUtVhqlRLhH@work Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-13  s390/mm: fix direct map accounting  (Heiko Carstens)
Commit bb1520d581a3 ("s390/mm: start kernel with DAT enabled") did not implement direct map accounting in the early page table setup code. As a result the reported values are bogus now:

  $ cat /proc/meminfo
  ...
  DirectMap4k:            5120 kB
  DirectMap1M: 18446744073709546496 kB
  DirectMap2G:               0 kB

Fix this by adding the missing accounting. The result looks sane again:

  $ cat /proc/meminfo
  ...
  DirectMap4k:            6156 kB
  DirectMap1M:         2091008 kB
  DirectMap2G:         6291456 kB

Fixes: bb1520d581a3 ("s390/mm: start kernel with DAT enabled") Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-13  s390/mm: implement set_memory_rwnx()  (Heiko Carstens)
Given that set_memory_rox() is implemented, also provide set_memory_rwnx(). This makes it possible to get rid of all open-coded __set_memory() usages in s390 architecture code. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-13  s390/mm: implement set_memory_rox()  (Heiko Carstens)
Provide the s390 specific native set_memory_rox() implementation to avoid frequent set_memory_ro(); set_memory_x() call pairs. This is the s390 variant of commit 60463628c9e0 ("x86/mm: Implement native set_memory_rox()"). Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
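Usage-wise the change boils down to the following sketch (addr and numpages are placeholders):

    static void protect_text(unsigned long addr, int numpages)
    {
        /* before: two separate page table walks
         *   set_memory_ro(addr, numpages);
         *   set_memory_x(addr, numpages);
         * now: one combined walk */
        set_memory_rox(addr, numpages);
    }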
2023-04-13  s390/kaslr: provide kaslr_enabled() function  (Heiko Carstens)
Just like other architectures, provide a kaslr_enabled() function instead of directly accessing a global variable. Also pass the renamed __kaslr_enabled variable from the decompressor to the kernel, so that kaslr_enabled() is available there too. This will be used by a subsequent patch which randomizes the module base load address. Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
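A plausible shape of the helper, shown as a sketch rather than the verbatim header content:

    extern int __kaslr_enabled;

    static inline bool kaslr_enabled(void)
    {
        if (IS_ENABLED(CONFIG_RANDOMIZE_BASE))
            return __kaslr_enabled;
        return false;
    }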
2023-04-13  s390/checksum: remove not needed uaccess.h include  (Heiko Carstens)
Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-13  s390/checksum: always use cksm instruction  (Heiko Carstens)
Commit dfe843dce775 ("s390/checksum: support GENERIC_CSUM, enable it for KASAN") switched s390 to use the generic checksum functions, so that KASAN instrumentation also works for checksum functions by avoiding architecture specific inline assemblies. There is however the problem that the generic csum_partial() function returns a 32 bit value with a 16 bit folded checksum, while the original s390 variant does not fold to 16 bit. This in turn causes the ipib_checksum in lowcore to contain different values depending on kernel config options. The ipib_checksum is used by system dumpers to verify if pointers in lowcore point to valid data. Verification is done by comparing checksum values. The system dumpers still use 32 bit checksum values which are not folded, and therefore the checksum verification fails (incorrectly). The symptom is that reboot after dump does not work anymore when a KASAN instrumented kernel is dumped. Fix this by not using the generic checksum implementation. Instead add an explicit kasan_check_read() so that KASAN knows about the read access from within the inline assembly. Reported-by: Alexander Egorenkov <egorenar@linux.ibm.com> Fixes: dfe843dce775 ("s390/checksum: support GENERIC_CSUM, enable it for KASAN") Tested-by: Alexander Egorenkov <egorenar@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
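The essence of the fix, sketched (the real cksm inline assembly is not reproduced, only the added KASAN annotation):

    static inline unsigned int csum_partial_sketch(const void *buff, int len,
                                                   unsigned int sum)
    {
        /* the read happens inside inline assembly, so KASAN has to be
         * told about it explicitly */
        kasan_check_read(buff, len);
        /* ... cksm instruction loop over buff/len updating sum ... */
        return sum;
    }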
2023-04-11  s390/dasd: add autoquiesce feature  (Stefan Haberland)
Add the internal logic to check for autoquiesce triggers and handle them. Quiesce and resume are functions that tell Linux to stop/resume issuing I/Os to a specific DASD. The DASD driver allows a manual quiesce/resume via ioctl. Autoquiesce defines a set of triggers that lead to an automatic quiesce if a certain event occurs. There is no automatic resume. All events will be reported via DASD Extended Error Reporting (EER) if configured. Signed-off-by: Stefan Haberland <sth@linux.ibm.com> Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com> Reviewed-by: Halil Pasic <pasic@linux.ibm.com> Link: https://lore.kernel.org/r/20230405142017.2446986-3-sth@linux.ibm.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-04  s390: move on_thread_stack() to processor.h  (Heiko Carstens)
As preparation for the stackleak feature move on_thread_stack() to processor.h like x86. Also make it __always_inline, and slightly optimize it by reading current task's kernel stack pointer from lowcore. Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-04  s390/stacktrace: remove call_on_stack_noreturn()  (Heiko Carstens)
There is no user left of call_on_stack_noreturn() - remove it. Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-04  s390/stack: use STACK_INIT_OFFSET where possible  (Heiko Carstens)
Make STACK_INIT_OFFSET also available for assembler code, and use it everywhere instead of open-coding it at several places. Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-03-28  mm: add PTE pointer parameter to flush_tlb_fix_spurious_fault()  (Gerald Schaefer)
s390 can do more fine-grained handling of spurious TLB protection faults, when there also is the PTE pointer available. Therefore, pass on the PTE pointer to flush_tlb_fix_spurious_fault() as an additional parameter. This will add no functional change to other architectures, but those with private flush_tlb_fix_spurious_fault() implementations need to be made aware of the new parameter. Link: https://lkml.kernel.org/r/20230306161548.661740-1-gerald.schaefer@linux.ibm.com Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64] Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc] Acked-by: David Hildenbrand <david@redhat.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
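Schematically, the interface change is just the extra parameter (prototype sketch):

    /* old: flush_tlb_fix_spurious_fault(vma, address)
     * new: the PTE pointer is passed along as well */
    void flush_tlb_fix_spurious_fault(struct vm_area_struct *vma,
                                      unsigned long address, pte_t *ptep);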
2023-03-27  s390/cpum_sf: remove flag PERF_CPUM_SF_FULL_BLOCKS  (Thomas Richter)
This flag is used to process only fully populated sampling buffers when a sampling event is stopped on a CPU. By default the last sampling buffer is also scanned for samples even if the sampling block full indicator is not set in the trailer entry of a sampling buffer page. This flag can be set via the perf_event_attr::config1 field. It was never used and never documented. It is useless now.

With PERF_CPUM_SF_FULL_BLOCKS: When a process is scheduled off the CPU, the sampling is stopped and the samples are copied to the perf ring buffer and marked invalid. When stopped at the last full sample buffer page (which is achieved with the PERF_CPUM_SF_FULL_BLOCKS option), the hardware sampling will resume at the first free sample entry in the current, partially filled sample buffer.

Without PERF_CPUM_SF_FULL_BLOCKS (default behavior): The partially filled last sample buffer is scanned and valid samples are saved to the perf ring buffer. The valid samples are marked invalid. The sampling is resumed when the process is scheduled on this CPU. Again the hardware sampling will resume at the first free sample entry in the current, partially filled sample buffer. Now the next interrupt handler invocation scans the full sample block and saves the valid samples to the ring buffer. It omits the invalid samples at the top of the buffer.

The default behavior is fully sufficient, therefore remove this feature. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Acked-by: Hendrik Brueckner <brueckner@linux.ibm.com> Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-03-20  s390/ap: implement SE AP bind, unbind and associate  (Harald Freudenberger)
Implementation of the new functions for SE AP support: bind, unbind and associate. There are two new sysfs attributes for this:

  /sys/devices/ap/cardxx/xx.yyyy/se_bind
  /sys/devices/ap/cardxx/xx.yyyy/se_associate

Writing a 1 into the se_bind attribute triggers the SE AP bind for this AP queue, writing a 0 into it does an unbind - that is a reset (RAPQ) with the F bit enabled. The se_associate attribute needs an integer value in the range 0...2^16-1 written into it. This is the index into a secrets table fed into the ultravisor. For more details please see the Architecture documents. Both new ap queue attributes are only visible inside a SE guest with SB (Secure Binding) available. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
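From user space the bind trigger amounts to writing "1" into the attribute; a minimal sketch (the card/queue path component stays a placeholder, exactly as in the text above):

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    static int se_bind(const char *queue) /* e.g. "cardxx/xx.yyyy" */
    {
        char path[256];
        int fd, rc;

        snprintf(path, sizeof(path), "/sys/devices/ap/%s/se_bind", queue);
        fd = open(path, O_WRONLY);
        if (fd < 0)
            return -1;
        rc = (write(fd, "1", 1) == 1) ? 0 : -1;
        close(fd);
        return rc;
    }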
2023-03-20  s390/ap: new low level inline functions ap_bapq() and ap_aapq()  (Harald Freudenberger)
Introduce two new low level functions ap_bapq() (calls PQAP(BAPQ)) and ap_aapq (calls PQAP(AAPQ)). Both functions are only meant to be used in SE environment with the SE AP binding facility available. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-03-20  s390/ap: provide F bit parameter for ap_rapq() and ap_zapq()  (Harald Freudenberger)
Extend the ap inline functions ap_rapq() (calls PQAP(RAPQ)) and ap_zapq() (calls PQAP(ZAPQ)) with a new parameter to enable the new architected F bit, which forces an unassociate and/or unbind on a secure execution associated and/or bound queue. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-03-20  s390/ap: make tapq gr2 response a struct  (Harald Freudenberger)
This patch introduces a new struct ap_tapq_gr2 which covers the response in GR2 on TAPQ invocation. This makes it much easier and less error-prone for the calling functions to access the right field without shifting and masking. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-03-20  s390/ap: exploit new B bit from QCI config info  (Harald Freudenberger)
This patch introduces an update to the ap_config_info struct which is filled with the QCI subfunction. There is a new bit apsb (short 'B') showing if the AP secure bind facility is available. The patch also includes a simple function ap_sb_available() wrapping this bit test. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-03-20  s390/zcrypt: rework length information for dqap  (Harald Freudenberger)
The inline ap_dqap function does not return the number of bytes actually written into the message buffer. The calling code inspects the AP message header to figure out what kind of AP message has been received and pulls the length information from this header. This processing may not work correctly in cases where only a fragment of the reply is received. With this patch the ap_dqap inline function now returns the number of actually written bytes in the *length parameter. So the calling function has a chance to compare the number of received bytes against what the AP message header length field states. This is especially useful in cases where a message could only get partially received. The low level reply processing functions needed some rework to be able to catch this new length information and compare it the right way. The rework also deals with some situations where until now the reply length was not correctly calculated and/or set. All this has been heavily tested as the modifications on the reply length information may affect crypto load. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-03-20  s390/zcrypt: make psmid unsigned long instead of long long  (Harald Freudenberger)
Since s390 kernel build does not support 32 bit build any more there is no difference between long and long long. So this patch reworks all occurrences of psmid (a 64 bit value) to use unsigned long now. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-03-20  s390/expoline: use __ALIGN instead of open coded .align  (Heiko Carstens)
Use __ALIGN instead of open coded .align statement to make sure that external expoline thunks follow global function alignment rules. Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-03-20  s390: make use of CONFIG_FUNCTION_ALIGNMENT  (Heiko Carstens)
Make use of CONFIG_FUNCTION_ALIGNMENT which was introduced with commit d49a0626216b ("arch: Introduce CONFIG_FUNCTION_ALIGNMENT"). Select FUNCTION_ALIGNMENT_8B for gcc in order to reflect gcc's default function alignment. For all other compilers, which is only clang, select a function alignment of 16 bytes which reflects the default function alignment for clang. Also change the __ALIGN define to follow whatever the value of CONFIG_FUNCTION_ALIGNMENT is. This makes sure that the alignment of C and assembler functions is the same. In result everything still uses the default function alignment for both compilers. However in addition this is now also true for all assembly functions, so that all functions have a consistent alignment. Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-03-20  Merge branch 'decompressor-memory-tracking' into features  (Heiko Carstens)
Vasily Gorbik says: =================== Combine and generalize all methods for finding unused memory in decompressor, while decreasing complexity, add memory holes support, while improving error handling (especially in low-memory conditions) and debug-ability. =================== Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-03-20  s390/kasan: move shadow mapping to decompressor  (Vasily Gorbik)
Since regular paging structs are initialized in the decompressor already, move the KASAN shadow mapping to the decompressor as well. This helps to avoid allocating the memory KASAN requires in one large chunk, de-duplicates the paging structs creation code and starts the uncompressed kernel with KASAN instrumentation right away. This also avoids all the pitfalls of accidentally calling KASAN instrumented code during KASAN initialization. Acked-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-03-20  s390/boot: rework decompressor reserved tracking  (Vasily Gorbik)
Currently several approaches for finding unused memory in the decompressor are utilized. While "safe_addr" grows towards higher addresses, the vmem code allocates paging structures top down. The former requires careful ordering. In addition to that, the ipl report handling code verifies potential intersections with secure boot certificates on its own. Neither of the two approaches is memory holes aware nor consistent with the other in low memory conditions. To solve that, the existing approaches are generalized and combined together, and online memory ranges are now taken into consideration. physmem_info has been extended to contain reserved memory ranges. A new set of functions allows handling reserves and finding unused memory. All reserves and memory allocations are "typed". In case of an out of memory condition the decompressor fails with detailed info on current reserved ranges and usable online memory:

  Linux version 6.2.0 ...
  Kernel command line: ... mem=100M
  Out of memory allocating 100000 bytes 100000 aligned in range 0:5800000
  Reserved memory ranges:
  0000000000000000 0000000003e33000 DECOMPRESSOR
  0000000003f00000 00000000057648a3 INITRD
  00000000063e0000 00000000063e8000 VMEM
  00000000063eb000 00000000063f4000 VMEM
  00000000063f7800 0000000006400000 VMEM
  0000000005800000 0000000006300000 KASAN
  Usable online memory ranges (info source: sclp read info [3]):
  0000000000000000 0000000006400000
  Usable online memory total: 6400000 Reserved: 61b10a3 Free: 24ef5d
  Call Trace:
  (sp:000000000002bd58 [<0000000000012a70>] physmem_alloc_top_down+0x60/0x14c)
   sp:000000000002bdc8 [<0000000000013756>] _pa+0x56/0x6a
   sp:000000000002bdf0 [<0000000000013bcc>] pgtable_populate+0x45c/0x65e
   sp:000000000002be90 [<00000000000140aa>] setup_vmem+0x2da/0x424
   sp:000000000002bec8 [<0000000000011c20>] startup_kernel+0x428/0x8b4
   sp:000000000002bf60 [<00000000000100f4>] startup_normal+0xd4/0xd4

physmem_alloc_range() allows finding free memory in a specified range. It should be used for one time allocations only, like finding a position for amode31 and vmlinux. physmem_alloc_top_down() can be used just like physmem_alloc_range(), but it also allows multiple allocations per type and tries to merge sequential allocations together, which is useful for paging structure allocations. If sequential allocations cannot be merged together they are "chained", allowing easy per type reserved ranges enumeration and migration to memblock later. Extra "struct reserved_range" entries allocated for chaining are not tracked or reserved but rely on the fact that both physmem_alloc_range() and physmem_alloc_top_down() search for free memory only below the current top down allocator position. All reserved ranges should be transferred to memblock before memblock allocations are enabled. The startup code has been reordered to delay any memory allocations until online memory ranges are detected and occupied memory ranges are marked as reserved, so they are excluded from follow-up allocations. Ipl report certificates are a special case: the ipl report certificates list is checked together with other memory reserves until the certificates are saved elsewhere. The memory KASAN requires for shadow memory allocation and mapping is reserved as one large chunk which is later passed to the KASAN early initialization code. Acked-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
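A hedged sketch of how the two entry points differ in use; the argument lists and the reservation type names below are assumptions for illustration, only the function names come from the text above:

    /* one-off placement in an explicit range, e.g. finding a load address */
    addr = physmem_alloc_range(RR_VMLINUX, size, align, min_addr, max_addr, true);

    /* repeated, per-type allocations; sequential allocations of the same
     * type are merged or chained, e.g. for paging structures */
    pte = physmem_alloc_top_down(RR_VMEM, PAGE_SIZE, PAGE_SIZE);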
2023-03-20  s390/boot: rename mem_detect to physmem_info  (Vasily Gorbik)
In preparation to extending mem_detect with additional information like reserved ranges, rename it to the more generic physmem_info. This new naming also helps to avoid confusion by using more exact terms like "physmem online ranges", etc. Acked-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-03-20  s390/boot: remove non-functioning image bootable check  (Vasily Gorbik)
check_image_bootable() has been introduced with commit 627c9b62058e ("s390/boot: block uncompressed vmlinux booting attempts") to make sure that users don't try to boot uncompressed vmlinux ELF image in qemu. It used to be possible quite some time ago. That commit prevented confusion with uncompressed vmlinux image starting to boot and even printing kernel messages until it crashed. Users might have tried to report the problem without realizing they are doing something which was not intended. Since commit f1d3c5323772 ("s390/boot: move sclp early buffer from fixed address in asm to C") check_image_bootable() doesn't function properly anymore, as well as booting uncompressed vmlinux image in qemu doesn't really produce any output and crashes. Moving forward it doesn't make sense to fix check_image_bootable() anymore, so simply remove it. Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-03-13  s390/setup: always inline gen_lpswe()  (Heiko Carstens)
gen_lpswe() contains a BUILD_BUG_ON() statement which depends on a function parameter. If the compiler decides not to inline the function, this will lead to a build error, even if all call sites pass a valid parameter. To avoid this, always inline gen_lpswe(). Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
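The mechanism can be illustrated with an analogous helper (a sketch, not the real gen_lpswe() body): the BUILD_BUG_ON() condition only becomes a compile-time constant when the function body is inlined into a call site with a constant argument.

    static __always_inline unsigned long checked_offset(unsigned long addr)
    {
        /* evaluable at compile time only after inlining and constant
         * propagation of 'addr'; a non-inlined copy breaks the build */
        BUILD_BUG_ON(addr > 0xfff);
        return addr;
    }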
2023-03-13  s390/bp: remove __bpon()  (Heiko Carstens)
There is no point in changing branch prediction state of a cpu shortly before it enters stop state. Therefore remove __bpon(). Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-03-13  s390/bp: remove s390_isolate_bp_guest()  (Heiko Carstens)
s390_isolate_bp_guest() is unused. Remove it. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>