summaryrefslogtreecommitdiff
path: root/arch/s390/Kconfig
AgeCommit message (Collapse)Author
2024-03-19Merge tag 's390-6.9-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull more s390 updates from Heiko Carstens: - Various virtual vs physical address usage fixes - Add new bitwise types and helper functions and use them in s390 specific drivers and code to make it easier to find virtual vs physical address usage bugs. Right now virtual and physical addresses are identical for s390, except for module, vmalloc, and similar areas. This will be changed, hopefully with the next merge window, so that e.g. the kernel image and modules will be located close to each other, allowing for direct branches and also for some other simplifications. As a prerequisite this requires to fix all misuses of virtual and physical addresses. As it turned out people are so used to the concept that virtual and physical addresses are the same, that new bugs got added to code which was already fixed. In order to avoid that even more code gets merged which adds such bugs add and use new bitwise types, so that sparse can be used to find such usage bugs. Most likely the new types can go away again after some time - Provide a simple ARCH_HAS_DEBUG_VIRTUAL implementation - Fix kprobe branch handling: if an out-of-line single stepped relative branch instruction has a target address within a certain address area in the entry code, the program check handler may incorrectly execute cleanup code as if KVM code was executed, leading to crashes - Fix reference counting of zcrypt card objects - Various other small fixes and cleanups * tag 's390-6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (41 commits) s390/entry: compare gmap asce to determine guest/host fault s390/entry: remove OUTSIDE macro s390/entry: add CIF_SIE flag and remove sie64a() address check s390/cio: use while (i--) pattern to clean up s390/raw3270: make class3270 constant s390/raw3270: improve raw3270_init() readability s390/tape: make tape_class constant s390/vmlogrdr: make vmlogrdr_class constant s390/vmur: make vmur_class constant s390/zcrypt: make zcrypt_class constant s390/mm: provide simple ARCH_HAS_DEBUG_VIRTUAL support s390/vfio_ccw_cp: use new address translation helpers s390/iucv: use new address translation helpers s390/ctcm: use new address translation helpers s390/lcs: use new address translation helpers s390/qeth: use new address translation helpers s390/zfcp: use new address translation helpers s390/tape: fix virtual vs physical address confusion s390/3270: use new address translation helpers s390/3215: use new address translation helpers ...
2024-03-15Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "S390: - Changes to FPU handling came in via the main s390 pull request - Only deliver to the guest the SCLP events that userspace has requested - More virtual vs physical address fixes (only a cleanup since virtual and physical address spaces are currently the same) - Fix selftests undefined behavior x86: - Fix a restriction that the guest can't program a PMU event whose encoding matches an architectural event that isn't included in the guest CPUID. The enumeration of an architectural event only says that if a CPU supports an architectural event, then the event can be programmed *using the architectural encoding*. The enumeration does NOT say anything about the encoding when the CPU doesn't report support the event *in general*. It might support it, and it might support it using the same encoding that made it into the architectural PMU spec - Fix a variety of bugs in KVM's emulation of RDPMC (more details on individual commits) and add a selftest to verify KVM correctly emulates RDMPC, counter availability, and a variety of other PMC-related behaviors that depend on guest CPUID and therefore are easier to validate with selftests than with custom guests (aka kvm-unit-tests) - Zero out PMU state on AMD if the virtual PMU is disabled, it does not cause any bug but it wastes time in various cases where KVM would check if a PMC event needs to be synthesized - Optimize triggering of emulated events, with a nice ~10% performance improvement in VM-Exit microbenchmarks when a vPMU is exposed to the guest - Tighten the check for "PMI in guest" to reduce false positives if an NMI arrives in the host while KVM is handling an IRQ VM-Exit - Fix a bug where KVM would report stale/bogus exit qualification information when exiting to userspace with an internal error exit code - Add a VMX flag in /proc/cpuinfo to report 5-level EPT support - Rework TDP MMU root unload, free, and alloc to run with mmu_lock held for read, e.g. to avoid serializing vCPUs when userspace deletes a memslot - Tear down TDP MMU page tables at 4KiB granularity (used to be 1GiB). KVM doesn't support yielding in the middle of processing a zap, and 1GiB granularity resulted in multi-millisecond lags that are quite impolite for CONFIG_PREEMPT kernels - Allocate write-tracking metadata on-demand to avoid the memory overhead when a kernel is built with i915 virtualization support but the workloads use neither shadow paging nor i915 virtualization - Explicitly initialize a variety of on-stack variables in the emulator that triggered KMSAN false positives - Fix the debugregs ABI for 32-bit KVM - Rework the "force immediate exit" code so that vendor code ultimately decides how and when to force the exit, which allowed some optimization for both Intel and AMD - Fix a long-standing bug where kvm_has_noapic_vcpu could be left elevated if vCPU creation ultimately failed, causing extra unnecessary work - Cleanup the logic for checking if the currently loaded vCPU is in-kernel - Harden against underflowing the active mmu_notifier invalidation count, so that "bad" invalidations (usually due to bugs elsehwere in the kernel) are detected earlier and are less likely to hang the kernel x86 Xen emulation: - Overlay pages can now be cached based on host virtual address, instead of guest physical addresses. This removes the need to reconfigure and invalidate the cache if the guest changes the gpa but the underlying host virtual address remains the same - When possible, use a single host TSC value when computing the deadline for Xen timers in order to improve the accuracy of the timer emulation - Inject pending upcall events when the vCPU software-enables its APIC to fix a bug where an upcall can be lost (and to follow Xen's behavior) - Fall back to the slow path instead of warning if "fast" IRQ delivery of Xen events fails, e.g. if the guest has aliased xAPIC IDs RISC-V: - Support exception and interrupt handling in selftests - New self test for RISC-V architectural timer (Sstc extension) - New extension support (Ztso, Zacas) - Support userspace emulation of random number seed CSRs ARM: - Infrastructure for building KVM's trap configuration based on the architectural features (or lack thereof) advertised in the VM's ID registers - Support for mapping vfio-pci BARs as Normal-NC (vaguely similar to x86's WC) at stage-2, improving the performance of interacting with assigned devices that can tolerate it - Conversion of KVM's representation of LPIs to an xarray, utilized to address serialization some of the serialization on the LPI injection path - Support for _architectural_ VHE-only systems, advertised through the absence of FEAT_E2H0 in the CPU's ID register - Miscellaneous cleanups, fixes, and spelling corrections to KVM and selftests LoongArch: - Set reserved bits as zero in CPUCFG - Start SW timer only when vcpu is blocking - Do not restart SW timer when it is expired - Remove unnecessary CSR register saving during enter guest - Misc cleanups and fixes as usual Generic: - Clean up Kconfig by removing CONFIG_HAVE_KVM, which was basically always true on all architectures except MIPS (where Kconfig determines the available depending on CPU capabilities). It is replaced either by an architecture-dependent symbol for MIPS, and IS_ENABLED(CONFIG_KVM) everywhere else - Factor common "select" statements in common code instead of requiring each architecture to specify it - Remove thoroughly obsolete APIs from the uapi headers - Move architecture-dependent stuff to uapi/asm/kvm.h - Always flush the async page fault workqueue when a work item is being removed, especially during vCPU destruction, to ensure that there are no workers running in KVM code when all references to KVM-the-module are gone, i.e. to prevent a very unlikely use-after-free if kvm.ko is unloaded - Grab a reference to the VM's mm_struct in the async #PF worker itself instead of gifting the worker a reference, so that there's no need to remember to *conditionally* clean up after the worker Selftests: - Reduce boilerplate especially when utilize selftest TAP infrastructure - Add basic smoke tests for SEV and SEV-ES, along with a pile of library support for handling private/encrypted/protected memory - Fix benign bugs where tests neglect to close() guest_memfd files" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (246 commits) selftests: kvm: remove meaningless assignments in Makefiles KVM: riscv: selftests: Add Zacas extension to get-reg-list test RISC-V: KVM: Allow Zacas extension for Guest/VM KVM: riscv: selftests: Add Ztso extension to get-reg-list test RISC-V: KVM: Allow Ztso extension for Guest/VM RISC-V: KVM: Forward SEED CSR access to user space KVM: riscv: selftests: Add sstc timer test KVM: riscv: selftests: Change vcpu_has_ext to a common function KVM: riscv: selftests: Add guest helper to get vcpu id KVM: riscv: selftests: Add exception handling support LoongArch: KVM: Remove unnecessary CSR register saving during enter guest LoongArch: KVM: Do not restart SW timer when it is expired LoongArch: KVM: Start SW timer only when vcpu is blocking LoongArch: KVM: Set reserved bits as zero in CPUCFG KVM: selftests: Explicitly close guest_memfd files in some gmem tests KVM: x86/xen: fix recursive deadlock in timer injection KVM: pfncache: simplify locking and make more self-contained KVM: x86/xen: remove WARN_ON_ONCE() with false positives in evtchn delivery KVM: x86/xen: inject vCPU upcall vector when local APIC is enabled KVM: x86/xen: improve accuracy of Xen timers ...
2024-03-14Merge tag 'mm-stable-2024-03-13-20-04' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Sumanth Korikkar has taught s390 to allocate hotplug-time page frames from hotplugged memory rather than only from main memory. Series "implement "memmap on memory" feature on s390". - More folio conversions from Matthew Wilcox in the series "Convert memcontrol charge moving to use folios" "mm: convert mm counter to take a folio" - Chengming Zhou has optimized zswap's rbtree locking, providing significant reductions in system time and modest but measurable reductions in overall runtimes. The series is "mm/zswap: optimize the scalability of zswap rb-tree". - Chengming Zhou has also provided the series "mm/zswap: optimize zswap lru list" which provides measurable runtime benefits in some swap-intensive situations. - And Chengming Zhou further optimizes zswap in the series "mm/zswap: optimize for dynamic zswap_pools". Measured improvements are modest. - zswap cleanups and simplifications from Yosry Ahmed in the series "mm: zswap: simplify zswap_swapoff()". - In the series "Add DAX ABI for memmap_on_memory", Vishal Verma has contributed several DAX cleanups as well as adding a sysfs tunable to control the memmap_on_memory setting when the dax device is hotplugged as system memory. - Johannes Weiner has added the large series "mm: zswap: cleanups", which does that. - More DAMON work from SeongJae Park in the series "mm/damon: make DAMON debugfs interface deprecation unignorable" "selftests/damon: add more tests for core functionalities and corner cases" "Docs/mm/damon: misc readability improvements" "mm/damon: let DAMOS feeds and tame/auto-tune itself" - In the series "mm/mempolicy: weighted interleave mempolicy and sysfs extension" Rakie Kim has developed a new mempolicy interleaving policy wherein we allocate memory across nodes in a weighted fashion rather than uniformly. This is beneficial in heterogeneous memory environments appearing with CXL. - Christophe Leroy has contributed some cleanup and consolidation work against the ARM pagetable dumping code in the series "mm: ptdump: Refactor CONFIG_DEBUG_WX and check_wx_pages debugfs attribute". - Luis Chamberlain has added some additional xarray selftesting in the series "test_xarray: advanced API multi-index tests". - Muhammad Usama Anjum has reworked the selftest code to make its human-readable output conform to the TAP ("Test Anything Protocol") format. Amongst other things, this opens up the use of third-party tools to parse and process out selftesting results. - Ryan Roberts has added fork()-time PTE batching of THP ptes in the series "mm/memory: optimize fork() with PTE-mapped THP". Mainly targeted at arm64, this significantly speeds up fork() when the process has a large number of pte-mapped folios. - David Hildenbrand also gets in on the THP pte batching game in his series "mm/memory: optimize unmap/zap with PTE-mapped THP". It implements batching during munmap() and other pte teardown situations. The microbenchmark improvements are nice. - And in the series "Transparent Contiguous PTEs for User Mappings" Ryan Roberts further utilizes arm's pte's contiguous bit ("contpte mappings"). Kernel build times on arm64 improved nicely. Ryan's series "Address some contpte nits" provides some followup work. - In the series "mm/hugetlb: Restore the reservation" Breno Leitao has fixed an obscure hugetlb race which was causing unnecessary page faults. He has also added a reproducer under the selftest code. - In the series "selftests/mm: Output cleanups for the compaction test", Mark Brown did what the title claims. - Kinsey Ho has added the series "mm/mglru: code cleanup and refactoring". - Even more zswap material from Nhat Pham. The series "fix and extend zswap kselftests" does as claimed. - In the series "Introduce cpu_dcache_is_aliasing() to fix DAX regression" Mathieu Desnoyers has cleaned up and fixed rather a mess in our handling of DAX on archiecctures which have virtually aliasing data caches. The arm architecture is the main beneficiary. - Lokesh Gidra's series "per-vma locks in userfaultfd" provides dramatic improvements in worst-case mmap_lock hold times during certain userfaultfd operations. - Some page_owner enhancements and maintenance work from Oscar Salvador in his series "page_owner: print stacks and their outstanding allocations" "page_owner: Fixup and cleanup" - Uladzislau Rezki has contributed some vmalloc scalability improvements in his series "Mitigate a vmap lock contention". It realizes a 12x improvement for a certain microbenchmark. - Some kexec/crash cleanup work from Baoquan He in the series "Split crash out from kexec and clean up related config items". - Some zsmalloc maintenance work from Chengming Zhou in the series "mm/zsmalloc: fix and optimize objects/page migration" "mm/zsmalloc: some cleanup for get/set_zspage_mapping()" - Zi Yan has taught the MM to perform compaction on folios larger than order=0. This a step along the path to implementaton of the merging of large anonymous folios. The series is named "Enable >0 order folio memory compaction". - Christoph Hellwig has done quite a lot of cleanup work in the pagecache writeback code in his series "convert write_cache_pages() to an iterator". - Some modest hugetlb cleanups and speedups in Vishal Moola's series "Handle hugetlb faults under the VMA lock". - Zi Yan has changed the page splitting code so we can split huge pages into sizes other than order-0 to better utilize large folios. The series is named "Split a folio to any lower order folios". - David Hildenbrand has contributed the series "mm: remove total_mapcount()", a cleanup. - Matthew Wilcox has sought to improve the performance of bulk memory freeing in his series "Rearrange batched folio freeing". - Gang Li's series "hugetlb: parallelize hugetlb page init on boot" provides large improvements in bootup times on large machines which are configured to use large numbers of hugetlb pages. - Matthew Wilcox's series "PageFlags cleanups" does that. - Qi Zheng's series "minor fixes and supplement for ptdesc" does that also. S390 is affected. - Cleanups to our pagemap utility functions from Peter Xu in his series "mm/treewide: Replace pXd_large() with pXd_leaf()". - Nico Pache has fixed a few things with our hugepage selftests in his series "selftests/mm: Improve Hugepage Test Handling in MM Selftests". - Also, of course, many singleton patches to many things. Please see the individual changelogs for details. * tag 'mm-stable-2024-03-13-20-04' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (435 commits) mm/zswap: remove the memcpy if acomp is not sleepable crypto: introduce: acomp_is_async to expose if comp drivers might sleep memtest: use {READ,WRITE}_ONCE in memory scanning mm: prohibit the last subpage from reusing the entire large folio mm: recover pud_leaf() definitions in nopmd case selftests/mm: skip the hugetlb-madvise tests on unmet hugepage requirements selftests/mm: skip uffd hugetlb tests with insufficient hugepages selftests/mm: dont fail testsuite due to a lack of hugepages mm/huge_memory: skip invalid debugfs new_order input for folio split mm/huge_memory: check new folio order when split a folio mm, vmscan: retry kswapd's priority loop with cache_trim_mode off on failure mm: add an explicit smp_wmb() to UFFDIO_CONTINUE mm: fix list corruption in put_pages_list mm: remove folio from deferred split list before uncharging it filemap: avoid unnecessary major faults in filemap_fault() mm,page_owner: drop unnecessary check mm,page_owner: check for null stack_record before bumping its refcount mm: swap: fix race between free_swap_and_cache() and swapoff() mm/treewide: align up pXd_leaf() retval across archs mm/treewide: drop pXd_large() ...
2024-03-13s390/mm: provide simple ARCH_HAS_DEBUG_VIRTUAL supportHeiko Carstens
Provide a very simple ARCH_HAS_DEBUG_VIRTUAL implementation. For now errors are only reported for the following cases: - Trying to translate a vmalloc or module address to a physical address - Translating a supposed to be ZONE_DMA virtual address into a physical address, and the resulting physical address is larger than two GiB Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-03-12Merge tag 'hardening-v6.9-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardening updates from Kees Cook: "As is pretty normal for this tree, there are changes all over the place, especially for small fixes, selftest improvements, and improved macro usability. Some header changes ended up landing via this tree as they depended on the string header cleanups. Also, a notable set of changes is the work for the reintroduction of the UBSAN signed integer overflow sanitizer so that we can continue to make improvements on the compiler side to make this sanitizer a more viable future security hardening option. Summary: - string.h and related header cleanups (Tanzir Hasan, Andy Shevchenko) - VMCI memcpy() usage and struct_size() cleanups (Vasiliy Kovalev, Harshit Mogalapalli) - selftests/powerpc: Fix load_unaligned_zeropad build failure (Michael Ellerman) - hardened Kconfig fragment updates (Marco Elver, Lukas Bulwahn) - Handle tail call optimization better in LKDTM (Douglas Anderson) - Use long form types in overflow.h (Andy Shevchenko) - Add flags param to string_get_size() (Andy Shevchenko) - Add Coccinelle script for potential struct_size() use (Jacob Keller) - Fix objtool corner case under KCFI (Josh Poimboeuf) - Drop 13 year old backward compat CAP_SYS_ADMIN check (Jingzi Meng) - Add str_plural() helper (Michal Wajdeczko, Kees Cook) - Ignore relocations in .notes section - Add comments to explain how __is_constexpr() works - Fix m68k stack alignment expectations in stackinit Kunit test - Convert string selftests to KUnit - Add KUnit tests for fortified string functions - Improve reporting during fortified string warnings - Allow non-type arg to type_max() and type_min() - Allow strscpy() to be called with only 2 arguments - Add binary mode to leaking_addresses scanner - Various small cleanups to leaking_addresses scanner - Adding wrapping_*() arithmetic helper - Annotate initial signed integer wrap-around in refcount_t - Add explicit UBSAN section to MAINTAINERS - Fix UBSAN self-test warnings - Simplify UBSAN build via removal of CONFIG_UBSAN_SANITIZE_ALL - Reintroduce UBSAN's signed overflow sanitizer" * tag 'hardening-v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (51 commits) selftests/powerpc: Fix load_unaligned_zeropad build failure string: Convert helpers selftest to KUnit string: Convert selftest to KUnit sh: Fix build with CONFIG_UBSAN=y compiler.h: Explain how __is_constexpr() works overflow: Allow non-type arg to type_max() and type_min() VMCI: Fix possible memcpy() run-time warning in vmci_datagram_invoke_guest_handler() lib/string_helpers: Add flags param to string_get_size() x86, relocs: Ignore relocations in .notes section objtool: Fix UNWIND_HINT_{SAVE,RESTORE} across basic blocks overflow: Use POD in check_shl_overflow() lib: stackinit: Adjust target string to 8 bytes for m68k sparc: vdso: Disable UBSAN instrumentation kernel.h: Move lib/cmdline.c prototypes to string.h leaking_addresses: Provide mechanism to scan binary files leaking_addresses: Ignore input device status lines leaking_addresses: Use File::Temp for /tmp files MAINTAINERS: Update LEAKING_ADDRESSES details fortify: Improve buffer overflow reporting fortify: Add KUnit tests for runtime overflows ...
2024-03-12Merge tag 'asm-generic-6.9' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull asm-generic updates from Arnd Bergmann: "Just two small updates this time: - A series I did to unify the definition of PAGE_SIZE through Kconfig, intended to help with a vdso rework that needs the constant but cannot include the normal kernel headers when building the compat VDSO on arm64 and potentially others - a patch from Yan Zhao to remove the pfn_to_virt() definitions from a couple of architectures after finding they were both incorrect and entirely unused" * tag 'asm-generic-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: arch: define CONFIG_PAGE_SIZE_*KB on all architectures arch: simplify architecture specific page size configuration arch: consolidate existing CONFIG_PAGE_SIZE_*KB definitions mm: Remove broken pfn_to_virt() on arch csky/hexagon/openrisc
2024-03-06arch: define CONFIG_PAGE_SIZE_*KB on all architecturesArnd Bergmann
Most architectures only support a single hardcoded page size. In order to ensure that each one of these sets the corresponding Kconfig symbols, change over the PAGE_SHIFT definition to the common one and allow only the hardware page size to be selected. Acked-by: Guo Ren <guoren@kernel.org> Acked-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Stafford Horne <shorne@gmail.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-02-21s390: enable MHP_MEMMAP_ON_MEMORYSumanth Korikkar
Enable MHP_MEMMAP_ON_MEMORY to support "memmap on memory". memory_hotplug.memmap_on_memory=true kernel parameter should be set in kernel boot option to enable the feature. Link: https://lkml.kernel.org/r/20240108132747.3238763-6-sumanthk@linux.ibm.com Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-20s390: compile relocatable kernel without -fPIEJosh Poimboeuf
On s390, currently kernel uses the '-fPIE' compiler flag for compiling vmlinux. This has a few problems: - It uses dynamic symbols (.dynsym), for which the linker refuses to allow more than 64k sections. This can break features which use '-ffunction-sections' and '-fdata-sections', including kpatch-build [1] and Function Granular KASLR. - It unnecessarily uses GOT relocations, adding an extra layer of indirection for many memory accesses. Instead of using '-fPIE', resolve all the relocations at link time and then manually adjust any absolute relocations (R_390_64) during boot. This is done by first telling the linker to preserve all relocations during the vmlinux link. (Note this is harmless: they are later stripped in the vmlinux.bin link.) Then use the 'relocs' tool to find all absolute relocations (R_390_64) which apply to allocatable sections. The offsets of those relocations are saved in a special section which is then used to adjust the relocations during boot. (Note: For some reason, Clang occasionally creates a GOT reference, even without '-fPIE'. So Clang-compiled kernels have a GOT, which needs to be adjusted.) On my mostly-defconfig kernel, this reduces kernel text size by ~1.3%. [1] https://github.com/dynup/kpatch/issues/1284 [2] https://gcc.gnu.org/pipermail/gcc-patches/2023-June/622872.html [3] https://gcc.gnu.org/pipermail/gcc-patches/2023-August/625986.html Compiler consideration: Gcc recently implemented an optimization [2] for loading symbols without explicit alignment, aligning with the IBM Z ELF ABI. This ABI mandates symbols to reside on a 2-byte boundary, enabling the use of the larl instruction. However, kernel linker scripts may still generate unaligned symbols. To address this, a new -munaligned-symbols option has been introduced [3] in recent gcc versions. This option has to be used with future gcc versions. Older Clang lacks support for handling unaligned symbols generated by kernel linker scripts when the kernel is built without -fPIE. However, future versions of Clang will include support for the -munaligned-symbols option. When the support is unavailable, compile the kernel with -fPIE to maintain the existing behavior. In addition to it: move vmlinux.relocs to safe relocation When the kernel is built with CONFIG_KERNEL_UNCOMPRESSED, the entire uncompressed vmlinux.bin is positioned in the bzImage decompressor image at the default kernel LMA of 0x100000, enabling it to be executed in-place. However, the size of .vmlinux.relocs could be large enough to cause an overlap with the uncompressed kernel at the address 0x100000. To address this issue, .vmlinux.relocs is positioned after the .rodata.compressed in the bzImage. Nevertheless, in this configuration, vmlinux.relocs will overlap with the .bss section of vmlinux.bin. To overcome that, move vmlinux.relocs to a safe location before clearing .bss and handling relocs. Compile warning fix from Sumanth Korikkar: When kernel is built with CONFIG_LD_ORPHAN_WARN and -fno-PIE, there are several warnings: ld: warning: orphan section `.rela.iplt' from `arch/s390/kernel/head64.o' being placed in section `.rela.dyn' ld: warning: orphan section `.rela.head.text' from `arch/s390/kernel/head64.o' being placed in section `.rela.dyn' ld: warning: orphan section `.rela.init.text' from `arch/s390/kernel/head64.o' being placed in section `.rela.dyn' ld: warning: orphan section `.rela.rodata.cst8' from `arch/s390/kernel/head64.o' being placed in section `.rela.dyn' Orphan sections are sections that exist in an object file but don't have a corresponding output section in the final executable. ld raises a warning when it identifies such sections. Eliminate the warning by placing all .rela orphan sections in .rela.dyn and raise an error when size of .rela.dyn is greater than zero. i.e. Dont just neglect orphan sections. This is similar to adjustment performed in x86, where kernel is built with -fno-PIE. commit 5354e84598f2 ("x86/build: Add asserts for unwanted sections") [sumanthk@linux.ibm.com: rebased Josh Poimboeuf patches and move vmlinux.relocs to safe location] [hca@linux.ibm.com: merged compile warning fix from Sumanth] Tested-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Acked-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Link: https://lore.kernel.org/r/20240219132734.22881-4-sumanthk@linux.ibm.com Link: https://lore.kernel.org/r/20240219132734.22881-5-sumanthk@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16s390: don't allow CONFIG_COMPAT with LD=ld.lldNathan Chancellor
When building 'ARCH=s390 defconfig compat.config' with GCC and LD=ld.lld, there is an error when attempting to link the compat vDSO: ld.lld: error: unknown emulation: elf_s390 make[4]: *** [arch/s390/kernel/vdso32/Makefile:48: arch/s390/kernel/vdso32/vdso32.so.dbg] Error 1 Much like clang, ld.lld only supports the 64-bit s390 emulation. Add a dependency on not using LLD to CONFIG_COMPAT to avoid breaking the build with this toolchain combination. Signed-off-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/20240214-s390-compat-lld-dep-v1-1-abf1f4b5e514@kernel.org Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-14s390: select CONFIG_ARCH_WANT_LD_ORPHAN_WARNNathan Chancellor
Now that all sections have been properly accounted for in the s390 linker scripts, select CONFIG_ARCH_WANT_LD_ORPHAN_WARN so that '--orphan-handling' is added to LDFLAGS to catch any future sections that are added without being described in linker scripts. Signed-off-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Justin Stitt <justinstitt@google.com> Link: https://lore.kernel.org/r/20240207-s390-lld-and-orphan-warn-v1-10-8a665b3346ab@kernel.org Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-08treewide: remove CONFIG_HAVE_KVMPaolo Bonzini
It has no users anymore. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-02-06ubsan: Remove CONFIG_UBSAN_SANITIZE_ALLKees Cook
For simplicity in splitting out UBSan options into separate rules, remove CONFIG_UBSAN_SANITIZE_ALL, effectively defaulting to "y", which is how it is generally used anyway. (There are no ":= y" cases beyond where a specific file is enabled when a top-level ":= n" is in effect.) Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Marco Elver <elver@google.com> Cc: linux-doc@vger.kernel.org Cc: linux-kbuild@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org>
2024-01-11s390/kexec: do not automatically select KEXEC optionAlexander Gordeev
Following commit dccf78d39f10 ("kernel/Kconfig.kexec: drop select of KEXEC for CRASH_DUMP") also drop automatic KEXEC selection for s390 while set CONFIG_KEXEC=y explicitly for defconfig and debug_defconfig targets. zfcpdump_defconfig target gets CONFIG_KEXEC unset as result, which is right and consistent with CONFIG_KEXEC_FILE besides. Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-01-11s390/compat: change default for CONFIG_COMPAT to "n"Heiko Carstens
31 bit support has been removed from the kernel more than eight years ago. The last 31 bit distribution is many years older. There shouldn't be any 31 bit code around anymore. Therefore avoid providing an unused and only partially tested user space interface and change the default for CONFIG_COMPAT from "yes" to "no". Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-01-10Merge tag 's390-6.8-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Alexander Gordeev: - Add machine variable capacity information to /proc/sysinfo. - Limit the waste of page tables and always align vmalloc area size and base address on segment boundary. - Fix a memory leak when an attempt to register interruption sub class (ISC) for the adjunct-processor (AP) guest failed. - Reset response code AP_RESPONSE_INVALID_GISA to understandable by guest AP_RESPONSE_INVALID_ADDRESS in response to a failed interruption sub class (ISC) registration attempt. - Improve reaction to adjunct-processor (AP) AP_RESPONSE_OTHERWISE_CHANGED response code when enabling interrupts on behalf of a guest. - Fix incorrect sysfs 'status' attribute of adjunct-processor (AP) queue device bound to the vfio_ap device driver when the mediated device is attached to a guest, but the queue device is not passed through. - Rework struct ap_card to hold the whole adjunct-processor (AP) card hardware information. As result, all the ugly bit checks are replaced by simple evaluations of the required bit fields. - Improve handling of some weird scenarios between service element (SE) host and SE guest with adjunct-processor (AP) pass-through support. - Change local_ctl_set_bit() and local_ctl_clear_bit() so they return the previous value of the to be changed control register. This is useful if a bit is only changed temporarily and the previous content needs to be restored. - The kernel starts with machine checks disabled and is expected to enable it once trap_init() is called. However the implementation allows machine checks early. Consistently enable it in trap_init() only. - local_mcck_disable() and local_mcck_enable() assume that machine checks are always enabled. Instead implement and use local_mcck_save() and local_mcck_restore() to disable machine checks and restore the previous state. - Modification of floating point control (FPC) register of a traced process using ptrace interface may lead to corruption of the FPC register of the tracing process. Fix this. - kvm_arch_vcpu_ioctl_set_fpu() allows to set the floating point control (FPC) register in vCPU, but may lead to corruption of the FPC register of the host process. Fix this. - Use READ_ONCE() to read a vCPU floating point register value from the memory mapped area. This avoids that, depending on code generation, a different value is tested for validity than the one that is used. - Get rid of test_fp_ctl(), since it is quite subtle to use it correctly. Instead copy a new floating point control register value into its save area and test the validity of the new value when loading it. - Remove superfluous save_fpu_regs() call. - Remove s390 support for ARCH_WANTS_DYNAMIC_TASK_STRUCT. All machines provide the vector facility since many years and the need to make the task structure size dependent on the vector facility does not exist. - Remove the "novx" kernel command line option, as the vector code runs without any problems since many years. - Add the vector facility to the z13 architecture level set (ALS). All hypervisors support the vector facility since many years. This allows compile time optimizations of the kernel. - Get rid of MACHINE_HAS_VX and replace it with cpu_has_vx(). As result, the compiled code will have less runtime checks and less code. - Convert pgste_get_lock() and pgste_set_unlock() ASM inlines to C. - Convert the struct subchannel spinlock from pointer to member. * tag 's390-6.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (24 commits) Revert "s390: update defconfigs" s390/cio: make sch->lock spinlock pointer a member s390: update defconfigs s390/mm: convert pgste locking functions to C s390/fpu: get rid of MACHINE_HAS_VX s390/als: add vector facility to z13 architecture level set s390/fpu: remove "novx" option s390/fpu: remove ARCH_WANTS_DYNAMIC_TASK_STRUCT support KVM: s390: remove superfluous save_fpu_regs() call s390/fpu: get rid of test_fp_ctl() KVM: s390: use READ_ONCE() to read fpc register value KVM: s390: fix setting of fpc register s390/ptrace: handle setting of fpc register correctly s390/nmi: implement and use local_mcck_save() / local_mcck_restore() s390/nmi: consistently enable machine checks in trap_init() s390/ctlreg: return old register contents when changing bits s390/ap: handle outband SE bind state change s390/ap: store TAPQ hwinfo in struct ap_card s390/vfio-ap: fix sysfs status attribute for AP queue devices s390/vfio-ap: improve reaction to response code 07 from PQAP(AQIC) command ...
2024-01-09Merge tag 'slab-for-6.8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab Pull slab updates from Vlastimil Babka: - SLUB: delayed freezing of CPU partial slabs (Chengming Zhou) Freezing is an operation involving double_cmpxchg() that makes a slab exclusive for a particular CPU. Chengming noticed that we use it also in situations where we are not yet installing the slab as the CPU slab, because freezing also indicates that the slab is not on the shared list. This results in redundant freeze/unfreeze operation and can be avoided by marking separately the shared list presence by reusing the PG_workingset flag. This approach neatly avoids the issues described in 9b1ea29bc0d7 ("Revert "mm, slub: consider rest of partial list if acquire_slab() fails"") as we can now grab a slab from the shared list in a quick and guaranteed way without the cmpxchg_double() operation that amplifies the lock contention and can fail. As a result, lkp has reported 34.2% improvement of stress-ng.rawudp.ops_per_sec - SLAB removal and SLUB cleanups (Vlastimil Babka) The SLAB allocator has been deprecated since 6.5 and nobody has objected so far. We agreed at LSF/MM to wait until the next LTS, which is 6.6, so we should be good to go now. This doesn't yet erase all traces of SLAB outside of mm/ so some dead code, comments or documentation remain, and will be cleaned up gradually (some series are already in the works). Removing the choice of allocators has already allowed to simplify and optimize the code wiring up the kmalloc APIs to the SLUB implementation. * tag 'slab-for-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: (34 commits) mm/slub: free KFENCE objects in slab_free_hook() mm/slub: handle bulk and single object freeing separately mm/slub: introduce __kmem_cache_free_bulk() without free hooks mm/slub: fix bulk alloc and free stats mm/slub: optimize free fast path code layout mm/slub: optimize alloc fastpath code layout mm/slub: remove slab_alloc() and __kmem_cache_alloc_lru() wrappers mm/slab: move kmalloc() functions from slab_common.c to slub.c mm/slab: move kmalloc_slab() to mm/slab.h mm/slab: move kfree() from slab_common.c to slub.c mm/slab: move struct kmem_cache_node from slab.h to slub.c mm/slab: move memcg related functions from slab.h to slub.c mm/slab: move pre/post-alloc hooks from slab.h to slub.c mm/slab: consolidate includes in the internal mm/slab.h mm/slab: move the rest of slub_def.h to mm/slab.h mm/slab: move struct kmem_cache_cpu declaration to slub.c mm/slab: remove mm/slab.c and slab_def.h mm/mempool/dmapool: remove CONFIG_DEBUG_SLAB ifdefs mm/slab: remove CONFIG_SLAB code from slab common code cpu/hotplug: remove CPUHP_SLAB_PREPARE hooks ...
2023-12-20kexec: fix KEXEC_FILE dependenciesArnd Bergmann
The cleanup for the CONFIG_KEXEC Kconfig logic accidentally changed the 'depends on CRYPTO=y' dependency to a plain 'depends on CRYPTO', which causes a link failure when all the crypto support is in a loadable module and kexec_file support is built-in: x86_64-linux-ld: vmlinux.o: in function `__x64_sys_kexec_file_load': (.text+0x32e30a): undefined reference to `crypto_alloc_shash' x86_64-linux-ld: (.text+0x32e58e): undefined reference to `crypto_shash_update' x86_64-linux-ld: (.text+0x32e6ee): undefined reference to `crypto_shash_final' Both s390 and x86 have this problem, while ppc64 and riscv have the correct dependency already. On riscv, the dependency is only used for the purgatory, not for the kexec_file code itself, which may be a bit surprising as it means that with CONFIG_CRYPTO=m, it is possible to enable KEXEC_FILE but then the purgatory code is silently left out. Move this into the common Kconfig.kexec file in a way that is correct everywhere, using the dependency on CRYPTO_SHA256=y only when the purgatory code is available. This requires reversing the dependency between ARCH_SUPPORTS_KEXEC_PURGATORY and KEXEC_FILE, but the effect remains the same, other than making riscv behave like the other ones. On s390, there is an additional dependency on CRYPTO_SHA256_S390, which should technically not be required but gives better performance. Remove this dependency here, noting that it was not present in the initial Kconfig code but was brought in without an explanation in commit 71406883fd357 ("s390/kexec_file: Add kexec_file_load system call"). [arnd@arndb.de: fix riscv build] Link: https://lkml.kernel.org/r/67ddd260-d424-4229-a815-e3fcfb864a77@app.fastmail.com Link: https://lkml.kernel.org/r/20231023110308.1202042-1-arnd@kernel.org Fixes: 6af5138083005 ("x86/kexec: refactor for kernel/Kconfig.kexec") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Eric DeVolder <eric_devolder@yahoo.com> Tested-by: Eric DeVolder <eric_devolder@yahoo.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Conor Dooley <conor@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David S. Miller <davem@davemloft.net> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-11s390/fpu: remove ARCH_WANTS_DYNAMIC_TASK_STRUCT supportHeiko Carstens
s390 selects ARCH_WANTS_DYNAMIC_TASK_STRUCT in order to make the size of the task structure dependent on the availability of the vector facility. This doesn't make sense anymore because since many years all machines provide the vector facility. Therefore simplify the code a bit and remove s390 support for ARCH_WANTS_DYNAMIC_TASK_STRUCT. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-12-05mm/slab: remove CONFIG_SLAB from all Kconfig and MakefileVlastimil Babka
Remove CONFIG_SLAB, CONFIG_DEBUG_SLAB, CONFIG_SLAB_DEPRECATED and everything in Kconfig files and mm/Makefile that depends on those. Since SLUB is the only remaining allocator, remove the allocator choice, make CONFIG_SLUB a "def_bool y" for now and remove all explicit dependencies on SLUB or SLAB as it's now always enabled. Make every option's verbose name and description refer to "the slab allocator" without refering to the specific implementation. Do not rename the CONFIG_ option names yet. Everything under #ifdef CONFIG_SLAB, and mm/slab.c is now dead code, all code under #ifdef CONFIG_SLUB is now always compiled. Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Christoph Lameter <cl@linux.com> Acked-by: David Rientjes <rientjes@google.com> Tested-by: David Rientjes <rientjes@google.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2023-11-05s390: add USER_STACKTRACE supportHeiko Carstens
Use the perf_callchain_user() code as blueprint to also add support for USER_STACKTRACE. To describe how to use this cite the commit message of the LoongArch implementation which came with commit 4d7bf939df08 ("LoongArch: Add USER_STACKTRACE support"), but replace -fno-omit-frame-pointer option with the s390 specific -mbackchain option: ====================================================================== To get the best stacktrace output, you can compile your userspace programs with frame pointers (at least glibc + the app you are tracing). 1, export "CC = gcc -mbackchain"; 2, compile your programs with "CC"; 3, use uprobe to get stacktrace output. ... echo 'p:malloc /usr/lib64/libc.so.6:0x0a4704 size=%r2:u64' > uprobe_events echo 'p:free /usr/lib64/libc.so.6:0x0a4d50 ptr=%r2:u64' >> uprobe_events echo 'comm == "demo"' > ./events/uprobes/malloc/filter echo 'comm == "demo"' > ./events/uprobes/free/filter echo 1 > ./options/userstacktrace echo 1 > ./options/sym-userobj ... ====================================================================== Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-10-16s390: add support for DCACHE_WORD_ACCESSHeiko Carstens
Implement load_unaligned_zeropad() and enable DCACHE_WORD_ACCESS to speed up string operations in fs/dcache.c and fs/namei.c. Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-08-31Merge tag 'x86_shstk_for_6.6-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 shadow stack support from Dave Hansen: "This is the long awaited x86 shadow stack support, part of Intel's Control-flow Enforcement Technology (CET). CET consists of two related security features: shadow stacks and indirect branch tracking. This series implements just the shadow stack part of this feature, and just for userspace. The main use case for shadow stack is providing protection against return oriented programming attacks. It works by maintaining a secondary (shadow) stack using a special memory type that has protections against modification. When executing a CALL instruction, the processor pushes the return address to both the normal stack and to the special permission shadow stack. Upon RET, the processor pops the shadow stack copy and compares it to the normal stack copy. For more information, refer to the links below for the earlier versions of this patch set" Link: https://lore.kernel.org/lkml/20220130211838.8382-1-rick.p.edgecombe@intel.com/ Link: https://lore.kernel.org/lkml/20230613001108.3040476-1-rick.p.edgecombe@intel.com/ * tag 'x86_shstk_for_6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (47 commits) x86/shstk: Change order of __user in type x86/ibt: Convert IBT selftest to asm x86/shstk: Don't retry vm_munmap() on -EINTR x86/kbuild: Fix Documentation/ reference x86/shstk: Move arch detail comment out of core mm x86/shstk: Add ARCH_SHSTK_STATUS x86/shstk: Add ARCH_SHSTK_UNLOCK x86: Add PTRACE interface for shadow stack selftests/x86: Add shadow stack test x86/cpufeatures: Enable CET CR4 bit for shadow stack x86/shstk: Wire in shadow stack interface x86: Expose thread features in /proc/$PID/status x86/shstk: Support WRSS for userspace x86/shstk: Introduce map_shadow_stack syscall x86/shstk: Check that signal frame is shadow stack mem x86/shstk: Check that SSP is aligned on sigreturn x86/shstk: Handle signals for shadow stack x86/shstk: Introduce routines modifying shstk x86/shstk: Handle thread shadow stack x86/shstk: Add user-mode shadow stack support ...
2023-08-29Merge tag 'mm-nonmm-stable-2023-08-28-22-48' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - An extensive rework of kexec and crash Kconfig from Eric DeVolder ("refactor Kconfig to consolidate KEXEC and CRASH options") - kernel.h slimming work from Andy Shevchenko ("kernel.h: Split out a couple of macros to args.h") - gdb feature work from Kuan-Ying Lee ("Add GDB memory helper commands") - vsprintf inclusion rationalization from Andy Shevchenko ("lib/vsprintf: Rework header inclusions") - Switch the handling of kdump from a udev scheme to in-kernel handling, by Eric DeVolder ("crash: Kernel handling of CPU and memory hot un/plug") - Many singleton patches to various parts of the tree * tag 'mm-nonmm-stable-2023-08-28-22-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (81 commits) document while_each_thread(), change first_tid() to use for_each_thread() drivers/char/mem.c: shrink character device's devlist[] array x86/crash: optimize CPU changes crash: change crash_prepare_elf64_headers() to for_each_possible_cpu() crash: hotplug support for kexec_load() x86/crash: add x86 crash hotplug support crash: memory and CPU hotplug sysfs attributes kexec: exclude elfcorehdr from the segment digest crash: add generic infrastructure for crash hotplug support crash: move a few code bits to setup support of crash hotplug kstrtox: consistently use _tolower() kill do_each_thread() nilfs2: fix WARNING in mark_buffer_dirty due to discarded buffer reuse scripts/bloat-o-meter: count weak symbol sizes treewide: drop CONFIG_EMBEDDED lockdep: fix static memory detection even more lib/vsprintf: declare no_hash_pointers in sprintf.h lib/vsprintf: split out sprintf() and friends kernel/fork: stop playing lockless games for exe_file replacement adfs: delete unused "union adfs_dirtail" definition ...
2023-08-29Merge tag 'mm-stable-2023-08-28-18-26' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Some swap cleanups from Ma Wupeng ("fix WARN_ON in add_to_avail_list") - Peter Xu has a series (mm/gup: Unify hugetlb, speed up thp") which reduces the special-case code for handling hugetlb pages in GUP. It also speeds up GUP handling of transparent hugepages. - Peng Zhang provides some maple tree speedups ("Optimize the fast path of mas_store()"). - Sergey Senozhatsky has improved te performance of zsmalloc during compaction (zsmalloc: small compaction improvements"). - Domenico Cerasuolo has developed additional selftest code for zswap ("selftests: cgroup: add zswap test program"). - xu xin has doe some work on KSM's handling of zero pages. These changes are mainly to enable the user to better understand the effectiveness of KSM's treatment of zero pages ("ksm: support tracking KSM-placed zero-pages"). - Jeff Xu has fixes the behaviour of memfd's MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED sysctl ("mm/memfd: fix sysctl MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED"). - David Howells has fixed an fscache optimization ("mm, netfs, fscache: Stop read optimisation when folio removed from pagecache"). - Axel Rasmussen has given userfaultfd the ability to simulate memory poisoning ("add UFFDIO_POISON to simulate memory poisoning with UFFD"). - Miaohe Lin has contributed some routine maintenance work on the memory-failure code ("mm: memory-failure: remove unneeded PageHuge() check"). - Peng Zhang has contributed some maintenance work on the maple tree code ("Improve the validation for maple tree and some cleanup"). - Hugh Dickins has optimized the collapsing of shmem or file pages into THPs ("mm: free retracted page table by RCU"). - Jiaqi Yan has a patch series which permits us to use the healthy subpages within a hardware poisoned huge page for general purposes ("Improve hugetlbfs read on HWPOISON hugepages"). - Kemeng Shi has done some maintenance work on the pagetable-check code ("Remove unused parameters in page_table_check"). - More folioification work from Matthew Wilcox ("More filesystem folio conversions for 6.6"), ("Followup folio conversions for zswap"). And from ZhangPeng ("Convert several functions in page_io.c to use a folio"). - page_ext cleanups from Kemeng Shi ("minor cleanups for page_ext"). - Baoquan He has converted some architectures to use the GENERIC_IOREMAP ioremap()/iounmap() code ("mm: ioremap: Convert architectures to take GENERIC_IOREMAP way"). - Anshuman Khandual has optimized arm64 tlb shootdown ("arm64: support batched/deferred tlb shootdown during page reclamation/migration"). - Better maple tree lockdep checking from Liam Howlett ("More strict maple tree lockdep"). Liam also developed some efficiency improvements ("Reduce preallocations for maple tree"). - Cleanup and optimization to the secondary IOMMU TLB invalidation, from Alistair Popple ("Invalidate secondary IOMMU TLB on permission upgrade"). - Ryan Roberts fixes some arm64 MM selftest issues ("selftests/mm fixes for arm64"). - Kemeng Shi provides some maintenance work on the compaction code ("Two minor cleanups for compaction"). - Some reduction in mmap_lock pressure from Matthew Wilcox ("Handle most file-backed faults under the VMA lock"). - Aneesh Kumar contributes code to use the vmemmap optimization for DAX on ppc64, under some circumstances ("Add support for DAX vmemmap optimization for ppc64"). - page-ext cleanups from Kemeng Shi ("add page_ext_data to get client data in page_ext"), ("minor cleanups to page_ext header"). - Some zswap cleanups from Johannes Weiner ("mm: zswap: three cleanups"). - kmsan cleanups from ZhangPeng ("minor cleanups for kmsan"). - VMA handling cleanups from Kefeng Wang ("mm: convert to vma_is_initial_heap/stack()"). - DAMON feature work from SeongJae Park ("mm/damon/sysfs-schemes: implement DAMOS tried total bytes file"), ("Extend DAMOS filters for address ranges and DAMON monitoring targets"). - Compaction work from Kemeng Shi ("Fixes and cleanups to compaction"). - Liam Howlett has improved the maple tree node replacement code ("maple_tree: Change replacement strategy"). - ZhangPeng has a general code cleanup - use the K() macro more widely ("cleanup with helper macro K()"). - Aneesh Kumar brings memmap-on-memory to ppc64 ("Add support for memmap on memory feature on ppc64"). - pagealloc cleanups from Kemeng Shi ("Two minor cleanups for pcp list in page_alloc"), ("Two minor cleanups for get pageblock migratetype"). - Vishal Moola introduces a memory descriptor for page table tracking, "struct ptdesc" ("Split ptdesc from struct page"). - memfd selftest maintenance work from Aleksa Sarai ("memfd: cleanups for vm.memfd_noexec"). - MM include file rationalization from Hugh Dickins ("arch: include asm/cacheflush.h in asm/hugetlb.h"). - THP debug output fixes from Hugh Dickins ("mm,thp: fix sloppy text output"). - kmemleak improvements from Xiaolei Wang ("mm/kmemleak: use object_cache instead of kmemleak_initialized"). - More folio-related cleanups from Matthew Wilcox ("Remove _folio_dtor and _folio_order"). - A VMA locking scalability improvement from Suren Baghdasaryan ("Per-VMA lock support for swap and userfaults"). - pagetable handling cleanups from Matthew Wilcox ("New page table range API"). - A batch of swap/thp cleanups from David Hildenbrand ("mm/swap: stop using page->private on tail pages for THP_SWAP + cleanups"). - Cleanups and speedups to the hugetlb fault handling from Matthew Wilcox ("Change calling convention for ->huge_fault"). - Matthew Wilcox has also done some maintenance work on the MM subsystem documentation ("Improve mm documentation"). * tag 'mm-stable-2023-08-28-18-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (489 commits) maple_tree: shrink struct maple_tree maple_tree: clean up mas_wr_append() secretmem: convert page_is_secretmem() to folio_is_secretmem() nios2: fix flush_dcache_page() for usage from irq context hugetlb: add documentation for vma_kernel_pagesize() mm: add orphaned kernel-doc to the rst files. mm: fix clean_record_shared_mapping_range kernel-doc mm: fix get_mctgt_type() kernel-doc mm: fix kernel-doc warning from tlb_flush_rmaps() mm: remove enum page_entry_size mm: allow ->huge_fault() to be called without the mmap_lock held mm: move PMD_ORDER to pgtable.h mm: remove checks for pte_index memcg: remove duplication detection for mem_cgroup_uncharge_swap mm/huge_memory: work on folio->swap instead of page->private when splitting folio mm/swap: inline folio_set_swap_entry() and folio_swap_entry() mm/swap: use dedicated entry for swap in folio mm/swap: stop using page->private on tail pages for THP_SWAP selftests/mm: fix WARNING comparing pointer to 0 selftests: cgroup: fix test_kmem_memcg_deletion kernel mem check ...
2023-08-18remove ARCH_DEFAULT_KEXEC from Kconfig.kexecEric DeVolder
This patch is a minor cleanup to the series "refactor Kconfig to consolidate KEXEC and CRASH options". In that series, a new option ARCH_DEFAULT_KEXEC was introduced in order to obtain the equivalent behavior of s390 original Kconfig settings for KEXEC. As it turns out, this new option did not fully provide the equivalent behavior, rather a "select KEXEC" did. As such, the ARCH_DEFAULT_KEXEC is not needed anymore, so remove it. Link: https://lkml.kernel.org/r/20230802161750.2215-1-eric.devolder@oracle.com Signed-off-by: Eric DeVolder <eric.devolder@oracle.com> Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18kexec: rename ARCH_HAS_KEXEC_PURGATORYEric DeVolder
The Kconfig refactor to consolidate KEXEC and CRASH options utilized option names of the form ARCH_SUPPORTS_<option>. Thus rename the ARCH_HAS_KEXEC_PURGATORY to ARCH_SUPPORTS_KEXEC_PURGATORY to follow the same. Link: https://lkml.kernel.org/r/20230712161545.87870-15-eric.devolder@oracle.com Signed-off-by: Eric DeVolder <eric.devolder@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18s390/kexec: refactor for kernel/Kconfig.kexecEric DeVolder
The kexec and crash kernel options are provided in the common kernel/Kconfig.kexec. Utilize the common options and provide the ARCH_SUPPORTS_ and ARCH_SELECTS_ entries to recreate the equivalent set of KEXEC and CRASH options. Link: https://lkml.kernel.org/r/20230712161545.87870-13-eric.devolder@oracle.com Signed-off-by: Eric DeVolder <eric.devolder@oracle.com> Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18mm/vmemmap optimization: split hugetlb and devdax vmemmap optimizationAneesh Kumar K.V
Arm disabled hugetlb vmemmap optimization [1] because hugetlb vmemmap optimization includes an update of both the permissions (writeable to read-only) and the output address (pfn) of the vmemmap ptes. That is not supported without unmapping of pte(marking it invalid) by some architectures. With DAX vmemmap optimization we don't require such pte updates and architectures can enable DAX vmemmap optimization while having hugetlb vmemmap optimization disabled. Hence split DAX optimization support into a different config. s390, loongarch and riscv don't have devdax support. So the DAX config is not enabled for them. With this change, arm64 should be able to select DAX optimization [1] commit 060a2c92d1b6 ("arm64: mm: hugetlb: Disable HUGETLB_PAGE_OPTIMIZE_VMEMMAP") Link: https://lkml.kernel.org/r/20230724190759.483013-8-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Joao Martins <joao.m.martins@oracle.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18s390: mm: convert to GENERIC_IOREMAPBaoquan He
By taking GENERIC_IOREMAP method, the generic generic_ioremap_prot(), generic_iounmap(), and their generic wrapper ioremap_prot(), ioremap() and iounmap() are all visible and available to arch. Arch needs to provide wrapper functions to override the generic versions if there's arch specific handling in its ioremap_prot(), ioremap() or iounmap(). This change will simplify implementation by removing duplicated code with generic_ioremap_prot() and generic_iounmap(), and has the equivalent functioality as before. Here, add wrapper functions ioremap_prot() and iounmap() for s390's special operation when ioremap() and iounmap(). And also replace including <asm-generic/io.h> with <asm/io.h> in arch/s390/kernel/perf_cpum_sf.c, otherwise building error will be seen because macro defined in <asm/io.h> can't be seen in perf_cpum_sf.c. Link: https://lkml.kernel.org/r/20230706154520.11257-11-bhe@redhat.com Signed-off-by: Baoquan He <bhe@redhat.com> Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> Tested-by: Niklas Schnelle <schnelle@linux.ibm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Brian Cain <bcain@quicinc.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Chris Zankel <chris@zankel.net> Cc: David Laight <David.Laight@ACULAB.COM> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Helge Deller <deller@gmx.de> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Jonas Bonn <jonas@southpole.se> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Rich Felker <dalias@libc.org> Cc: Stafford Horne <shorne@gmail.com> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: Vineet Gupta <vgupta@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-01s390/cert_store: select CRYPTO_LIB_SHA256Sven Schnelle
A build failure was reported when sha256() is not present: gcc-13.1.0-nolibc/s390-linux/bin/s390-linux-ld: arch/s390/kernel/cert_store.o: in function `check_certificate_hash': arch/s390/kernel/cert_store.c:267: undefined reference to `sha256' Therefore make CONFIG_CERT_STORE select CRYPTO_LIB_SHA256. Fixes: 8cf57d7217c3 ("s390: add support for user-defined certificates") Reported-by: Randy Dunlap <rdunlap@infradead.org> Closes: https://lore.kernel.org/all/8ecb57fb-4560-bdfc-9e55-63e3b0937132@infradead.org/ Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Tested-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Acked-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20230728100430.1567328-1-svens@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-07-24docs: move s390 under archCosta Shulyupin
and fix all in-tree references. Architecture-specific documentation is being moved into Documentation/arch/ as a way of cleaning up the top-level documentation directory and making the docs hierarchy more closely match the source hierarchy. Signed-off-by: Costa Shulyupin <costa.shul@redhat.com> Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com> Acked-by: Jonathan Corbet <corbet@lwn.net> Acked-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20230718045550.495428-1-costa.shul@redhat.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-07-24s390/ftrace: enable HAVE_FUNCTION_GRAPH_RETVALSven Schnelle
Add support for tracing return values in the function graph tracer. This requires return_to_handler() to record gpr2 and the frame pointer Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-07-24s390/hypfs: factor out filesystem codeHeiko Carstens
The s390_hypfs filesystem is deprecated and shouldn't be used due to its rather odd semantics. It creates a whole directory structure with static file contents so a user can read a consistent state while within that directory. Writing to its update attribute will remove and rebuild nearly the whole filesystem, so that again a user can read a consistent state, even if multiple files need to be read. Given that this wastes a lot of CPU cycles, and involves a lot of code, binary interfaces have been added quite a couple of years ago, which simply pass the binary data to user space, and let user space decode the data. This is the preferred and only way how the data should be retrieved. The assumption is that there are no users of the s390_hypfs filesystem. However instead of just removing the code, and having to revert in case there are actually users, factor the filesystem code out and make it only available via a new config option. This config option is supposed to be disabled. If it turns out there are no complaints the filesystem code can be removed probably in a couple of years. Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-07-24s390: add support for user-defined certificatesAnastasia Eskova
Enable receiving the user-defined certificates from the s390x hypervisor via new diagnose 0x320 calls, and make them available to the Linux root user as 'cert_store_key' type keys in a so-called 'cert_store' keyring. New user-space interfaces: /sys/firmware/cert_store/refresh Writing to this attribute re-fetches certificates via DIAG 0x320 /sys/firmware/cert_store/cs_status Reading from this attribute returns either of: "uninitialized" If no certificate has been retrieved yet "ok" If certificates have been successfully retrieved "failed (<number>)" If certificate retrieval failed with reason code <number> New debug trace areas: /sys/kernel/debug/s390dbf/cert_store_msg /sys/kernel/debug/s390dbf/cert_store_hexdump Usage example: To initiate request for certificates available to the system as root: $ echo 1 > /sys/firmware/cert_store/refresh Upon success the '/sys/firmware/cert_store/cs_status' contains the value 'ok'. $ cat /sys/firmware/cert_store/cs_status ok Get the ID of the keyring 'cert_store': $ keyctl search @us keyring cert_store OR $ keyctl link @us @s; keyctl request keyring cert_store Obtain list of IDs of certificates: $ keyctl rlist <cert_store keyring ID> Display certificate content as hex-dump: $ keyctl read <certificate ID> Read certificate contents as binary data: $ keyctl pipe <certificate ID> >cert_data Display certificate description: $ keyctl describe <certificate ID> The certificate description has the following format: <64 bytes certificate name in EBCDIC> ':' <certificate index as obtained from hypervisor> ':' <certificate store token obtained from hypervisor> The certificate description in /proc/keys has certificate name represented in ASCII. Users can read but cannot update the content of the certificate. Signed-off-by: Anastasia Eskova <anastasia.eskova@ibm.com> Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-07-11mm: Rename arch pte_mkwrite()'s to pte_mkwrite_novma()Rick Edgecombe
The x86 Shadow stack feature includes a new type of memory called shadow stack. This shadow stack memory has some unusual properties, which requires some core mm changes to function properly. One of these unusual properties is that shadow stack memory is writable, but only in limited ways. These limits are applied via a specific PTE bit combination. Nevertheless, the memory is writable, and core mm code will need to apply the writable permissions in the typical paths that call pte_mkwrite(). The goal is to make pte_mkwrite() take a VMA, so that the x86 implementation of it can know whether to create regular writable or shadow stack mappings. But there are a couple of challenges to this. Modifying the signatures of each arch pte_mkwrite() implementation would be error prone because some are generated with macros and would need to be re-implemented. Also, some pte_mkwrite() callers operate on kernel memory without a VMA. So this can be done in a three step process. First pte_mkwrite() can be renamed to pte_mkwrite_novma() in each arch, with a generic pte_mkwrite() added that just calls pte_mkwrite_novma(). Next callers without a VMA can be moved to pte_mkwrite_novma(). And lastly, pte_mkwrite() and all callers can be changed to take/pass a VMA. Start the process by renaming pte_mkwrite() to pte_mkwrite_novma() and adding the pte_mkwrite() wrapper in linux/pgtable.h. Apply the same pattern for pmd_mkwrite(). Since not all archs have a pmd_mkwrite_novma(), create a new arch config HAS_HUGE_PAGE that can be used to tell if pmd_mkwrite() should be defined. Otherwise in the !HAS_HUGE_PAGE cases the compiler would not be able to find pmd_mkwrite_novma(). No functional change. Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/lkml/CAHk-=wiZjSu7c9sFYZb3q04108stgHff2wfbokGCCgW7riz+8Q@mail.gmail.com/ Link: https://lore.kernel.org/all/20230613001108.3040476-2-rick.p.edgecombe%40intel.com
2023-06-27Merge tag 's390-6.5-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Alexander Gordeev: - Fix the style of protected key API driver source: use x-mas tree for all local variable declarations - Rework protected key API driver to not use the struct pkey_protkey and pkey_clrkey anymore. Both structures have a fixed size buffer, but with the support of ECC protected key these buffers are not big enough. Use dynamic buffers internally and transparently for userspace - Add support for a new 'non CCA clear key token' with ECC clear keys supported: ECC P256, ECC P384, ECC P521, ECC ED25519 and ECC ED448. This makes it possible to derive a protected key from the ECC clear key input via PKEY_KBLOB2PROTK3 ioctl, while currently the only way to derive is via PCKMO instruction - The s390 PMU of PAI crypto and extension 1 NNPA counters use atomic_t for reference counting. Replace this with the proper data type refcount_t - Select ARCH_SUPPORTS_INT128, but limit this to clang for now, since gcc generates inefficient code, which may lead to stack overflows - Replace one-element array with flexible-array member in struct vfio_ccw_parent and refactor the rest of the code accordingly. Also, prefer struct_size() over sizeof() open- coded versions - Introduce OS_INFO_FLAGS_ENTRY pointing to a flags field and OS_INFO_FLAG_REIPL_CLEAR flag that informs a dumper whether the system memory should be cleared or not once dumped - Fix a hang when a user attempts to remove a VFIO-AP mediated device attached to a guest: add VFIO_DEVICE_GET_IRQ_INFO and VFIO_DEVICE_SET_IRQS IOCTLs and wire up the VFIO bus driver callback to request a release of the device - Fix calculation for R_390_GOTENT relocations for modules - Allow any user space process with CAP_PERFMON capability read and display the CPU Measurement facility counter sets - Rework large statically-defined per-CPU cpu_cf_events data structure and replace it with dynamically allocated structures created when a perf_event_open() system call is invoked or /dev/hwctr device is accessed * tag 's390-6.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/cpum_cf: rework PER_CPU_DEFINE of struct cpu_cf_events s390/cpum_cf: open access to hwctr device for CAP_PERFMON privileged process s390/module: fix rela calculation for R_390_GOTENT s390/vfio-ap: wire in the vfio_device_ops request callback s390/vfio-ap: realize the VFIO_DEVICE_SET_IRQS ioctl s390/vfio-ap: realize the VFIO_DEVICE_GET_IRQ_INFO ioctl s390/pkey: add support for ecc clear key s390/pkey: do not use struct pkey_protkey s390/pkey: introduce reverse x-mas trees s390/zcore: conditionally clear memory on reipl s390/ipl: add REIPL_CLEAR flag to os_info vfio/ccw: use struct_size() helper vfio/ccw: replace one-element array with flexible-array member s390: select ARCH_SUPPORTS_INT128 s390/pai_ext: replace atomic_t with refcount_t s390/pai_crypto: replace atomic_t with refcount_t
2023-05-17s390/iommu: get rid of S390_CCW_IOMMU and S390_AP_IOMMUJason Gunthorpe
These don't do anything anymore, the only user of the symbol was VFIO_CCW/AP which already "depends on VFIO" and VFIO itself selects IOMMU_API. When this was added VFIO was wrongly doing "depends on IOMMU_API" which required some contortions like this to ensure IOMMU_API was turned on. Reviewed-by: Eric Farman <farman@linux.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/0-v2-eb322ce2e547+188f-rm_iommu_ccw_jgg@nvidia.com Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-05-17s390/Kconfig: remove obsolete configs SCHED_{BOOK,DRAWER}Lukas Bulwahn
Commit f1045056c726 ("topology/sysfs: rework book and drawer topology ifdefery") activates the book and drawer topology, previously activated by CONFIG_SCHED_{BOOK,DRAWER}, dependent on the existence of certain macro definitions. Hence, since then, CONFIG_SCHED_{BOOK,DRAWER} have no effect and any further purpose. Remove the obsolete configs SCHED_{BOOK,DRAWER}. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Link: https://lore.kernel.org/r/20230508040916.16733-1-lukas.bulwahn@gmail.com Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-05-15s390: select ARCH_SUPPORTS_INT128Heiko Carstens
s390 has instructions to support 128 bit arithmetics, e.g. a 64 bit multiply instruction with a 128 bit result. Also 128 bit integer artithmetics are already used in s390 specific architecture code (see e.g. read_persistent_clock64()). Therefore select ARCH_SUPPORTS_INT128. However limit this to clang for now, since gcc generates inefficient code, which may lead to stack overflows, when compiling lib/crypto/curve25519-hacl64.c which depends on ARCH_SUPPORTS_INT128. The gcc generated functions have 6kb stack frames, compared to only 1kb of the code generated with clang. If the kernel is compiled with -Os library calls for __ashlti3(), __ashrti3(), and __lshrti3() may be generated. Similar to arm64 and riscv provide assembler implementations for these functions. Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-05-05s390: remove the unneeded select GCC12_NO_ARRAY_BOUNDSLukas Bulwahn
Commit 0da6e5fd6c37 ("gcc: disable '-Warray-bounds' for gcc-13 too") makes config GCC11_NO_ARRAY_BOUNDS to be for disabling -Warray-bounds in any gcc version 11 and upwards, and with that, removes the GCC12_NO_ARRAY_BOUNDS config as it is now covered by the semantics of GCC11_NO_ARRAY_BOUNDS. As GCC11_NO_ARRAY_BOUNDS is yes by default, there is no need for the s390 architecture to explicitly select GCC11_NO_ARRAY_BOUNDS. Hence, the select GCC12_NO_ARRAY_BOUNDS in arch/s390/Kconfig can simply be dropped. Remove the unneeded "select GCC12_NO_ARRAY_BOUNDS". Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-04-30Merge tag 's390-6.4-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Vasily Gorbik: - Add support for stackleak feature. Also allow specifying architecture-specific stackleak poison function to enable faster implementation. On s390, the mvc-based implementation helps decrease typical overhead from a factor of 3 to just 25% - Convert all assembler files to use SYM* style macros, deprecating the ENTRY() macro and other annotations. Select ARCH_USE_SYM_ANNOTATIONS - Improve KASLR to also randomize module and special amode31 code base load addresses - Rework decompressor memory tracking to support memory holes and improve error handling - Add support for protected virtualization AP binding - Add support for set_direct_map() calls - Implement set_memory_rox() and noexec module_alloc() - Remove obsolete overriding of mem*() functions for KASAN - Rework kexec/kdump to avoid using nodat_stack to call purgatory - Convert the rest of the s390 code to use flexible-array member instead of a zero-length array - Clean up uaccess inline asm - Enable ARCH_HAS_MEMBARRIER_SYNC_CORE - Convert to using CONFIG_FUNCTION_ALIGNMENT and enable DEBUG_FORCE_FUNCTION_ALIGN_64B - Resolve last_break in userspace fault reports - Simplify one-level sysctl registration - Clean up branch prediction handling - Rework CPU counter facility to retrieve available counter sets just once - Other various small fixes and improvements all over the code * tag 's390-6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (118 commits) s390/stackleak: provide fast __stackleak_poison() implementation stackleak: allow to specify arch specific stackleak poison function s390: select ARCH_USE_SYM_ANNOTATIONS s390/mm: use VM_FLUSH_RESET_PERMS in module_alloc() s390: wire up memfd_secret system call s390/mm: enable ARCH_HAS_SET_DIRECT_MAP s390/mm: use BIT macro to generate SET_MEMORY bit masks s390/relocate_kernel: adjust indentation s390/relocate_kernel: use SYM* macros instead of ENTRY(), etc. s390/entry: use SYM* macros instead of ENTRY(), etc. s390/purgatory: use SYM* macros instead of ENTRY(), etc. s390/kprobes: use SYM* macros instead of ENTRY(), etc. s390/reipl: use SYM* macros instead of ENTRY(), etc. s390/head64: use SYM* macros instead of ENTRY(), etc. s390/earlypgm: use SYM* macros instead of ENTRY(), etc. s390/mcount: use SYM* macros instead of ENTRY(), etc. s390/crc32le: use SYM* macros instead of ENTRY(), etc. s390/crc32be: use SYM* macros instead of ENTRY(), etc. s390/crypto,chacha: use SYM* macros instead of ENTRY(), etc. s390/amode31: use SYM* macros instead of ENTRY(), etc. ...
2023-04-20s390: select ARCH_USE_SYM_ANNOTATIONSHeiko Carstens
All old style assembly annotations have been converted for s390. Select ARCH_USE_SYM_ANNOTATIONS to make sure the old macros like ENTRY() aren't available anymore. This prevents that new code which uses the old macros will be added again. This follows what has been done for x86 with commit 2ce0d7f9766f ("x86/asm: Provide a Kconfig symbol for disabling old assembly annotations") and for arm64 with commit 50479d58eaa3 ("arm64: Disable old style assembly annotations"). Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-20s390/mm: enable ARCH_HAS_SET_DIRECT_MAPHeiko Carstens
Implement the set_direct_map_*() API, which allows to invalidate and set default permissions to pages within the direct mapping. Note that kernel_page_present(), which is also supposed to be part of this API, is intentionally not implemented. The reason for this is that kernel_page_present() is only used (and currently only makes sense) for suspend/resume, which isn't supported on s390. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-18mm/hugetlb_vmemmap: rename ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAPAneesh Kumar K.V
Now we use ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP config option to indicate devdax and hugetlb vmemmap optimization support. Hence rename that to a generic ARCH_WANT_OPTIMIZE_VMEMMAP Link: https://lkml.kernel.org/r/20230412050025.84346-2-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Cc: Joao Martins <joao.m.martins@oracle.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Tarun Sahu <tsahu@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-13s390/checksum: always use cksm instructionHeiko Carstens
Commit dfe843dce775 ("s390/checksum: support GENERIC_CSUM, enable it for KASAN") switched s390 to use the generic checksum functions, so that KASAN instrumentation also works checksum functions by avoiding architecture specific inline assemblies. There is however the problem that the generic csum_partial() function returns a 32 bit value with a 16 bit folded checksum, while the original s390 variant does not fold to 16 bit. This in turn causes that the ipib_checksum in lowcore contains different values depending on kernel config options. The ipib_checksum is used by system dumpers to verify if pointers in lowcore point to valid data. Verification is done by comparing checksum values. The system dumpers still use 32 bit checksum values which are not folded, and therefore the checksum verification fails (incorrectly). Symptom is that reboot after dump does not work anymore when a KASAN instrumented kernel is dumped. Fix this by not using the generic checksum implementation. Instead add an explicit kasan_check_read() so that KASAN knows about the read access from within the inline assembly. Reported-by: Alexander Egorenkov <egorenar@linux.ibm.com> Fixes: dfe843dce775 ("s390/checksum: support GENERIC_CSUM, enable it for KASAN") Tested-by: Alexander Egorenkov <egorenar@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-05s390/mm: try VMA lock-based page fault handling firstHeiko Carstens
Attempt VMA lock-based page fault handling first, and fall back to the existing mmap_lock-based handling if that fails. This is the s390 variant of "x86/mm: try VMA lock-based page fault handling first". Link: https://lkml.kernel.org/r/20230314132808.1266335-1-hca@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-04s390: enable HAVE_ARCH_STACKLEAKHeiko Carstens
Add support for the stackleak feature. Whenever the kernel returns to user space the kernel stack is filled with a poison value. Enabling this feature is quite expensive: e.g. after instrumenting the getpid() system call function to have a 4kb stack the result is an increased runtime of the system call by a factor of 3. Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-03-27s390: enable ARCH_HAS_MEMBARRIER_SYNC_COREHeiko Carstens
s390 trivially supports the ARCH_HAS_MEMBARRIER_SYNC_CORE requirements since the used lpswe(y) instruction to return from any kernel context to user space performs CPU serialization. This is very similar to arm, arm64 and powerpc. See commit 70216e18e519 ("membarrier: Provide core serializing command, *_SYNC_CORE") for further details. Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-03-20s390: make use of CONFIG_FUNCTION_ALIGNMENTHeiko Carstens
Make use of CONFIG_FUNCTION_ALIGNMENT which was introduced with commit d49a0626216b ("arch: Introduce CONFIG_FUNCTION_ALIGNMENT"). Select FUNCTION_ALIGNMENT_8B for gcc in order to reflect gcc's default function alignment. For all other compilers, which is only clang, select a function alignment of 16 bytes which reflects the default function alignment for clang. Also change the __ALIGN define to follow whatever the value of CONFIG_FUNCTION_ALIGNMENT is. This makes sure that the alignment of C and assembler functions is the same. In result everything still uses the default function alignment for both compilers. However in addition this is now also true for all assembly functions, so that all functions have a consistent alignment. Signed-off-by: Heiko Carstens <hca@linux.ibm.com>