summaryrefslogtreecommitdiff
path: root/arch/arm64/mm
AgeCommit message (Collapse)Author
2022-09-26mm/swap: add swp_offset_pfn() to fetch PFN from swap entryPeter Xu
We've got a bunch of special swap entries that stores PFN inside the swap offset fields. To fetch the PFN, normally the user just calls swp_offset() assuming that'll be the PFN. Add a helper swp_offset_pfn() to fetch the PFN instead, fetching only the max possible length of a PFN on the host, meanwhile doing proper check with MAX_PHYSMEM_BITS to make sure the swap offsets can actually store the PFNs properly always using the BUILD_BUG_ON() in is_pfn_swap_entry(). One reason to do so is we never tried to sanitize whether swap offset can really fit for storing PFN. At the meantime, this patch also prepares us with the future possibility to store more information inside the swp offset field, so assuming "swp_offset(entry)" to be the PFN will not stand any more very soon. Replace many of the swp_offset() callers to use swp_offset_pfn() where proper. Note that many of the existing users are not candidates for the replacement, e.g.: (1) When the swap entry is not a pfn swap entry at all, or, (2) when we wanna keep the whole swp_offset but only change the swp type. For the latter, it can happen when fork() triggered on a write-migration swap entry pte, we may want to only change the migration type from write->read but keep the rest, so it's not "fetching PFN" but "changing swap type only". They're left aside so that when there're more information within the swp offset they'll be carried over naturally in those cases. Since at it, dropping hwpoison_entry_to_pfn() because that's exactly what the new swp_offset_pfn() is about. Link: https://lkml.kernel.org/r/20220811161331.37055-4-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: "Huang, Ying" <ying.huang@intel.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Minchan Kim <minchan@kernel.org> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-26arm64: Add types to indirect called assembly functionsSami Tolvanen
With CONFIG_CFI_CLANG, assembly functions indirectly called from C code must be annotated with type identifiers to pass CFI checking. Use SYM_TYPED_FUNC_START for the indirectly called functions, and ensure we emit `bti c` also with SYM_TYPED_FUNC_START. Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Kees Cook <keescook@chromium.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220908215504.3686827-10-samitolvanen@google.com
2022-09-26Merge branches 'apple/dart', 'arm/mediatek', 'arm/omap', 'arm/smmu', ↵Joerg Roedel
'virtio', 'x86/vt-d', 'x86/amd' and 'core' into next
2022-09-22arm64: mte: move register initialization to CPeter Collingbourne
If FEAT_MTE2 is disabled via the arm64.nomte command line argument on a CPU that claims to support FEAT_MTE2, the kernel will use Tagged Normal in the MAIR. If we interpret arm64.nomte to mean that the CPU does not in fact implement FEAT_MTE2, setting the system register like this may lead to UNSPECIFIED behavior. Fix it by arranging for MAIR to be set in the C function cpu_enable_mte which is called based on the sanitized version of the system register. There is no need for the rest of the MTE-related system register initialization to happen from assembly, with the exception of TCR_EL1, which must be set to include at least TBI1 because the secondary CPUs access KASan-allocated data structures early. Therefore, make the TCR_EL1 initialization unconditional and move the rest of the initialization to cpu_enable_mte so that we no longer have a dependency on the unsanitized ID register value. Co-developed-by: Evgenii Stepanov <eugenis@google.com> Signed-off-by: Peter Collingbourne <pcc@google.com> Signed-off-by: Evgenii Stepanov <eugenis@google.com> Suggested-by: Catalin Marinas <catalin.marinas@arm.com> Reported-by: kernel test robot <lkp@intel.com> Fixes: 3b714d24ef17 ("arm64: mte: CPU feature detection and initial sysreg configuration") Cc: <stable@vger.kernel.org> # 5.10.x Link: https://lore.kernel.org/r/20220915222053.3484231-1-eugenis@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-22arm64: mm: handle ARM64_KERNEL_USES_PMD_MAPS in vmemmap_populate()Kefeng Wang
Directly check ARM64_SWAPPER_USES_SECTION_MAPS to choose base page or PMD level huge page mapping in vmemmap_populate() to simplify code a bit. Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Link: https://lore.kernel.org/r/20220920014951.196191-1-wangkefeng.wang@huawei.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-22arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()Will Deacon
arch_dma_prep_coherent() is called when preparing a non-cacheable region for a consistent DMA buffer allocation. Since the buffer pages may previously have been written via a cacheable mapping and consequently allocated as dirty cachelines, the purpose of this function is to remove these dirty lines from the cache, writing them back so that the non-coherent device is able to see them. On arm64, this operation can be achieved with a clean to the point of coherency; a subsequent invalidation is not required and serves little purpose in the presence of a cacheable alias (e.g. the linear map), since clean lines can be speculatively fetched back into the cache after the invalidation operation has completed. Relax the cache maintenance in arch_dma_prep_coherent() so that only a clean, and not a clean-and-invalidate operation is performed. Cc: Mark Rutland <mark.rutland@arm.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Will Deacon <will@kernel.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20220823122111.17439-1-will@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-22arm64: mm: don't acquire mutex when rewriting swapperMark Rutland
Since commit: 47546a1912fc4a03 ("arm64: mm: install KPTI nG mappings with MMU enabled)" ... when building with CONFIG_DEBUG_ATOMIC_SLEEP=y and booting under QEMU TCG with '-cpu max', there's a boot-time splat: | BUG: sleeping function called from invalid context at kernel/locking/mutex.c:580 | in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 15, name: migration/0 | preempt_count: 1, expected: 0 | RCU nest depth: 0, expected: 0 | no locks held by migration/0/15. | irq event stamp: 28 | hardirqs last enabled at (27): [<ffff8000091ed180>] _raw_spin_unlock_irq+0x3c/0x7c | hardirqs last disabled at (28): [<ffff8000081b8d74>] multi_cpu_stop+0x150/0x18c | softirqs last enabled at (0): [<ffff80000809a314>] copy_process+0x594/0x1964 | softirqs last disabled at (0): [<0000000000000000>] 0x0 | CPU: 0 PID: 15 Comm: migration/0 Not tainted 6.0.0-rc3-00002-g419b42ff7eef #3 | Hardware name: linux,dummy-virt (DT) | Stopper: multi_cpu_stop+0x0/0x18c <- stop_cpus.constprop.0+0xa0/0xfc | Call trace: | dump_backtrace.part.0+0xd0/0xe0 | show_stack+0x1c/0x5c | dump_stack_lvl+0x88/0xb4 | dump_stack+0x1c/0x38 | __might_resched+0x180/0x230 | __might_sleep+0x4c/0xa0 | __mutex_lock+0x5c/0x450 | mutex_lock_nested+0x30/0x40 | create_kpti_ng_temp_pgd+0x4fc/0x6d0 | kpti_install_ng_mappings+0x2b8/0x3b0 | cpu_enable_non_boot_scope_capabilities+0x7c/0xd0 | multi_cpu_stop+0xa0/0x18c | cpu_stopper_thread+0x88/0x11c | smpboot_thread_fn+0x1ec/0x290 | kthread+0x118/0x120 | ret_from_fork+0x10/0x20 Since commit: ee017ee353506fce ("arm64/mm: avoid fixmap race condition when create pud mapping") ... once the kernel leave the SYSTEM_BOOTING state, the fixmap pagetable entries are protected by the fixmap_lock mutex. The new KPTI rewrite code uses __create_pgd_mapping() to create a temporary pagetable. This happens in atomic context, after secondary CPUs are brought up and the kernel has left the SYSTEM_BOOTING state. Hence we try to acquire a mutex in atomic context, which is generally unsound (though benign in this case as the mutex should be free and all other CPUs are quiescent). This patch avoids the issue by pulling the mutex out of alloc_init_pud() and calling it at a higher level in the pagetable manipulation code. This allows it to be used without locking where one CPU is known to be in exclusive control of the machine, even after having left the SYSTEM_BOOTING state. Fixes: 47546a1912fc ("arm64: mm: install KPTI nG mappings with MMU enabled") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20220920134731.1625740-1-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-09-09arm64/sysreg: Standardise naming for MTE feature enumerationMark Brown
In preparation for conversion to automatic generation refresh the names given to the items in the MTE feture enumeration to reflect our standard pattern for naming, corresponding to the architecture feature names they reflect. No functional change. Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: Kristina Martsenko <kristina.martsenko@arm.com> Link: https://lore.kernel.org/r/20220905225425.1871461-17-broonie@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-09arm64/sysreg: Standardise naming of ID_AA64MMFR0_EL1.ASIDBitsMark Brown
For some reason we refer to ID_AA64MMFR0_EL1.ASIDBits as ASID. Add BITS into the name, bringing the naming into sync with DDI0487H.a. Due to the large amount of MixedCase in this register which isn't really consistent with either the kernel style or the majority of the architecture the use of upper case is preserved. No functional changes. Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: Kristina Martsenko <kristina.martsenko@arm.com> Link: https://lore.kernel.org/r/20220905225425.1871461-10-broonie@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-09arm64/sysreg: Add _EL1 into ID_AA64PFR1_EL1 constant namesMark Brown
Our standard is to include the _EL1 in the constant names for registers but we did not do that for ID_AA64PFR1_EL1, update to do so in preparation for conversion to automatic generation. No functional change. Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: Kristina Martsenko <kristina.martsenko@arm.com> Link: https://lore.kernel.org/r/20220905225425.1871461-8-broonie@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-09arm64/sysreg: Add _EL1 into ID_AA64MMFR0_EL1 definition namesMark Brown
Normally we include the full register name in the defines for fields within registers but this has not been followed for ID registers. In preparation for automatic generation of defines add the _EL1s into the defines for ID_AA64MMFR0_EL1 to follow the convention. No functional changes. Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: Kristina Martsenko <kristina.martsenko@arm.com> Link: https://lore.kernel.org/r/20220905225425.1871461-5-broonie@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-07iommu/dma: Move public interfaces to linux/iommu.hRobin Murphy
The iommu-dma layer is now mostly encapsulated by iommu_dma_ops, with only a couple more public interfaces left pertaining to MSI integration. Since these depend on the main IOMMU API header anyway, move their declarations there, taking the opportunity to update the half-baked comments to proper kerneldoc along the way. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/9cd99738f52094e6bed44bfee03fa4f288d20695.1660668998.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-09-06arm64: compat: Implement misalignment fixups for multiword loadsArd Biesheuvel
The 32-bit ARM kernel implements fixups on behalf of user space when using LDM/STM or LDRD/STRD instructions on addresses that are not 32-bit aligned. This is not something that is supported by the architecture, but was done anyway to increase compatibility with user space software, which mostly targeted x86 at the time and did not care about aligned accesses. This feature is one of the remaining impediments to being able to switch to 64-bit kernels on 64-bit capable hardware running 32-bit user space, so let's implement it for the arm64 compat layer as well. Note that the intent is to implement the exact same handling of misaligned multi-word loads and stores as the 32-bit kernel does, including what appears to be missing support for user space programs that rely on SETEND to switch to a different byte order and back. Also, like the 32-bit ARM version, we rely on the faulting address reported by the CPU to infer the memory address, instead of decoding the instruction fully to obtain this information. This implementation is taken from the 32-bit ARM tree, with all pieces removed that deal with instructions other than LDRD/STRD and LDM/STM, or that deal with alignment exceptions taken in kernel mode. Cc: debian-arm@lists.debian.org Cc: Vagrant Cascadian <vagrant@debian.org> Cc: Riku Voipio <riku.voipio@iki.fi> Cc: Steve McIntyre <steve@einval.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20220701135322.3025321-1-ardb@kernel.org [catalin.marinas@arm.com: change the option to 'default n'] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-08-23arm64: fix rodata=fullMark Rutland
On arm64, "rodata=full" has been suppored (but not documented) since commit: c55191e96caa9d78 ("arm64: mm: apply r/o permissions of VM areas to its linear alias as well") As it's necessary to determine the rodata configuration early during boot, arm64 has an early_param() handler for this, whereas init/main.c has a __setup() handler which is run later. Unfortunately, this split meant that since commit: f9a40b0890658330 ("init/main.c: return 1 from handled __setup() functions") ... passing "rodata=full" would result in a spurious warning from the __setup() handler (though RO permissions would be configured appropriately). Further, "rodata=full" has been broken since commit: 0d6ea3ac94ca77c5 ("lib/kstrtox.c: add "false"/"true" support to kstrtobool()") ... which caused strtobool() to parse "full" as false (in addition to many other values not documented for the "rodata=" kernel parameter. This patch fixes this breakage by: * Moving the core parameter parser to an __early_param(), such that it is available early. * Adding an (optional) arch hook which arm64 can use to parse "full". * Updating the documentation to mention that "full" is valid for arm64. * Having the core parameter parser handle "on" and "off" explicitly, such that any undocumented values (e.g. typos such as "ful") are reported as errors rather than being silently accepted. Note that __setup() and early_param() have opposite conventions for their return values, where __setup() uses 1 to indicate a parameter was handled and early_param() uses 0 to indicate a parameter was handled. Fixes: f9a40b089065 ("init/main.c: return 1 from handled __setup() functions") Fixes: 0d6ea3ac94ca ("lib/kstrtox.c: add "false"/"true" support to kstrtobool()") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Jagdish Gediya <jvgediya@linux.ibm.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Will Deacon <will@kernel.org> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20220817154022.3974645-1-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-08-08mm: hugetlb_vmemmap: delete hugetlb_optimize_vmemmap_enabled()Muchun Song
Patch series "Simplify hugetlb vmemmap and improve its readability", v2. This series aims to simplify hugetlb vmemmap and improve its readability. This patch (of 8): The name hugetlb_optimize_vmemmap_enabled() a bit confusing as it tests two conditions (enabled and pages in use). Instead of coming up to an appropriate name, we could just delete it. There is already a discussion about deleting it in thread [1]. There is only one user of hugetlb_optimize_vmemmap_enabled() outside of hugetlb_vmemmap, that is flush_dcache_page() in arch/arm64/mm/flush.c. However, it does not need to call hugetlb_optimize_vmemmap_enabled() in flush_dcache_page() since HugeTLB pages are always fully mapped and only head page will be set PG_dcache_clean meaning only head page's flag may need to be cleared (see commit cf5a501d985b). So it is easy to remove hugetlb_optimize_vmemmap_enabled(). Link: https://lore.kernel.org/all/c77c61c8-8a5a-87e8-db89-d04d8aaab4cc@oracle.com/ [1] Link: https://lkml.kernel.org/r/20220628092235.91270-2-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Xiongchun Duan <duanxiongchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-08-05Merge tag 'mm-stable-2022-08-03' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "Most of the MM queue. A few things are still pending. Liam's maple tree rework didn't make it. This has resulted in a few other minor patch series being held over for next time. Multi-gen LRU still isn't merged as we were waiting for mapletree to stabilize. The current plan is to merge MGLRU into -mm soon and to later reintroduce mapletree, with a view to hopefully getting both into 6.1-rc1. Summary: - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe Lin, Yang Shi, Anshuman Khandual and Mike Rapoport - Some kmemleak fixes from Patrick Wang and Waiman Long - DAMON updates from SeongJae Park - memcg debug/visibility work from Roman Gushchin - vmalloc speedup from Uladzislau Rezki - more folio conversion work from Matthew Wilcox - enhancements for coherent device memory mapping from Alex Sierra - addition of shared pages tracking and CoW support for fsdax, from Shiyang Ruan - hugetlb optimizations from Mike Kravetz - Mel Gorman has contributed some pagealloc changes to improve latency and realtime behaviour. - mprotect soft-dirty checking has been improved by Peter Xu - Many other singleton patches all over the place" [ XFS merge from hell as per Darrick Wong in https://lore.kernel.org/all/YshKnxb4VwXycPO8@magnolia/ ] * tag 'mm-stable-2022-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (282 commits) tools/testing/selftests/vm/hmm-tests.c: fix build mm: Kconfig: fix typo mm: memory-failure: convert to pr_fmt() mm: use is_zone_movable_page() helper hugetlbfs: fix inaccurate comment in hugetlbfs_statfs() hugetlbfs: cleanup some comments in inode.c hugetlbfs: remove unneeded header file hugetlbfs: remove unneeded hugetlbfs_ops forward declaration hugetlbfs: use helper macro SZ_1{K,M} mm: cleanup is_highmem() mm/hmm: add a test for cross device private faults selftests: add soft-dirty into run_vmtests.sh selftests: soft-dirty: add test for mprotect mm/mprotect: fix soft-dirty check in can_change_pte_writable() mm: memcontrol: fix potential oom_lock recursion deadlock mm/gup.c: fix formatting in check_and_migrate_movable_page() xfs: fail dax mount if reflink is enabled on a partition mm/memcontrol.c: remove the redundant updating of stats_flush_threshold userfaultfd: don't fail on unrecognized features hugetlb_cgroup: fix wrong hugetlb cgroup numa stat ...
2022-08-03Merge tag 'efi-next-for-v5.20' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi Pull EFI updates from Ard Biesheuvel: - Enable mirrored memory for arm64 - Fix up several abuses of the efivar API - Refactor the efivar API in preparation for moving the 'business logic' part of it into efivarfs - Enable ACPI PRM on arm64 * tag 'efi-next-for-v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: (24 commits) ACPI: Move PRM config option under the main ACPI config ACPI: Enable Platform Runtime Mechanism(PRM) support on ARM64 ACPI: PRM: Change handler_addr type to void pointer efi: Simplify arch_efi_call_virt() macro drivers: fix typo in firmware/efi/memmap.c efi: vars: Drop __efivar_entry_iter() helper which is no longer used efi: vars: Use locking version to iterate over efivars linked lists efi: pstore: Omit efivars caching EFI varstore access layer efi: vars: Add thin wrapper around EFI get/set variable interface efi: vars: Don't drop lock in the middle of efivar_init() pstore: Add priv field to pstore_record for backend specific use Input: applespi - avoid efivars API and invoke EFI services directly selftests/kexec: remove broken EFI_VARS secure boot fallback check brcmfmac: Switch to appropriate helper to load EFI variable contents iwlwifi: Switch to proper EFI variable store interface media: atomisp_gmin_platform: stop abusing efivar API efi: efibc: avoid efivar API for setting variables efi: avoid efivars layer when loading SSDTs from variables efi: Correct comment on efi_memmap_alloc memblock: Disable mirror feature if kernelcore is not specified ...
2022-08-01Merge tag 'arm64-upstream' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Will Deacon: "Highlights include a major rework of our kPTI page-table rewriting code (which makes it both more maintainable and considerably faster in the cases where it is required) as well as significant changes to our early boot code to reduce the need for data cache maintenance and greatly simplify the KASLR relocation dance. Summary: - Remove unused generic cpuidle support (replaced by PSCI version) - Fix documentation describing the kernel virtual address space - Handling of some new CPU errata in Arm implementations - Rework of our exception table code in preparation for handling machine checks (i.e. RAS errors) more gracefully - Switch over to the generic implementation of ioremap() - Fix lockdep tracking in NMI context - Instrument our memory barrier macros for KCSAN - Rework of the kPTI G->nG page-table repainting so that the MMU remains enabled and the boot time is no longer slowed to a crawl for systems which require the late remapping - Enable support for direct swapping of 2MiB transparent huge-pages on systems without MTE - Fix handling of MTE tags with allocating new pages with HW KASAN - Expose the SMIDR register to userspace via sysfs - Continued rework of the stack unwinder, particularly improving the behaviour under KASAN - More repainting of our system register definitions to match the architectural terminology - Improvements to the layout of the vDSO objects - Support for allocating additional bits of HWCAP2 and exposing FEAT_EBF16 to userspace on CPUs that support it - Considerable rework and optimisation of our early boot code to reduce the need for cache maintenance and avoid jumping in and out of the kernel when handling relocation under KASLR - Support for disabling SVE and SME support on the kernel command-line - Support for the Hisilicon HNS3 PMU - Miscellanous cleanups, trivial updates and minor fixes" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (136 commits) arm64: Delay initialisation of cpuinfo_arm64::reg_{zcr,smcr} arm64: fix KASAN_INLINE arm64/hwcap: Support FEAT_EBF16 arm64/cpufeature: Store elf_hwcaps as a bitmap rather than unsigned long arm64/hwcap: Document allocation of upper bits of AT_HWCAP arm64: enable THP_SWAP for arm64 arm64/mm: use GENMASK_ULL for TTBR_BADDR_MASK_52 arm64: errata: Remove AES hwcap for COMPAT tasks arm64: numa: Don't check node against MAX_NUMNODES drivers/perf: arm_spe: Fix consistency of SYS_PMSCR_EL1.CX perf: RISC-V: Add of_node_put() when breaking out of for_each_of_cpu_node() docs: perf: Include hns3-pmu.rst in toctree to fix 'htmldocs' WARNING arm64: kasan: Revert "arm64: mte: reset the page tag in page->flags" mm: kasan: Skip page unpoisoning only if __GFP_SKIP_KASAN_UNPOISON mm: kasan: Skip unpoisoning of user pages mm: kasan: Ensure the tags are visible before the tag in page->flags drivers/perf: hisi: add driver for HNS3 PMU drivers/perf: hisi: Add description for HNS3 PMU driver drivers/perf: riscv_pmu_sbi: perf format perf/arm-cci: Use the bitmap API to allocate bitmaps ...
2022-07-25Merge branch 'for-next/boot' into for-next/coreWill Deacon
* for-next/boot: (34 commits) arm64: fix KASAN_INLINE arm64: Add an override for ID_AA64SMFR0_EL1.FA64 arm64: Add the arm64.nosve command line option arm64: Add the arm64.nosme command line option arm64: Expose a __check_override primitive for oddball features arm64: Allow the idreg override to deal with variable field width arm64: Factor out checking of a feature against the override into a macro arm64: Allow sticky E2H when entering EL1 arm64: Save state of HCR_EL2.E2H before switch to EL1 arm64: Rename the VHE switch to "finalise_el2" arm64: mm: fix booting with 52-bit address space arm64: head: remove __PHYS_OFFSET arm64: lds: use PROVIDE instead of conditional definitions arm64: setup: drop early FDT pointer helpers arm64: head: avoid relocating the kernel twice for KASLR arm64: kaslr: defer initialization to initcall where permitted arm64: head: record CPU boot mode after enabling the MMU arm64: head: populate kernel page tables with MMU and caches on arm64: head: factor out TTBR1 assignment into a macro arm64: idreg-override: use early FDT mapping in ID map ...
2022-07-25Merge branch 'for-next/mte' into for-next/coreWill Deacon
* for-next/mte: arm64: kasan: Revert "arm64: mte: reset the page tag in page->flags" mm: kasan: Skip page unpoisoning only if __GFP_SKIP_KASAN_UNPOISON mm: kasan: Skip unpoisoning of user pages mm: kasan: Ensure the tags are visible before the tag in page->flags
2022-07-25Merge branch 'for-next/misc' into for-next/coreWill Deacon
* for-next/misc: arm64/mm: use GENMASK_ULL for TTBR_BADDR_MASK_52 arm64: numa: Don't check node against MAX_NUMNODES arm64: mm: Remove assembly DMA cache maintenance wrappers arm64/mm: Define defer_reserve_crashkernel() arm64: fix oops in concurrently setting insn_emulation sysctls arm64: Do not forget syscall when starting a new thread. arm64: boot: add zstd support
2022-07-25Merge branch 'for-next/kpti' into for-next/coreWill Deacon
* for-next/kpti: arm64: correct the effect of mitigations off on kpti arm64: entry: simplify trampoline data page arm64: mm: install KPTI nG mappings with MMU enabled arm64: kpti-ng: simplify page table traversal logic
2022-07-25Merge branch 'for-next/ioremap' into for-next/coreWill Deacon
* for-next/ioremap: arm64: Add HAVE_IOREMAP_PROT support arm64: mm: Convert to GENERIC_IOREMAP mm: ioremap: Add ioremap/iounmap_allowed() mm: ioremap: Setup phys_addr of struct vm_struct mm: ioremap: Use more sensible name in ioremap_prot() ARM: mm: kill unused runtime hook arch_iounmap()
2022-07-17arm64/mm: move protection_map[] inside the platformAnshuman Khandual
This moves protection_map[] inside the platform and makes it a static. Link: https://lkml.kernel.org/r/20220711070600.2378316-6-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Brian Cain <bcain@quicinc.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Christoph Hellwig <hch@infradead.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Chris Zankel <chris@zankel.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Guo Ren <guoren@kernel.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Jonas Bonn <jonas@southpole.se> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Stafford Horne <shorne@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vineet Gupta <vgupta@kernel.org> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-17arm64/hugetlb: implement arm64 specific hugetlb_mask_last_pageBaolin Wang
The HugeTLB address ranges are linearly scanned during fork, unmap and remap operations, and the linear scan can skip to the end of range mapped by the page table page if hitting a non-present entry, which can help to speed linear scanning of the HugeTLB address ranges. So hugetlb_mask_last_page() is introduced to help to update the address in the loop of HugeTLB linear scanning with getting the last huge page mapped by the associated page table page[1], when a non-present entry is encountered. Considering ARM64 specific cont-pte/pmd size HugeTLB, this patch implemented an ARM64 specific hugetlb_mask_last_page() to help this case. [1] https://lore.kernel.org/linux-mm/20220527225849.284839-1-mike.kravetz@oracle.com/ [baolin.wang@linux.alibaba.com: fix build] Link: https://lkml.kernel.org/r/a14e7b39-6a8a-4609-b4a1-84ac574f5c96@linux.alibaba.com Link: https://lkml.kernel.org/r/20220621235620.291305-3-mike.kravetz@oracle.com Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Acked-by: Muchun Song <songmuchun@bytedance.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: David Hildenbrand <david@redhat.com> Cc: James Houghton <jthoughton@google.com> Cc: kernel test robot <lkp@intel.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mina Almasry <almasrymina@google.com> Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Xu <peterx@redhat.com> Cc: Rolf Eike Beer <eike-kernel@sf-tec.de> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-07arm64: kasan: Revert "arm64: mte: reset the page tag in page->flags"Catalin Marinas
This reverts commit e5b8d9218951e59df986f627ec93569a0d22149b. Pages mapped in user-space with PROT_MTE have the allocation tags either zeroed or copied/restored to some user values. In order for the kernel to access such pages via page_address(), resetting the tag in page->flags was necessary. This tag resetting was deferred to set_pte_at() -> mte_sync_page_tags() but it can race with another CPU reading the flags (via page_to_virt()): P0 (mte_sync_page_tags): P1 (memcpy from virt_to_page): Rflags!=0xff Wflags=0xff DMB (doesn't help) Wtags=0 Rtags=0 // fault Since now the post_alloc_hook() function resets the page->flags tag when unpoisoning is skipped for user pages (including the __GFP_ZEROTAGS case), revert the arm64 commit calling page_kasan_tag_reset(). Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Peter Collingbourne <pcc@google.com> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Acked-by: Andrey Konovalov <andreyknvl@gmail.com> Link: https://lore.kernel.org/r/20220610152141.2148929-5-catalin.marinas@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-07-07mm: kasan: Skip unpoisoning of user pagesCatalin Marinas
Commit c275c5c6d50a ("kasan: disable freed user page poisoning with HW tags") added __GFP_SKIP_KASAN_POISON to GFP_HIGHUSER_MOVABLE. A similar argument can be made about unpoisoning, so also add __GFP_SKIP_KASAN_UNPOISON to user pages. To ensure the user page is still accessible via page_address() without a kasan fault, reset the page->flags tag. With the above changes, there is no need for the arm64 tag_clear_highpage() to reset the page->flags tag. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Link: https://lore.kernel.org/r/20220610152141.2148929-3-catalin.marinas@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-07-05arm64: mm: Remove assembly DMA cache maintenance wrappersWill Deacon
Remove the __dma_{flush,map,unmap}_area assembly wrappers and call the appropriate cache maintenance functions directly from the DMA mapping callbacks. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20220610151228.4562-3-will@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2022-07-05arm64/mm: Define defer_reserve_crashkernel()Anshuman Khandual
Crash kernel memory reservation gets deferred, when either CONFIG_ZONE_DMA or CONFIG_ZONE_DMA32 config is enabled on the platform. This deferral also impacts overall linear mapping creation including the crash kernel itself. Just encapsulate this deferral check in a new helper for better clarity. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20220705062556.1845734-1-anshuman.khandual@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-07-03mm: hugetlb: kill set_huge_swap_pte_at()Qi Zheng
Commit e5251fd43007 ("mm/hugetlb: introduce set_huge_swap_pte_at() helper") add set_huge_swap_pte_at() to handle swap entries on architectures that support hugepages consisting of contiguous ptes. And currently the set_huge_swap_pte_at() is only overridden by arm64. set_huge_swap_pte_at() provide a sz parameter to help determine the number of entries to be updated. But in fact, all hugetlb swap entries contain pfn information, so we can find the corresponding folio through the pfn recorded in the swap entry, then the folio_size() is the number of entries that need to be updated. And considering that users will easily cause bugs by ignoring the difference between set_huge_swap_pte_at() and set_huge_pte_at(). Let's handle swap entries in set_huge_pte_at() and remove the set_huge_swap_pte_at(), then we can call set_huge_pte_at() anywhere, which simplifies our coding. Link: https://lkml.kernel.org/r/20220626145717.53572-1-zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Acked-by: Muchun Song <songmuchun@bytedance.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-01arm64: hugetlb: Restore TLB invalidation for BBM on contiguous ptesWill Deacon
Commit fb396bb459c1 ("arm64/hugetlb: Drop TLB flush from get_clear_flush()") removed TLB invalidation from get_clear_flush() [now get_clear_contig()] on the basis that the core TLB invalidation code is aware of hugetlb mappings backed by contiguous page-table entries and will cover the correct virtual address range. However, this change also resulted in the TLB invalidation being removed from the "break" step in the break-before-make (BBM) sequence used internally by huge_ptep_set_{access_flags,wrprotect}(), therefore making the BBM sequence unsafe irrespective of later invalidation. Although the architecture is desperately unclear about how exactly contiguous ptes should be updated in a live page-table, restore TLB invalidation to our BBM sequence under the assumption that BBM is the right thing to be doing in the first place. Fixes: fb396bb459c1 ("arm64/hugetlb: Drop TLB flush from get_clear_flush()") Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Steve Capper <steve.capper@arm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Marc Zyngier <maz@kernel.org> Signed-off-by: Will Deacon <will@kernel.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20220629095349.25748-1-will@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-07-01arm64: mm: fix booting with 52-bit address spaceArd Biesheuvel
Joey reports that booting 52-bit VA capable builds on 52-bit VA capable CPUs is broken since commit 0d9b1ffefabe ("arm64: mm: make vabits_actual a build time constant if possible"). This is due to the fact that the primary CPU reads the vabits_actual variable before it has been assigned. The reason for deferring the assignment of vabits_actual was that we try to perform as few stores to memory as we can with the MMU and caches off, due to the cache coherency issues it creates. Since __cpu_setup() [which is where the read of vabits_actual occurs] is also called on the secondary boot path, we cannot just read the CPU ID registers directly, given that the size of the VA space is decided by the capabilities of the primary CPU. So let's read vabits_actual only on the secondary boot path, and read the CPU ID registers directly on the primary boot path, by making it a function parameter of __cpu_setup(). To ensure that all users of vabits_actual (including kasan_early_init()) observe the correct value, move the assignment of vabits_actual back into asm code, but still defer it to after the MMU and caches have been enabled. Cc: Will Deacon <will@kernel.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Fixes: 0d9b1ffefabe ("arm64: mm: make vabits_actual a build time constant if possible") Reported-by: Joey Gouly <joey.gouly@arm.com> Co-developed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20220701111045.2944309-1-ardb@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2022-06-28arm64: extable: cleanup redundant extable type EX_TYPE_FIXUPTong Tiangen
Currently, extable type EX_TYPE_FIXUP is no place to use, We can safely remove it. Suggested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Tong Tiangen <tongtiangen@huawei.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20220621072638.1273594-7-tongtiangen@huawei.com Signed-off-by: Will Deacon <will@kernel.org>
2022-06-28arm64: extable: add new extable type EX_TYPE_KACCESS_ERR_ZERO supportTong Tiangen
Currently, The extable type EX_TYPE_UACCESS_ERR_ZERO is used by __get/put_kernel_nofault(), but those helpers are not uaccess type, so we add a new extable type EX_TYPE_KACCESS_ERR_ZERO which can be used by __get/put_kernel_no_fault(). This is also to prepare for distinguishing the two types in machine check safe process. Suggested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Tong Tiangen <tongtiangen@huawei.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20220621072638.1273594-2-tongtiangen@huawei.com Signed-off-by: Will Deacon <will@kernel.org>
2022-06-27Merge branch 'master' into mm-stableakpm
2022-06-27arm64: Add HAVE_IOREMAP_PROT supportKefeng Wang
With ioremap_prot() definition from generic ioremap, also move pte_pgprot() from hugetlbpage.c into pgtable.h, then arm64 could have HAVE_IOREMAP_PROT, which will enable generic_access_phys() code, it is useful for debug, eg, gdb. Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Link: https://lore.kernel.org/r/20220607125027.44946-7-wangkefeng.wang@huawei.com Signed-off-by: Will Deacon <will@kernel.org>
2022-06-27arm64: mm: Convert to GENERIC_IOREMAPKefeng Wang
Add hook for arm64's special operation when ioremap(), then ioremap_wc/np/cache is converted to use ioremap_prot() from GENERIC_IOREMAP, update the Copyright and kill the unused inclusions. Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Link: https://lore.kernel.org/r/20220607125027.44946-6-wangkefeng.wang@huawei.com Signed-off-by: Will Deacon <will@kernel.org>
2022-06-24arm64: head: record CPU boot mode after enabling the MMUArd Biesheuvel
In order to avoid having to touch memory with the MMU and caches disabled, and therefore having to invalidate it from the caches explicitly, just defer storing the value until after the MMU has been turned on, unless we are giving up with an error. While at it, move the associated variable definitions into C code. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20220624150651.1358849-19-ardb@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2022-06-24arm64: head: cover entire kernel image in initial ID mapArd Biesheuvel
As a first step towards avoiding the need to create, tear down and recreate the kernel virtual mapping with MMU and caches disabled, start by expanding the ID map so it covers the page tables as well as all executable code. This will allow us to populate the page tables with the MMU and caches on, and call KASLR init code before setting up the virtual mapping. Since this ID map is only needed at boot, create it as a temporary set of page tables, and populate the permanent ID map after enabling the MMU and caches. While at it, switch to read-only attributes for the where possible, as writable permissions are only needed for the initial kernel page tables. Note that on 4k granule configurations, the permanent ID map will now be reduced to a single page rather than a 2M block mapping. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20220624150651.1358849-13-ardb@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2022-06-24arm64: mm: provide idmap pointer to cpu_replace_ttbr1()Ard Biesheuvel
In preparation for changing the way we initialize the permanent ID map, update cpu_replace_ttbr1() so we can use it with the initial ID map as well. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20220624150651.1358849-11-ardb@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2022-06-24arm64: head: drop idmap_ptrs_per_pgdArd Biesheuvel
The assignment of idmap_ptrs_per_pgd lacks any cache invalidation, even though it is updated with the MMU and caches disabled. However, we never bother to read the value again except in the very next instruction, and so we can just drop the variable entirely. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20220624150651.1358849-5-ardb@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2022-06-24arm64: head: move assignment of idmap_t0sz to C codeArd Biesheuvel
Setting idmap_t0sz involves fiddling with the caches if done with the MMU off. Since we will be creating an initial ID map with the MMU and caches off, and the permanent ID map with the MMU and caches on, let's move this assignment of idmap_t0sz out of the startup code, and replace it with a macro that simply issues the three instructions needed to calculate the value wherever it is needed before the MMU is turned on. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20220624150651.1358849-4-ardb@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2022-06-24arm64: mm: make vabits_actual a build time constant if possibleArd Biesheuvel
Currently, we only support 52-bit virtual addressing on 64k pages configurations, and in all other cases, vabits_actual is guaranteed to equal VA_BITS (== VA_BITS_MIN). So get rid of the variable entirely in that case. While at it, move the assignment out of the asm entry code - it has no need to be there. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20220624150651.1358849-3-ardb@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2022-06-24arm64: head: move kimage_vaddr variable into C fileArd Biesheuvel
This variable definition does not need to be in head.S so move it out. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20220624150651.1358849-2-ardb@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2022-06-24arm64: entry: simplify trampoline data pageArd Biesheuvel
Get rid of some clunky open coded arithmetic on section addresses, by emitting the trampoline data variables into a separate, dedicated r/o data section, and putting it at the next page boundary. This way, we can access the literals via single LDR instruction. While at it, get rid of other, implicit literals, and use ADRP/ADD or MOVZ/MOVK sequences, as appropriate. Note that the latter are only supported for CONFIG_RELOCATABLE=n (which is usually the case if CONFIG_RANDOMIZE_BASE=n), so update the CPP conditionals to reflect this. Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20220622161010.3845775-1-ardb@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2022-06-23arm64: mm: install KPTI nG mappings with MMU enabledArd Biesheuvel
In cases where we unmap the kernel while running in user space, we rely on ASIDs to distinguish the minimal trampoline from the full kernel mapping, and this means we must use non-global attributes for those mappings, to ensure they are scoped by ASID and will not hit in the TLB inadvertently. We only do this when needed, as this is generally more costly in terms of TLB pressure, and so we boot without these non-global attributes, and apply them to all existing kernel mappings once all CPUs are up and we know whether or not the non-global attributes are needed. At this point, we cannot simply unmap and remap the entire address space, so we have to update all existing block and page descriptors in place. Currently, we go through a lot of trouble to perform these updates with the MMU and caches off, to avoid violating break before make (BBM) rules imposed by the architecture. Since we make changes to page tables that are not covered by the ID map, we gain access to those descriptors by disabling translations altogether. This means that the stores to memory are issued with device attributes, and require extra care in terms of coherency, which is costly. We also rely on the ID map to access a shared flag, which requires the ID map to be executable and writable at the same time, which is another thing we'd prefer to avoid. So let's switch to an approach where we replace the kernel mapping with a minimal mapping of a few pages that can be used for a minimal, ad-hoc fixmap that we can use to map each page table in turn as we traverse the hierarchy. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20220609174320.4035379-3-ardb@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2022-06-23arm64: kpti-ng: simplify page table traversal logicArd Biesheuvel
Simplify the KPTI G-to-nG asm helper code by: - pulling the 'table bit' test into the get/put macros so we can combine them and incorporate the entire loop; - moving the 'table bit' test after the update of bit #11 so we no longer need separate next_xxx and skip_xxx labels; - redefining the pmd/pud register aliases and the next_pmd/next_pud labels instead of branching to them if the number of configured page table levels is less than 3 or 4, respectively. No functional change intended, except for the fact that we now descend into a next level table after setting bit #11 on its descriptor but this should make no difference in practice. While at it, switch to .L prefixed local labels so they don't clutter up the symbol tables, kallsyms, etc, and clean up the indentation for legibility. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20220609174320.4035379-2-ardb@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2022-06-17arm64: mm: Don't invalidate FROM_DEVICE buffers at start of DMA transferWill Deacon
Invalidating the buffer memory in arch_sync_dma_for_device() for FROM_DEVICE transfers When using the streaming DMA API to map a buffer prior to inbound non-coherent DMA (i.e. DMA_FROM_DEVICE), we invalidate any dirty CPU cachelines so that they will not be written back during the transfer and corrupt the buffer contents written by the DMA. This, however, poses two potential problems: (1) If the DMA transfer does not write to every byte in the buffer, then the unwritten bytes will contain stale data once the transfer has completed. (2) If the buffer has a virtual alias in userspace, then stale data may be visible via this alias during the period between performing the cache invalidation and the DMA writes landing in memory. Address both of these issues by cleaning (aka writing-back) the dirty lines in arch_sync_dma_for_device(DMA_FROM_DEVICE) instead of discarding them using invalidation. Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Russell King <linux@armlinux.org.uk> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220606152150.GA31568@willie-the-truck Signed-off-by: Will Deacon <will@kernel.org> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20220610151228.4562-2-will@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-06-16mm: avoid unnecessary page fault retires on shared memory typesPeter Xu
I observed that for each of the shared file-backed page faults, we're very likely to retry one more time for the 1st write fault upon no page. It's because we'll need to release the mmap lock for dirty rate limit purpose with balance_dirty_pages_ratelimited() (in fault_dirty_shared_page()). Then after that throttling we return VM_FAULT_RETRY. We did that probably because VM_FAULT_RETRY is the only way we can return to the fault handler at that time telling it we've released the mmap lock. However that's not ideal because it's very likely the fault does not need to be retried at all since the pgtable was well installed before the throttling, so the next continuous fault (including taking mmap read lock, walk the pgtable, etc.) could be in most cases unnecessary. It's not only slowing down page faults for shared file-backed, but also add more mmap lock contention which is in most cases not needed at all. To observe this, one could try to write to some shmem page and look at "pgfault" value in /proc/vmstat, then we should expect 2 counts for each shmem write simply because we retried, and vm event "pgfault" will capture that. To make it more efficient, add a new VM_FAULT_COMPLETED return code just to show that we've completed the whole fault and released the lock. It's also a hint that we should very possibly not need another fault immediately on this page because we've just completed it. This patch provides a ~12% perf boost on my aarch64 test VM with a simple program sequentially dirtying 400MB shmem file being mmap()ed and these are the time it needs: Before: 650.980 ms (+-1.94%) After: 569.396 ms (+-1.38%) I believe it could help more than that. We need some special care on GUP and the s390 pgfault handler (for gmap code before returning from pgfault), the rest changes in the page fault handlers should be relatively straightforward. Another thing to mention is that mm_account_fault() does take this new fault as a generic fault to be accounted, unlike VM_FAULT_RETRY. I explicitly didn't touch hmm_vma_fault() and break_ksm() because they do not handle VM_FAULT_RETRY even with existing code, so I'm literally keeping them as-is. Link: https://lkml.kernel.org/r/20220530183450.42886-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Vineet Gupta <vgupta@kernel.org> Acked-by: Guo Ren <guoren@kernel.org> Acked-by: Max Filippov <jcmvbkbc@gmail.com> Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Alistair Popple <apopple@nvidia.com> Reviewed-by: Ingo Molnar <mingo@kernel.org> Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> [arm part] Acked-by: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Stafford Horne <shorne@gmail.com> Cc: David S. Miller <davem@davemloft.net> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Brian Cain <bcain@quicinc.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Richard Weinberger <richard@nod.at> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Jonas Bonn <jonas@southpole.se> Cc: Will Deacon <will@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Michal Simek <monstr@monstr.eu> Cc: Matt Turner <mattst88@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: David Hildenbrand <david@redhat.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Chris Zankel <chris@zankel.net> Cc: Hugh Dickins <hughd@google.com> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Rich Felker <dalias@libc.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Helge Deller <deller@gmx.de> Cc: Yoshinori Sato <ysato@users.osdn.me> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-06-15arm64: mm: Only remove nomap flag for initrdMa Wupeng
Commit 177e15f0c144 ("arm64: add the initrd region to the linear mapping explicitly") remove all the flags of the memory used by initrd. This is fine since MEMBLOCK_MIRROR is not used in arm64. However with mirrored feature introduced to arm64, this will clear the mirrored flag used by initrd, which will lead to error log printed by find_zone_movable_pfns_for_nodes() if the lower 4G range has some non-mirrored memory. To solve this problem, only MEMBLOCK_NOMAP flag will be removed via memblock_clear_nomap(). Signed-off-by: Ma Wupeng <mawupeng1@huawei.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20220614092156.1972846-5-mawupeng1@huawei.com Acked-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>