summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-09-14arm64: kprobe: add checks for ARMv8.3-PAuth combined instructionsAmit Daniel Kachhap
Currently the ARMv8.3-PAuth combined branch instructions (braa, retaa etc.) are not simulated for out-of-line execution with a handler. Hence the uprobe of such instructions leads to kernel warnings in a loop as they are not explicitly checked and fall into INSN_GOOD categories. Other combined instructions like LDRAA and LDRBB can be probed. The issue of the combined branch instructions is fixed by adding group definitions of all such instructions and rejecting their probes. The instruction groups added are br_auth(braa, brab, braaz and brabz), blr_auth(blraa, blrab, blraaz and blrabz), ret_auth(retaa and retab) and eret_auth(eretaa and eretab). Warning log: WARNING: CPU: 0 PID: 156 at arch/arm64/kernel/probes/uprobes.c:182 uprobe_single_step_handler+0x34/0x50 Modules linked in: CPU: 0 PID: 156 Comm: func Not tainted 5.9.0-rc3 #188 Hardware name: Foundation-v8A (DT) pstate: 804003c9 (Nzcv DAIF +PAN -UAO BTYPE=--) pc : uprobe_single_step_handler+0x34/0x50 lr : single_step_handler+0x70/0xf8 sp : ffff800012af3e30 x29: ffff800012af3e30 x28: ffff000878723b00 x27: 0000000000000000 x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000 x23: 0000000060001000 x22: 00000000cb000022 x21: ffff800012065ce8 x20: ffff800012af3ec0 x19: ffff800012068d50 x18: 0000000000000000 x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 x11: 0000000000000000 x10: 0000000000000000 x9 : ffff800010085c90 x8 : 0000000000000000 x7 : 0000000000000000 x6 : ffff80001205a9c8 x5 : ffff80001205a000 x4 : ffff80001233db80 x3 : ffff8000100a7a60 x2 : 0020000000000003 x1 : 0000fffffffff008 x0 : ffff800012af3ec0 Call trace: uprobe_single_step_handler+0x34/0x50 single_step_handler+0x70/0xf8 do_debug_exception+0xb8/0x130 el0_sync_handler+0x138/0x1b8 el0_sync+0x158/0x180 Fixes: 74afda4016a7 ("arm64: compile the kernel with ptrauth return address signing") Fixes: 04ca3204fa09 ("arm64: enable pointer authentication") Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com> Reviewed-by: Dave Martin <Dave.Martin@arm.com> Link: https://lore.kernel.org/r/20200914083656.21428-2-amit.kachhap@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2020-09-11Documentation/kvm/arm: improve description of HVC_SOFT_RESTARTPingfan Liu
Besides disabling MMU, HVC_SOFT_RESTART also clears I+D bits. These behaviors are what kexec-reboot expects, so describe it more precisely. Signed-off-by: Pingfan Liu <kernelfans@gmail.com> Cc: James Morse <james.morse@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Geoff Levand <geoff@infradead.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Julien Thierry <julien.thierry.kdev@gmail.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: linux-doc@vger.kernel.org Cc: kvmarm@lists.cs.columbia.edu Link: https://lore.kernel.org/r/1598621998-20563-2-git-send-email-kernelfans@gmail.com To: linux-arm-kernel@lists.infradead.org Signed-off-by: Will Deacon <will@kernel.org>
2020-09-11arm64/relocate_kernel: remove redundant codePingfan Liu
Kernel startup entry point requires disabling MMU and D-cache. As for kexec-reboot, taking a close look at "msr sctlr_el1, x12" in __cpu_soft_restart as the following: -1. booted at EL1 The instruction is enough to disable MMU and I/D cache for EL1 regime. -2. booted at EL2, using VHE Access to SCTLR_EL1 is redirected to SCTLR_EL2 in EL2. So the instruction is enough to disable MMU and clear I+C bits for EL2 regime. -3. booted at EL2, not using VHE The instruction itself can not affect EL2 regime. But The hyp-stub doesn't enable the MMU and I/D cache for EL2 regime. And KVM also disable them for EL2 regime when its unloaded, or execute a HVC_SOFT_RESTART call. So when kexec-reboot, the code in KVM has prepare the requirement. As a conclusion, disabling MMU and clearing I+C bits in SYM_CODE_START(arm64_relocate_new_kernel) is redundant, and can be removed Signed-off-by: Pingfan Liu <kernelfans@gmail.com> Cc: James Morse <james.morse@arm.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Remi Denis-Courmont <remi.denis.courmont@huawei.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: kvmarm@lists.cs.columbia.edu Link: https://lore.kernel.org/r/1598621998-20563-1-git-send-email-kernelfans@gmail.com To: linux-arm-kernel@lists.infradead.org Signed-off-by: Will Deacon <will@kernel.org>
2020-09-11arm64: Remove the unused include statementsTian Tao
linux/arm-smccc.h is included more than once, Remove the one that isn't necessary. Signed-off-by: Tian Tao <tiantao6@hisilicon.com> Reviewed-by: Steven Price <steven.price@arm.com> Link: https://lore.kernel.org/r/1599643682-10404-1-git-send-email-tiantao6@hisilicon.com Signed-off-by: Will Deacon <will@kernel.org>
2020-09-11arm64/mm: Unify CONT_PMD_SHIFTGavin Shan
Similar to how CONT_PTE_SHIFT is determined, this introduces a new kernel option (CONFIG_CONT_PMD_SHIFT) to determine CONT_PMD_SHIFT. Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20200910095936.20307-3-gshan@redhat.com Signed-off-by: Will Deacon <will@kernel.org>
2020-09-11arm64/mm: Unify CONT_PTE_SHIFTGavin Shan
CONT_PTE_SHIFT actually depends on CONFIG_ARM64_CONT_SHIFT. It's reasonable to reflect the dependency: * This renames CONFIG_ARM64_CONT_SHIFT to CONFIG_ARM64_CONT_PTE_SHIFT, so that we can introduce CONFIG_ARM64_CONT_PMD_SHIFT later. * CONT_{SHIFT, SIZE, MASK}, defined in page-def.h are removed as they are not used by anyone. * CONT_PTE_SHIFT is determined by CONFIG_ARM64_CONT_PTE_SHIFT. Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20200910095936.20307-2-gshan@redhat.com Signed-off-by: Will Deacon <will@kernel.org>
2020-09-11arm64/mm: Remove CONT_RANGE_OFFSETGavin Shan
The macro was introduced by commit <ecf35a237a85> ("arm64: PTE/PMD contiguous bit definition") at the beginning. It's only used by commit <348a65cdcbbf> ("arm64: Mark kernel page ranges contiguous"), which was reverted later by commit <667c27597ca8>. This makes the macro unused. This removes the unused macro (CONT_RANGE_OFFSET). Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20200910095936.20307-1-gshan@redhat.com Signed-off-by: Will Deacon <will@kernel.org>
2020-09-11arm64/cpuinfo: Define HWCAP name arrays per their actual bit definitionsAnshuman Khandual
HWCAP name arrays (hwcap_str, compat_hwcap_str, compat_hwcap2_str) that are scanned for /proc/cpuinfo are detached from their bit definitions making it vulnerable and difficult to correlate. It is also bit problematic because during /proc/cpuinfo dump these arrays get traversed sequentially assuming they reflect and match actual HWCAP bit sequence, to test various features for a given CPU. This redefines name arrays per their HWCAP bit definitions . It also warns after detecting any feature which is not expected on arm64. Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Dave Martin <Dave.Martin@arm.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Link: https://lore.kernel.org/r/1599630535-29337-1-git-send-email-anshuman.khandual@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2020-09-11arm64/mm: Enable THP migrationAnshuman Khandual
In certain page migration situations, a THP page can be migrated without being split into it's constituent subpages. This saves time required to split a THP and put it back together when required. But it also saves an wider address range translation covered by a single TLB entry, reducing future page fault costs. A previous patch changed platform THP helpers per generic memory semantics, clearing the path for THP migration support. This adds two more THP helpers required to create PMD migration swap entries. Now enable THP migration via ARCH_ENABLE_THP_MIGRATION. Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Suzuki Poulose <suzuki.poulose@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Link: https://lore.kernel.org/r/1599627183-14453-3-git-send-email-anshuman.khandual@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2020-09-11arm64/mm: Change THP helpers to comply with generic MM semanticsAnshuman Khandual
pmd_present() and pmd_trans_huge() are expected to behave in the following manner during various phases of a given PMD. It is derived from a previous detailed discussion on this topic [1] and present THP documentation [2]. pmd_present(pmd): - Returns true if pmd refers to system RAM with a valid pmd_page(pmd) - Returns false if pmd refers to a migration or swap entry pmd_trans_huge(pmd): - Returns true if pmd refers to system RAM and is a trans huge mapping ------------------------------------------------------------------------- | PMD states | pmd_present | pmd_trans_huge | ------------------------------------------------------------------------- | Mapped | Yes | Yes | ------------------------------------------------------------------------- | Splitting | Yes | Yes | ------------------------------------------------------------------------- | Migration/Swap | No | No | ------------------------------------------------------------------------- The problem: PMD is first invalidated with pmdp_invalidate() before it's splitting. This invalidation clears PMD_SECT_VALID as below. PMD Split -> pmdp_invalidate() -> pmd_mkinvalid -> Clears PMD_SECT_VALID Once PMD_SECT_VALID gets cleared, it results in pmd_present() return false on the PMD entry. It will need another bit apart from PMD_SECT_VALID to re- affirm pmd_present() as true during the THP split process. To comply with above mentioned semantics, pmd_trans_huge() should also check pmd_present() first before testing presence of an actual transparent huge mapping. The solution: Ideally PMD_TYPE_SECT should have been used here instead. But it shares the bit position with PMD_SECT_VALID which is used for THP invalidation. Hence it will not be there for pmd_present() check after pmdp_invalidate(). A new software defined PMD_PRESENT_INVALID (bit 59) can be set on the PMD entry during invalidation which can help pmd_present() return true and in recognizing the fact that it still points to memory. This bit is transient. During the split process it will be overridden by a page table page representing normal pages in place of erstwhile huge page. Other pmdp_invalidate() callers always write a fresh PMD value on the entry overriding this transient PMD_PRESENT_INVALID bit, which makes it safe. [1]: https://lkml.org/lkml/2018/10/17/231 [2]: https://www.kernel.org/doc/Documentation/vm/transhuge.txt Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Suzuki Poulose <suzuki.poulose@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Link: https://lore.kernel.org/r/1599627183-14453-2-git-send-email-anshuman.khandual@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2020-09-07arm64: topology: Stop using MPIDR for topology informationValentin Schneider
In the absence of ACPI or DT topology data, we fallback to haphazardly decoding *something* out of MPIDR. Sadly, the contents of that register are mostly unusable due to the implementation leniancy and things like Aff0 having to be capped to 15 (despite being encoded on 8 bits). Consider a simple system with a single package of 32 cores, all under the same LLC. We ought to be shoving them in the same core_sibling mask, but MPIDR is going to look like: | CPU | 0 | ... | 15 | 16 | ... | 31 | |------+---+-----+----+----+-----+----+ | Aff0 | 0 | ... | 15 | 0 | ... | 15 | | Aff1 | 0 | ... | 0 | 1 | ... | 1 | | Aff2 | 0 | ... | 0 | 0 | ... | 0 | Which will eventually yield core_sibling(0-15) == 0-15 core_sibling(16-31) == 16-31 NUMA woes ========= If we try to play games with this and set up NUMA boundaries within those groups of 16 cores via e.g. QEMU: # Node0: 0-9; Node1: 10-19 $ qemu-system-aarch64 <blah> \ -smp 20 -numa node,cpus=0-9,nodeid=0 -numa node,cpus=10-19,nodeid=1 The scheduler's MC domain (all CPUs with same LLC) is going to be built via arch_topology.c::cpu_coregroup_mask() In there we try to figure out a sensible mask out of the topology information we have. In short, here we'll pick the smallest of NUMA or core sibling mask. node_mask(CPU9) == 0-9 core_sibling(CPU9) == 0-15 MC mask for CPU9 will thus be 0-9, not a problem. node_mask(CPU10) == 10-19 core_sibling(CPU10) == 0-15 MC mask for CPU10 will thus be 10-19, not a problem. node_mask(CPU16) == 10-19 core_sibling(CPU16) == 16-19 MC mask for CPU16 will thus be 16-19... Uh oh. CPUs 16-19 are in two different unique MC spans, and the scheduler has no idea what to make of that. That triggers the WARN_ON() added by commit ccf74128d66c ("sched/topology: Assert non-NUMA topology masks don't (partially) overlap") Fixing MPIDR-derived topology ============================= We could try to come up with some cleverer scheme to figure out which of the available masks to pick, but really if one of those masks resulted from MPIDR then it should be discarded because it's bound to be bogus. I was hoping to give MPIDR a chance for SMT, to figure out which threads are in the same core using Aff1-3 as core ID, but Sudeep and Robin pointed out to me that there are systems out there where *all* cores have non-zero values in their higher affinity fields (e.g. RK3288 has "5" in all of its cores' MPIDR.Aff1), which would expose a bogus core ID to userspace. Stop using MPIDR for topology information. When no other source of topology information is available, mark each CPU as its own core and its NUMA node as its LLC domain. Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Link: https://lore.kernel.org/r/20200829130016.26106-1-valentin.schneider@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2020-09-07arm64/mm/ptdump: Add address markers for BPF regionsAnshuman Khandual
Kernel virtual region [BPF_JIT_REGION_START..BPF_JIT_REGION_END] is missing from address_markers[], hence relevant page table entries are not displayed with /sys/kernel/debug/kernel_page_tables. This adds those missing markers. While here, also rename arch/arm64/mm/dump.c which sounds bit ambiguous, as arch/arm64/mm/ptdump.c instead. Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Steven Price <steven.price@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Steven Price <steven.price@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Link: https://lore.kernel.org/r/1599208259-11191-1-git-send-email-anshuman.khandual@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2020-09-07arm64: perf: Remove unnecessary event_idx checkQi Liu
event_idx is obtained from armv8pmu_get_event_idx(), and this idx must be between ARMV8_IDX_CYCLE_COUNTER and cpu_pmu->num_events. So it's unnecessary to do this check. Let's remove it. Signed-off-by: Qi Liu <liuqi115@huawei.com> Link: https://lore.kernel.org/r/1599213458-28394-1-git-send-email-liuqi115@huawei.com Signed-off-by: Will Deacon <will@kernel.org>
2020-09-07arm64: get rid of TEXT_OFFSETArd Biesheuvel
TEXT_OFFSET serves no purpose, and for this reason, it was redefined as 0x0 in the v5.8 timeframe. Since this does not appear to have caused any issues that require us to revisit that decision, let's get rid of the macro entirely, along with any references to it. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20200825135440.11288-1-ardb@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2020-09-07ACPI/IORT: Remove the unused inline functionsZenghui Yu
Since commit 8212688600ed ("ACPI/IORT: Fix build error when IOMMU_SUPPORT is disabled"), iort_fwspec_iommu_ops() and iort_add_device_replay() are not needed anymore when CONFIG_IOMMU_API is not selected. Let's remove them. Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Hanjun Guo <guohanjun@huawei.com> Link: https://lore.kernel.org/r/20200818063625.980-3-yuzenghui@huawei.com Signed-off-by: Will Deacon <will@kernel.org>
2020-09-07ACPI/IORT: Drop the unused @ops of iort_add_device_replay()Zenghui Yu
Since commit d2e1a003af56 ("ACPI/IORT: Don't call iommu_ops->add_device directly"), we use the IOMMU core API to replace a direct invoke of the specified callback. The parameter @ops has therefore became unused. Let's drop it. Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Hanjun Guo <guohanjun@huawei.com> Link: https://lore.kernel.org/r/20200818063625.980-2-yuzenghui@huawei.com Signed-off-by: Will Deacon <will@kernel.org>
2020-09-07arm64/numa: Fix a typo in comment of arm64_numa_initYanfei Xu
Fix a typo in comment of arm64_numa_init. 'encomapssing' should be 'encompassing'. Signed-off-by: Yanfei Xu <yanfei.xu@windriver.com> Link: https://lore.kernel.org/r/20200901091154.10112-1-yanfei.xu@windriver.com Signed-off-by: Will Deacon <will@kernel.org>
2020-09-07arm64: fix some spelling mistakes in the comments by codespellXiaoming Ni
arch/arm64/include/asm/cpu_ops.h:24: necesary ==> necessary arch/arm64/include/asm/kvm_arm.h:69: maintainance ==> maintenance arch/arm64/include/asm/cpufeature.h:361: capabilties ==> capabilities arch/arm64/kernel/perf_regs.c:19: compatability ==> compatibility arch/arm64/kernel/smp_spin_table.c:86: endianess ==> endianness arch/arm64/kernel/smp_spin_table.c:88: endianess ==> endianness arch/arm64/kvm/vgic/vgic-mmio-v3.c:1004: targetting ==> targeting arch/arm64/kvm/vgic/vgic-mmio-v3.c:1005: targetting ==> targeting Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com> Link: https://lore.kernel.org/r/20200828031822.35928-1-nixiaoming@huawei.com Signed-off-by: Will Deacon <will@kernel.org>
2020-09-07drivers/perf: hisi: Add missing include of linux/module.hShaokun Zhang
MODULE_*** is used in HiSilicon uncore PMU drivers and is provided by linux/module.h, but the header file is not directly included. Add the missing include. Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/1599186097-18599-1-git-send-email-zhangshaokun@hisilicon.com Signed-off-by: Will Deacon <will@kernel.org>
2020-09-07arm64: perf: Add general hardware LLC events for PMUv3Leo Yan
This patch is to add the general hardware last level cache (LLC) events for PMUv3: one event is for LLC access and another is for LLC miss. With this change, perf tool can support last level cache profiling, below is an example to demonstrate the usage on Arm64: $ perf stat -e LLC-load-misses -e LLC-loads -- \ perf bench mem memcpy -s 1024MB -l 100 -f default [...] Performance counter stats for 'perf bench mem memcpy -s 1024MB -l 100 -f default': 35,534,262 LLC-load-misses # 2.16% of all LL-cache hits 1,643,946,443 LLC-loads [...] Signed-off-by: Leo Yan <leo.yan@linaro.org> Link: https://lore.kernel.org/r/20200811053505.21223-1-leo.yan@linaro.org Signed-off-by: Will Deacon <will@kernel.org>
2020-09-07arm64: traps: Add str of description to panic() in die()Yue Hu
Currently, there are different description strings in die() such as die("Oops",,), die("Oops - BUG",,). And panic() called by die() will always show "Fatal exception" or "Fatal exception in interrupt". Note that panic() will run any panic handler via panic_notifier_list. And the string above will be formatted and placed in static buf[] which will be passed to handler. So panic handler can not distinguish which Oops it is from the buf if we want to do some things like reserve the string in memory or panic statistics. It's not benefit to debug. We need to add more codes to troubleshoot. Let's utilize existing resource to make debug much simpler. Signed-off-by: Yue Hu <huyue2@yulong.com> Link: https://lore.kernel.org/r/20200804085347.10720-1-zbestahu@gmail.com Signed-off-by: Will Deacon <will@kernel.org>
2020-09-04arm64: mte: Add Memory Tagging Extension documentationVincenzo Frascino
Memory Tagging Extension (part of the ARMv8.5 Extensions) provides a mechanism to detect the sources of memory related errors which may be vulnerable to exploitation, including bounds violations, use-after-free, use-after-return, use-out-of-scope and use before initialization errors. Add Memory Tagging Extension documentation for the arm64 linux kernel support. Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Co-developed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Szabolcs Nagy <szabolcs.nagy@arm.com> Cc: Will Deacon <will@kernel.org>
2020-09-04arm64: mte: Kconfig entryVincenzo Frascino
Add Memory Tagging Extension support to the arm64 kbuild. Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Co-developed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org>
2020-09-04arm64: mte: Save tags when hibernatingSteven Price
When hibernating the contents of all pages in the system are written to disk, however the MTE tags are not visible to the generic hibernation code. So just before the hibernation image is created copy the tags out of the physical tag storage into standard memory so they will be included in the hibernation image. After hibernation apply the tags back into the physical tag storage. Signed-off-by: Steven Price <steven.price@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Will Deacon <will@kernel.org>
2020-09-04arm64: mte: Enable swap of tagged pagesSteven Price
When swapping pages out to disk it is necessary to save any tags that have been set, and restore when swapping back in. Make use of the new page flag (PG_ARCH_2, locally named PG_mte_tagged) to identify pages with tags. When swapping out these pages the tags are stored in memory and later restored when the pages are brought back in. Because shmem can swap pages back in without restoring the userspace PTE it is also necessary to add a hook for shmem. Signed-off-by: Steven Price <steven.price@arm.com> [catalin.marinas@arm.com: move function prototypes to mte.h] [catalin.marinas@arm.com: drop '_tags' from arch_swap_restore_tags()] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Will Deacon <will@kernel.org>
2020-09-04mm: Add arch hooks for saving/restoring tagsSteven Price
Arm's Memory Tagging Extension (MTE) adds some metadata (tags) to every physical page, when swapping pages out to disk it is necessary to save these tags, and later restore them when reading the pages back. Add some hooks along with dummy implementations to enable the arch code to handle this. Three new hooks are added to the swap code: * arch_prepare_to_swap() and * arch_swap_invalidate_page() / arch_swap_invalidate_area(). One new hook is added to shmem: * arch_swap_restore() Signed-off-by: Steven Price <steven.price@arm.com> [catalin.marinas@arm.com: add unlock_page() on the error path] [catalin.marinas@arm.com: dropped the _tags suffix] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Andrew Morton <akpm@linux-foundation.org>
2020-09-04fs: Handle intra-page faults in copy_mount_options()Catalin Marinas
The copy_mount_options() function takes a user pointer argument but no size and it tries to read up to a PAGE_SIZE. However, copy_from_user() is not guaranteed to return all the accessible bytes if, for example, the access crosses a page boundary and gets a fault on the second page. To work around this, the current copy_mount_options() implementation performs two copy_from_user() passes, first to the end of the current page and the second to what's left in the subsequent page. On arm64 with MTE enabled, access to a user page may trigger a fault after part of the buffer in a page has been copied (when the user pointer tag, bits 56-59, no longer matches the allocation tag stored in memory). Allow copy_mount_options() to handle such intra-page faults by resorting to byte at a time copy in case of copy_from_user() failure. Note that copy_from_user() handles the zeroing of the kernel buffer in case of error. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
2020-09-04arm64: mte: ptrace: Add NT_ARM_TAGGED_ADDR_CTRL regsetCatalin Marinas
This regset allows read/write access to a ptraced process prctl(PR_SET_TAGGED_ADDR_CTRL) setting. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Alan Hayward <Alan.Hayward@arm.com> Cc: Luis Machado <luis.machado@linaro.org> Cc: Omair Javaid <omair.javaid@linaro.org>
2020-09-04arm64: mte: ptrace: Add PTRACE_{PEEK,POKE}MTETAGS supportCatalin Marinas
Add support for bulk setting/getting of the MTE tags in a tracee's address space at 'addr' in the ptrace() syscall prototype. 'data' points to a struct iovec in the tracer's address space with iov_base representing the address of a tracer's buffer of length iov_len. The tags to be copied to/from the tracer's buffer are stored as one tag per byte. On successfully copying at least one tag, ptrace() returns 0 and updates the tracer's iov_len with the number of tags copied. In case of error, either -EIO or -EFAULT is returned, trying to follow the ptrace() man page. Note that the tag copying functions are not performance critical, therefore they lack optimisations found in typical memory copy routines. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Alan Hayward <Alan.Hayward@arm.com> Cc: Luis Machado <luis.machado@linaro.org> Cc: Omair Javaid <omair.javaid@linaro.org>
2020-09-04arm64: mte: Allow {set,get}_tagged_addr_ctrl() on non-current tasksCatalin Marinas
In preparation for ptrace() access to the prctl() value, allow calling these functions on non-current tasks. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org>
2020-09-04arm64: mte: Restore the GCR_EL1 register after a suspendCatalin Marinas
The CPU resume/suspend routines only take care of the common system registers. Restore GCR_EL1 in addition via the __cpu_suspend_exit() function. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
2020-09-04arm64: mte: Allow user control of the generated random tags via prctl()Catalin Marinas
The IRG, ADDG and SUBG instructions insert a random tag in the resulting address. Certain tags can be excluded via the GCR_EL1.Exclude bitmap when, for example, the user wants a certain colour for freed buffers. Since the GCR_EL1 register is not accessible at EL0, extend the prctl(PR_SET_TAGGED_ADDR_CTRL) interface to include a 16-bit field in the first argument for controlling which tags can be generated by the above instruction (an include rather than exclude mask). Note that by default all non-zero tags are excluded. This setting is per-thread. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org>
2020-09-04arm64: mte: Allow user control of the tag check mode via prctl()Catalin Marinas
By default, even if PROT_MTE is set on a memory range, there is no tag check fault reporting (SIGSEGV). Introduce a set of option to the exiting prctl(PR_SET_TAGGED_ADDR_CTRL) to allow user control of the tag check fault mode: PR_MTE_TCF_NONE - no reporting (default) PR_MTE_TCF_SYNC - synchronous tag check fault reporting PR_MTE_TCF_ASYNC - asynchronous tag check fault reporting These options translate into the corresponding SCTLR_EL1.TCF0 bitfield, context-switched by the kernel. Note that the kernel accesses to the user address space (e.g. read() system call) are not checked if the user thread tag checking mode is PR_MTE_TCF_NONE or PR_MTE_TCF_ASYNC. If the tag checking mode is PR_MTE_TCF_SYNC, the kernel makes a best effort to check its user address accesses, however it cannot always guarantee it. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org>
2020-09-04mm: Allow arm64 mmap(PROT_MTE) on RAM-based filesCatalin Marinas
Since arm64 memory (allocation) tags can only be stored in RAM, mapping files with PROT_MTE is not allowed by default. RAM-based files like those in a tmpfs mount or memfd_create() can support memory tagging, so update the vm_flags accordingly in shmem_mmap(). Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Andrew Morton <akpm@linux-foundation.org>
2020-09-04arm64: mte: Validate the PROT_MTE request via arch_validate_flags()Catalin Marinas
Make use of the newly introduced arch_validate_flags() hook to sanity-check the PROT_MTE request passed to mmap() and mprotect(). If the mapping does not support MTE, these syscalls will return -EINVAL. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org>
2020-09-04mm: Introduce arch_validate_flags()Catalin Marinas
Similarly to arch_validate_prot() called from do_mprotect_pkey(), an architecture may need to sanity-check the new vm_flags. Define a dummy function always returning true. In addition to do_mprotect_pkey(), also invoke it from mmap_region() prior to updating vma->vm_page_prot to allow the architecture code to veto potentially inconsistent vm_flags. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Andrew Morton <akpm@linux-foundation.org>
2020-09-04arm64: mte: Add PROT_MTE support to mmap() and mprotect()Catalin Marinas
To enable tagging on a memory range, the user must explicitly opt in via a new PROT_MTE flag passed to mmap() or mprotect(). Since this is a new memory type in the AttrIndx field of a pte, simplify the or'ing of these bits over the protection_map[] attributes by making MT_NORMAL index 0. There are two conditions for arch_vm_get_page_prot() to return the MT_NORMAL_TAGGED memory type: (1) the user requested it via PROT_MTE, registered as VM_MTE in the vm_flags, and (2) the vma supports MTE, decided during the mmap() call (only) and registered as VM_MTE_ALLOWED. arch_calc_vm_prot_bits() is responsible for registering the user request as VM_MTE. The newly introduced arch_calc_vm_flag_bits() sets VM_MTE_ALLOWED if the mapping is MAP_ANONYMOUS. An MTE-capable filesystem (RAM-based) may be able to set VM_MTE_ALLOWED during its mmap() file ops call. In addition, update VM_DATA_DEFAULT_FLAGS to allow mprotect(PROT_MTE) on stack or brk area. The Linux mmap() syscall currently ignores unknown PROT_* flags. In the presence of MTE, an mmap(PROT_MTE) on a file which does not support MTE will not report an error and the memory will not be mapped as Normal Tagged. For consistency, mprotect(PROT_MTE) will not report an error either if the memory range does not support MTE. Two subsequent patches in the series will propose tightening of this behaviour. Co-developed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org>
2020-09-04mm: Introduce arch_calc_vm_flag_bits()Kevin Brodsky
Similarly to arch_calc_vm_prot_bits(), introduce a dummy arch_calc_vm_flag_bits() invoked from calc_vm_flag_bits(). This macro can be overridden by architectures to insert specific VM_* flags derived from the mmap() MAP_* flags. Signed-off-by: Kevin Brodsky <Kevin.Brodsky@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org>
2020-09-04arm64: mte: Tags-aware aware memcmp_pages() implementationCatalin Marinas
When the Memory Tagging Extension is enabled, two pages are identical only if both their data and tags are identical. Make the generic memcmp_pages() a __weak function and add an arm64-specific implementation which returns non-zero if any of the two pages contain valid MTE tags (PG_mte_tagged set). There isn't much benefit in comparing the tags of two pages since these are normally used for heap allocations and likely to differ anyway. Co-developed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org>
2020-09-04arm64: Avoid unnecessary clear_user_page() indirectionCatalin Marinas
Since clear_user_page() calls clear_page() directly, avoid the unnecessary indirection. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org>
2020-09-04arm64: mte: Tags-aware copy_{user_,}highpage() implementationsVincenzo Frascino
When the Memory Tagging Extension is enabled, the tags need to be preserved across page copy (e.g. for copy-on-write, page migration). Introduce MTE-aware copy_{user_,}highpage() functions to copy tags to the destination if the source page has the PG_mte_tagged flag set. copy_user_page() does not need to handle tag copying since, with this patch, it is only called by the DAX code where there is no source page structure (and no source tags). Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Co-developed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org>
2020-09-04arm64: mte: Clear the tags when a page is mapped in user-space with PROT_MTECatalin Marinas
Pages allocated by the kernel are not guaranteed to have the tags zeroed, especially as the kernel does not (yet) use MTE itself. To ensure the user can still access such pages when mapped into its address space, clear the tags via set_pte_at(). A new page flag - PG_mte_tagged (PG_arch_2) - is used to track pages with valid allocation tags. Since the zero page is mapped as pte_special(), it won't be covered by the above set_pte_at() mechanism. Clear its tags during early MTE initialisation. Co-developed-by: Steven Price <steven.price@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org>
2020-09-04mm: Preserve the PG_arch_2 flag in __split_huge_page_tail()Catalin Marinas
When a huge page is split into normal pages, part of the head page flags are transferred to the tail pages. However, the PG_arch_* flags are not part of the preserved set. PG_arch_2 is used by the arm64 MTE support to mark pages that have valid tags. The absence of such flag would cause the arm64 set_pte_at() to clear the tags in order to avoid stale tags exposed to user or the swapping out hooks to ignore the tags. Not preserving PG_arch_2 on huge page splitting leads to tag corruption in the tail pages. Preserve the newly added PG_arch_2 flag in __split_huge_page_tail(). Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org>
2020-09-04mm: Add PG_arch_2 page flagSteven Price
For arm64 MTE support it is necessary to be able to mark pages that contain user space visible tags that will need to be saved/restored e.g. when swapped out. To support this add a new arch specific flag (PG_arch_2). This flag is only available on 64-bit architectures due to the limited number of spare page flags on the 32-bit ones. Signed-off-by: Steven Price <steven.price@arm.com> [catalin.marinas@arm.com: use CONFIG_64BIT for guarding this new flag] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org>
2020-09-04arm64: mte: Handle synchronous and asynchronous tag check faultsVincenzo Frascino
The Memory Tagging Extension has two modes of notifying a tag check fault at EL0, configurable through the SCTLR_EL1.TCF0 field: 1. Synchronous raising of a Data Abort exception with DFSC 17. 2. Asynchronous setting of a cumulative bit in TFSRE0_EL1. Add the exception handler for the synchronous exception and handling of the asynchronous TFSRE0_EL1.TF0 bit setting via a new TIF flag in do_notify_resume(). On a tag check failure in user-space, whether synchronous or asynchronous, a SIGSEGV will be raised on the faulting thread. Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Co-developed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org>
2020-09-04arm64: mte: Add specific SIGSEGV codesVincenzo Frascino
Add MTE-specific SIGSEGV codes to siginfo.h and update the x86 BUILD_BUG_ON(NSIGSEGV != 7) compile check. Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> [catalin.marinas@arm.com: renamed precise/imprecise to sync/async] [catalin.marinas@arm.com: dropped #ifdef __aarch64__, renumbered] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Will Deacon <will@kernel.org>
2020-09-04arm64: kvm: mte: Hide the MTE CPUID information from the guestsCatalin Marinas
KVM does not support MTE in guests yet, so clear the corresponding field in the ID_AA64PFR1_EL1 register. In addition, inject an undefined exception in the guest if it accesses one of the GCR_EL1, RGSR_EL1, TFSR_EL1 or TFSRE0_EL1 registers. While the emulate_sys_reg() function already injects an undefined exception, this patch prevents the unnecessary printk. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Steven Price <steven.price@arm.com> Acked-by: Marc Zyngier <maz@kernel.org>
2020-09-03arm64: mte: CPU feature detection and initial sysreg configurationVincenzo Frascino
Add the cpufeature and hwcap entries to detect the presence of MTE. Any secondary CPU not supporting the feature, if detected on the boot CPU, will be parked. Add the minimum SCTLR_EL1 and HCR_EL2 bits for enabling MTE. The Normal Tagged memory type is configured in MAIR_EL1 before the MMU is enabled in order to avoid disrupting other CPUs in the CnP domain. Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Co-developed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Suzuki K Poulose <Suzuki.Poulose@arm.com>
2020-09-03arm64: mte: Use Normal Tagged attributes for the linear mapCatalin Marinas
Once user space is given access to tagged memory, the kernel must be able to clear/save/restore tags visible to the user. This is done via the linear mapping, therefore map it as such. The new MT_NORMAL_TAGGED index for MAIR_EL1 is initially mapped as Normal memory and later changed to Normal Tagged via the cpufeature infrastructure. From a mismatched attribute aliases perspective, the Tagged memory is considered a permission and it won't lead to undefined behaviour. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Suzuki K Poulose <Suzuki.Poulose@arm.com>
2020-09-03arm64: mte: system register definitionsVincenzo Frascino
Add Memory Tagging Extension system register definitions together with the relevant bitfields. Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Co-developed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org>