From df9d7a22dd21c926e8175ccc6e176cb45fc7cb09 Mon Sep 17 00:00:00 2001 From: Vincenzo Frascino Date: Fri, 6 Sep 2019 10:52:39 +0100 Subject: arm64: mte: Add Memory Tagging Extension documentation Memory Tagging Extension (part of the ARMv8.5 Extensions) provides a mechanism to detect the sources of memory related errors which may be vulnerable to exploitation, including bounds violations, use-after-free, use-after-return, use-out-of-scope and use before initialization errors. Add Memory Tagging Extension documentation for the arm64 linux kernel support. Signed-off-by: Vincenzo Frascino Co-developed-by: Catalin Marinas Signed-off-by: Catalin Marinas Acked-by: Szabolcs Nagy Cc: Will Deacon --- Documentation/arm64/cpu-feature-registers.rst | 2 + Documentation/arm64/elf_hwcaps.rst | 4 + Documentation/arm64/index.rst | 1 + Documentation/arm64/memory-tagging-extension.rst | 305 +++++++++++++++++++++++ 4 files changed, 312 insertions(+) create mode 100644 Documentation/arm64/memory-tagging-extension.rst (limited to 'Documentation/arm64') diff --git a/Documentation/arm64/cpu-feature-registers.rst b/Documentation/arm64/cpu-feature-registers.rst index f28853f80089..328e0c454fbd 100644 --- a/Documentation/arm64/cpu-feature-registers.rst +++ b/Documentation/arm64/cpu-feature-registers.rst @@ -175,6 +175,8 @@ infrastructure: +------------------------------+---------+---------+ | Name | bits | visible | +------------------------------+---------+---------+ + | MTE | [11-8] | y | + +------------------------------+---------+---------+ | SSBS | [7-4] | y | +------------------------------+---------+---------+ | BT | [3-0] | y | diff --git a/Documentation/arm64/elf_hwcaps.rst b/Documentation/arm64/elf_hwcaps.rst index 84a9fd2d41b4..bbd9cf54db6c 100644 --- a/Documentation/arm64/elf_hwcaps.rst +++ b/Documentation/arm64/elf_hwcaps.rst @@ -240,6 +240,10 @@ HWCAP2_BTI Functionality implied by ID_AA64PFR0_EL1.BT == 0b0001. +HWCAP2_MTE + + Functionality implied by ID_AA64PFR1_EL1.MTE == 0b0010, as described + by Documentation/arm64/memory-tagging-extension.rst. 4. Unused AT_HWCAP bits ----------------------- diff --git a/Documentation/arm64/index.rst b/Documentation/arm64/index.rst index d9665d83c53a..43b0939d384e 100644 --- a/Documentation/arm64/index.rst +++ b/Documentation/arm64/index.rst @@ -14,6 +14,7 @@ ARM64 Architecture hugetlbpage legacy_instructions memory + memory-tagging-extension perf pointer-authentication silicon-errata diff --git a/Documentation/arm64/memory-tagging-extension.rst b/Documentation/arm64/memory-tagging-extension.rst new file mode 100644 index 000000000000..e3709b536b89 --- /dev/null +++ b/Documentation/arm64/memory-tagging-extension.rst @@ -0,0 +1,305 @@ +=============================================== +Memory Tagging Extension (MTE) in AArch64 Linux +=============================================== + +Authors: Vincenzo Frascino + Catalin Marinas + +Date: 2020-02-25 + +This document describes the provision of the Memory Tagging Extension +functionality in AArch64 Linux. + +Introduction +============ + +ARMv8.5 based processors introduce the Memory Tagging Extension (MTE) +feature. MTE is built on top of the ARMv8.0 virtual address tagging TBI +(Top Byte Ignore) feature and allows software to access a 4-bit +allocation tag for each 16-byte granule in the physical address space. +Such memory range must be mapped with the Normal-Tagged memory +attribute. A logical tag is derived from bits 59-56 of the virtual +address used for the memory access. A CPU with MTE enabled will compare +the logical tag against the allocation tag and potentially raise an +exception on mismatch, subject to system registers configuration. + +Userspace Support +================= + +When ``CONFIG_ARM64_MTE`` is selected and Memory Tagging Extension is +supported by the hardware, the kernel advertises the feature to +userspace via ``HWCAP2_MTE``. + +PROT_MTE +-------- + +To access the allocation tags, a user process must enable the Tagged +memory attribute on an address range using a new ``prot`` flag for +``mmap()`` and ``mprotect()``: + +``PROT_MTE`` - Pages allow access to the MTE allocation tags. + +The allocation tag is set to 0 when such pages are first mapped in the +user address space and preserved on copy-on-write. ``MAP_SHARED`` is +supported and the allocation tags can be shared between processes. + +**Note**: ``PROT_MTE`` is only supported on ``MAP_ANONYMOUS`` and +RAM-based file mappings (``tmpfs``, ``memfd``). Passing it to other +types of mapping will result in ``-EINVAL`` returned by these system +calls. + +**Note**: The ``PROT_MTE`` flag (and corresponding memory type) cannot +be cleared by ``mprotect()``. + +**Note**: ``madvise()`` memory ranges with ``MADV_DONTNEED`` and +``MADV_FREE`` may have the allocation tags cleared (set to 0) at any +point after the system call. + +Tag Check Faults +---------------- + +When ``PROT_MTE`` is enabled on an address range and a mismatch between +the logical and allocation tags occurs on access, there are three +configurable behaviours: + +- *Ignore* - This is the default mode. The CPU (and kernel) ignores the + tag check fault. + +- *Synchronous* - The kernel raises a ``SIGSEGV`` synchronously, with + ``.si_code = SEGV_MTESERR`` and ``.si_addr = ``. The + memory access is not performed. If ``SIGSEGV`` is ignored or blocked + by the offending thread, the containing process is terminated with a + ``coredump``. + +- *Asynchronous* - The kernel raises a ``SIGSEGV``, in the offending + thread, asynchronously following one or multiple tag check faults, + with ``.si_code = SEGV_MTEAERR`` and ``.si_addr = 0`` (the faulting + address is unknown). + +The user can select the above modes, per thread, using the +``prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0)`` system call where +``flags`` contain one of the following values in the ``PR_MTE_TCF_MASK`` +bit-field: + +- ``PR_MTE_TCF_NONE`` - *Ignore* tag check faults +- ``PR_MTE_TCF_SYNC`` - *Synchronous* tag check fault mode +- ``PR_MTE_TCF_ASYNC`` - *Asynchronous* tag check fault mode + +The current tag check fault mode can be read using the +``prctl(PR_GET_TAGGED_ADDR_CTRL, 0, 0, 0, 0)`` system call. + +Tag checking can also be disabled for a user thread by setting the +``PSTATE.TCO`` bit with ``MSR TCO, #1``. + +**Note**: Signal handlers are always invoked with ``PSTATE.TCO = 0``, +irrespective of the interrupted context. ``PSTATE.TCO`` is restored on +``sigreturn()``. + +**Note**: There are no *match-all* logical tags available for user +applications. + +**Note**: Kernel accesses to the user address space (e.g. ``read()`` +system call) are not checked if the user thread tag checking mode is +``PR_MTE_TCF_NONE`` or ``PR_MTE_TCF_ASYNC``. If the tag checking mode is +``PR_MTE_TCF_SYNC``, the kernel makes a best effort to check its user +address accesses, however it cannot always guarantee it. + +Excluding Tags in the ``IRG``, ``ADDG`` and ``SUBG`` instructions +----------------------------------------------------------------- + +The architecture allows excluding certain tags to be randomly generated +via the ``GCR_EL1.Exclude`` register bit-field. By default, Linux +excludes all tags other than 0. A user thread can enable specific tags +in the randomly generated set using the ``prctl(PR_SET_TAGGED_ADDR_CTRL, +flags, 0, 0, 0)`` system call where ``flags`` contains the tags bitmap +in the ``PR_MTE_TAG_MASK`` bit-field. + +**Note**: The hardware uses an exclude mask but the ``prctl()`` +interface provides an include mask. An include mask of ``0`` (exclusion +mask ``0xffff``) results in the CPU always generating tag ``0``. + +Initial process state +--------------------- + +On ``execve()``, the new process has the following configuration: + +- ``PR_TAGGED_ADDR_ENABLE`` set to 0 (disabled) +- Tag checking mode set to ``PR_MTE_TCF_NONE`` +- ``PR_MTE_TAG_MASK`` set to 0 (all tags excluded) +- ``PSTATE.TCO`` set to 0 +- ``PROT_MTE`` not set on any of the initial memory maps + +On ``fork()``, the new process inherits the parent's configuration and +memory map attributes with the exception of the ``madvise()`` ranges +with ``MADV_WIPEONFORK`` which will have the data and tags cleared (set +to 0). + +The ``ptrace()`` interface +-------------------------- + +``PTRACE_PEEKMTETAGS`` and ``PTRACE_POKEMTETAGS`` allow a tracer to read +the tags from or set the tags to a tracee's address space. The +``ptrace()`` system call is invoked as ``ptrace(request, pid, addr, +data)`` where: + +- ``request`` - one of ``PTRACE_PEEKMTETAGS`` or ``PTRACE_PEEKMTETAGS``. +- ``pid`` - the tracee's PID. +- ``addr`` - address in the tracee's address space. +- ``data`` - pointer to a ``struct iovec`` where ``iov_base`` points to + a buffer of ``iov_len`` length in the tracer's address space. + +The tags in the tracer's ``iov_base`` buffer are represented as one +4-bit tag per byte and correspond to a 16-byte MTE tag granule in the +tracee's address space. + +**Note**: If ``addr`` is not aligned to a 16-byte granule, the kernel +will use the corresponding aligned address. + +``ptrace()`` return value: + +- 0 - tags were copied, the tracer's ``iov_len`` was updated to the + number of tags transferred. This may be smaller than the requested + ``iov_len`` if the requested address range in the tracee's or the + tracer's space cannot be accessed or does not have valid tags. +- ``-EPERM`` - the specified process cannot be traced. +- ``-EIO`` - the tracee's address range cannot be accessed (e.g. invalid + address) and no tags copied. ``iov_len`` not updated. +- ``-EFAULT`` - fault on accessing the tracer's memory (``struct iovec`` + or ``iov_base`` buffer) and no tags copied. ``iov_len`` not updated. +- ``-EOPNOTSUPP`` - the tracee's address does not have valid tags (never + mapped with the ``PROT_MTE`` flag). ``iov_len`` not updated. + +**Note**: There are no transient errors for the requests above, so user +programs should not retry in case of a non-zero system call return. + +``PTRACE_GETREGSET`` and ``PTRACE_SETREGSET`` with ``addr == +``NT_ARM_TAGGED_ADDR_CTRL`` allow ``ptrace()`` access to the tagged +address ABI control and MTE configuration of a process as per the +``prctl()`` options described in +Documentation/arm64/tagged-address-abi.rst and above. The corresponding +``regset`` is 1 element of 8 bytes (``sizeof(long))``). + +Example of correct usage +======================== + +*MTE Example code* + +.. code-block:: c + + /* + * To be compiled with -march=armv8.5-a+memtag + */ + #include + #include + #include + #include + #include + #include + #include + #include + + /* + * From arch/arm64/include/uapi/asm/hwcap.h + */ + #define HWCAP2_MTE (1 << 18) + + /* + * From arch/arm64/include/uapi/asm/mman.h + */ + #define PROT_MTE 0x20 + + /* + * From include/uapi/linux/prctl.h + */ + #define PR_SET_TAGGED_ADDR_CTRL 55 + #define PR_GET_TAGGED_ADDR_CTRL 56 + # define PR_TAGGED_ADDR_ENABLE (1UL << 0) + # define PR_MTE_TCF_SHIFT 1 + # define PR_MTE_TCF_NONE (0UL << PR_MTE_TCF_SHIFT) + # define PR_MTE_TCF_SYNC (1UL << PR_MTE_TCF_SHIFT) + # define PR_MTE_TCF_ASYNC (2UL << PR_MTE_TCF_SHIFT) + # define PR_MTE_TCF_MASK (3UL << PR_MTE_TCF_SHIFT) + # define PR_MTE_TAG_SHIFT 3 + # define PR_MTE_TAG_MASK (0xffffUL << PR_MTE_TAG_SHIFT) + + /* + * Insert a random logical tag into the given pointer. + */ + #define insert_random_tag(ptr) ({ \ + uint64_t __val; \ + asm("irg %0, %1" : "=r" (__val) : "r" (ptr)); \ + __val; \ + }) + + /* + * Set the allocation tag on the destination address. + */ + #define set_tag(tagged_addr) do { \ + asm volatile("stg %0, [%0]" : : "r" (tagged_addr) : "memory"); \ + } while (0) + + int main() + { + unsigned char *a; + unsigned long page_sz = sysconf(_SC_PAGESIZE); + unsigned long hwcap2 = getauxval(AT_HWCAP2); + + /* check if MTE is present */ + if (!(hwcap2 & HWCAP2_MTE)) + return EXIT_FAILURE; + + /* + * Enable the tagged address ABI, synchronous MTE tag check faults and + * allow all non-zero tags in the randomly generated set. + */ + if (prctl(PR_SET_TAGGED_ADDR_CTRL, + PR_TAGGED_ADDR_ENABLE | PR_MTE_TCF_SYNC | (0xfffe << PR_MTE_TAG_SHIFT), + 0, 0, 0)) { + perror("prctl() failed"); + return EXIT_FAILURE; + } + + a = mmap(0, page_sz, PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); + if (a == MAP_FAILED) { + perror("mmap() failed"); + return EXIT_FAILURE; + } + + /* + * Enable MTE on the above anonymous mmap. The flag could be passed + * directly to mmap() and skip this step. + */ + if (mprotect(a, page_sz, PROT_READ | PROT_WRITE | PROT_MTE)) { + perror("mprotect() failed"); + return EXIT_FAILURE; + } + + /* access with the default tag (0) */ + a[0] = 1; + a[1] = 2; + + printf("a[0] = %hhu a[1] = %hhu\n", a[0], a[1]); + + /* set the logical and allocation tags */ + a = (unsigned char *)insert_random_tag(a); + set_tag(a); + + printf("%p\n", a); + + /* non-zero tag access */ + a[0] = 3; + printf("a[0] = %hhu a[1] = %hhu\n", a[0], a[1]); + + /* + * If MTE is enabled correctly the next instruction will generate an + * exception. + */ + printf("Expecting SIGSEGV...\n"); + a[16] = 0xdd; + + /* this should not be printed in the PR_MTE_TCF_SYNC mode */ + printf("...haven't got one\n"); + + return EXIT_FAILURE; + } -- cgit From 7a5d265b68e10bd99909434eedf08ca79ab2e640 Mon Sep 17 00:00:00 2001 From: Bailu Lin Date: Fri, 25 Sep 2020 19:25:58 -0700 Subject: doc: zh_CN: index files in arm64 subdirectory Add arm64 subdirectory into the table of Contents for zh_CN, then add other translations in arm64 conveniently. Signed-off-by: Bailu Lin Reviewed-by: Alex Shi Link: https://lore.kernel.org/r/20200926022558.46232-1-bailu.lin@vivo.com Signed-off-by: Jonathan Corbet --- Documentation/arm64/index.rst | 2 ++ 1 file changed, 2 insertions(+) (limited to 'Documentation/arm64') diff --git a/Documentation/arm64/index.rst b/Documentation/arm64/index.rst index d9665d83c53a..3ebe0fa31948 100644 --- a/Documentation/arm64/index.rst +++ b/Documentation/arm64/index.rst @@ -1,3 +1,5 @@ +.. _arm64_index: + ================== ARM64 Architecture ================== -- cgit From a0eef4a8acbb57f3ca307dcd127dcc0c2e806981 Mon Sep 17 00:00:00 2001 From: Bailu Lin Date: Fri, 25 Sep 2020 19:52:33 -0700 Subject: Documentation: Chinese translation of Documentation/arm64/amu.rst This is a Chinese translated version of Documentation/arm64/amu.rst Signed-off-by: Bailu Lin Reviewed-by: Alex Shi Link: https://lore.kernel.org/r/20200926025233.47214-1-bailu.lin@vivo.com Signed-off-by: Jonathan Corbet --- Documentation/arm64/amu.rst | 2 ++ 1 file changed, 2 insertions(+) (limited to 'Documentation/arm64') diff --git a/Documentation/arm64/amu.rst b/Documentation/arm64/amu.rst index 452ec8b115c2..01f2de2b0450 100644 --- a/Documentation/arm64/amu.rst +++ b/Documentation/arm64/amu.rst @@ -1,3 +1,5 @@ +.. _amu_index: + ======================================================= Activity Monitors Unit (AMU) extension in AArch64 Linux ======================================================= -- cgit From b5756146db3ad57a9c0e841ea01ce915db27b7de Mon Sep 17 00:00:00 2001 From: Will Deacon Date: Mon, 28 Sep 2020 23:02:19 +0100 Subject: arm64: mte: Fix typo in memory tagging ABI documentation We offer both PTRACE_PEEKMTETAGS and PTRACE_POKEMTETAGS requests via ptrace(). Reported-by: Andrey Konovalov Signed-off-by: Will Deacon --- Documentation/arm64/memory-tagging-extension.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation/arm64') diff --git a/Documentation/arm64/memory-tagging-extension.rst b/Documentation/arm64/memory-tagging-extension.rst index e3709b536b89..034d37c605e8 100644 --- a/Documentation/arm64/memory-tagging-extension.rst +++ b/Documentation/arm64/memory-tagging-extension.rst @@ -142,7 +142,7 @@ the tags from or set the tags to a tracee's address space. The ``ptrace()`` system call is invoked as ``ptrace(request, pid, addr, data)`` where: -- ``request`` - one of ``PTRACE_PEEKMTETAGS`` or ``PTRACE_PEEKMTETAGS``. +- ``request`` - one of ``PTRACE_PEEKMTETAGS`` or ``PTRACE_POKEMTETAGS``. - ``pid`` - the tracee's PID. - ``addr`` - address in the tracee's address space. - ``data`` - pointer to a ``struct iovec`` where ``iov_base`` points to -- cgit From e0533dee522593c25a88b63bf730b2096f6d4122 Mon Sep 17 00:00:00 2001 From: Bailu Lin Date: Tue, 13 Oct 2020 19:20:03 -0700 Subject: Documentation: Chinese translation of Documentation/arm64/hugetlbpage.rst This is a Chinese translated version of Documentation/arm64/hugetlbpage.rst Signed-off-by: Bailu Lin Reviewed-by: Alex Shi Link: https://lore.kernel.org/r/20201014022003.43862-1-bailu.lin@vivo.com Signed-off-by: Jonathan Corbet --- Documentation/arm64/hugetlbpage.rst | 2 ++ 1 file changed, 2 insertions(+) (limited to 'Documentation/arm64') diff --git a/Documentation/arm64/hugetlbpage.rst b/Documentation/arm64/hugetlbpage.rst index b44f939e5210..a110124c11e3 100644 --- a/Documentation/arm64/hugetlbpage.rst +++ b/Documentation/arm64/hugetlbpage.rst @@ -1,3 +1,5 @@ +.. _hugetlbpage_index: + ==================== HugeTLBpage on ARM64 ==================== -- cgit From ef5dd6a0c828b6fbd9d595e5772fcb51ff86697e Mon Sep 17 00:00:00 2001 From: Catalin Marinas Date: Wed, 28 Oct 2020 14:55:24 +0000 Subject: arm64: mte: Document that user PSTATE.TCO is ignored by kernel uaccess On exception entry, the kernel explicitly resets the PSTATE.TCO (tag check override) so that any kernel memory accesses will be checked (the bit is restored on exception return). This has the side-effect that the uaccess routines will not honour the PSTATE.TCO that may have been set by the user prior to a syscall. There is no issue in practice since PSTATE.TCO is expected to be used only for brief periods in specific routines (e.g. garbage collection). To control the tag checking mode of the uaccess routines, the user will have to invoke a corresponding prctl() call. Document the kernel behaviour w.r.t. PSTATE.TCO accordingly. Signed-off-by: Catalin Marinas Fixes: df9d7a22dd21 ("arm64: mte: Add Memory Tagging Extension documentation") Reviewed-by: Vincenzo Frascino Cc: Will Deacon Cc: Szabolcs Nagy Signed-off-by: Will Deacon --- Documentation/arm64/memory-tagging-extension.rst | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'Documentation/arm64') diff --git a/Documentation/arm64/memory-tagging-extension.rst b/Documentation/arm64/memory-tagging-extension.rst index 034d37c605e8..b540178a93f8 100644 --- a/Documentation/arm64/memory-tagging-extension.rst +++ b/Documentation/arm64/memory-tagging-extension.rst @@ -102,7 +102,9 @@ applications. system call) are not checked if the user thread tag checking mode is ``PR_MTE_TCF_NONE`` or ``PR_MTE_TCF_ASYNC``. If the tag checking mode is ``PR_MTE_TCF_SYNC``, the kernel makes a best effort to check its user -address accesses, however it cannot always guarantee it. +address accesses, however it cannot always guarantee it. Kernel accesses +to user addresses are always performed with an effective ``PSTATE.TCO`` +value of zero, regardless of the user configuration. Excluding Tags in the ``IRG``, ``ADDG`` and ``SUBG`` instructions ----------------------------------------------------------------- -- cgit From 96d389ca10110d7eefb46feb6af9a0c6832f78f5 Mon Sep 17 00:00:00 2001 From: Rob Herring Date: Wed, 28 Oct 2020 13:28:39 -0500 Subject: arm64: Add workaround for Arm Cortex-A77 erratum 1508412 On Cortex-A77 r0p0 and r1p0, a sequence of a non-cacheable or device load and a store exclusive or PAR_EL1 read can cause a deadlock. The workaround requires a DMB SY before and after a PAR_EL1 register read. In addition, it's possible an interrupt (doing a device read) or KVM guest exit could be taken between the DMB and PAR read, so we also need a DMB before returning from interrupt and before returning to a guest. A deadlock is still possible with the workaround as KVM guests must also have the workaround. IOW, a malicious guest can deadlock an affected systems. This workaround also depends on a firmware counterpart to enable the h/w to insert DMB SY after load and store exclusive instructions. See the errata document SDEN-1152370 v10 [1] for more information. [1] https://static.docs.arm.com/101992/0010/Arm_Cortex_A77_MP074_Software_Developer_Errata_Notice_v10.pdf Signed-off-by: Rob Herring Reviewed-by: Catalin Marinas Acked-by: Marc Zyngier Cc: Catalin Marinas Cc: James Morse Cc: Suzuki K Poulose Cc: Will Deacon Cc: Julien Thierry Cc: kvmarm@lists.cs.columbia.edu Link: https://lore.kernel.org/r/20201028182839.166037-2-robh@kernel.org Signed-off-by: Will Deacon --- Documentation/arm64/silicon-errata.rst | 2 ++ 1 file changed, 2 insertions(+) (limited to 'Documentation/arm64') diff --git a/Documentation/arm64/silicon-errata.rst b/Documentation/arm64/silicon-errata.rst index d3587805de64..719510247292 100644 --- a/Documentation/arm64/silicon-errata.rst +++ b/Documentation/arm64/silicon-errata.rst @@ -90,6 +90,8 @@ stable kernels. +----------------+-----------------+-----------------+-----------------------------+ | ARM | Cortex-A76 | #1463225 | ARM64_ERRATUM_1463225 | +----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A77 | #1508412 | ARM64_ERRATUM_1508412 | ++----------------+-----------------+-----------------+-----------------------------+ | ARM | Neoverse-N1 | #1188873,1418040| ARM64_ERRATUM_1418040 | +----------------+-----------------+-----------------+-----------------------------+ | ARM | Neoverse-N1 | #1349291 | N/A | -- cgit From f4693c2716b35d0846fd45a4ad7db78bfb25efc8 Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Thu, 8 Oct 2020 17:36:00 +0200 Subject: arm64: mm: extend linear region for 52-bit VA configurations For historical reasons, the arm64 kernel VA space is configured as two equally sized halves, i.e., on a 48-bit VA build, the VA space is split into a 47-bit vmalloc region and a 47-bit linear region. When support for 52-bit virtual addressing was added, this equal split was kept, resulting in a substantial waste of virtual address space in the linear region: 48-bit VA 52-bit VA 0xffff_ffff_ffff_ffff +-------------+ +-------------+ | vmalloc | | vmalloc | 0xffff_8000_0000_0000 +-------------+ _PAGE_END(48) +-------------+ | linear | : : 0xffff_0000_0000_0000 +-------------+ : : : : : : : : : : : : : : : : : currently : : unusable : : : : : : unused : : by : : : : : : : : hardware : : : : : : : 0xfff8_0000_0000_0000 : : _PAGE_END(52) +-------------+ : : | | : : | | : : | | : : | | : : | | : unusable : | | : : | linear | : by : | | : : | region | : hardware : | | : : | | : : | | : : | | : : | | : : | | : : | | 0xfff0_0000_0000_0000 +-------------+ PAGE_OFFSET +-------------+ As illustrated above, the 52-bit VA kernel uses 47 bits for the vmalloc space (as before), to ensure that a single 64k granule kernel image can support any 64k granule capable system, regardless of whether it supports the 52-bit virtual addressing extension. However, due to the fact that the VA space is still split in equal halves, the linear region is only 2^51 bytes in size, wasting almost half of the 52-bit VA space. Let's fix this, by abandoning the equal split, and simply assigning all VA space outside of the vmalloc region to the linear region. The KASAN shadow region is reconfigured so that it ends at the start of the vmalloc region, and grows downwards. That way, the arrangement of the vmalloc space (which contains kernel mappings, modules, BPF region, the vmemmap array etc) is identical between non-KASAN and KASAN builds, which aids debugging. Signed-off-by: Ard Biesheuvel Reviewed-by: Steve Capper Link: https://lore.kernel.org/r/20201008153602.9467-3-ardb@kernel.org Signed-off-by: Catalin Marinas --- Documentation/arm64/kasan-offsets.sh | 3 +-- Documentation/arm64/memory.rst | 19 +++++++++---------- 2 files changed, 10 insertions(+), 12 deletions(-) (limited to 'Documentation/arm64') diff --git a/Documentation/arm64/kasan-offsets.sh b/Documentation/arm64/kasan-offsets.sh index 2b7a021db363..2dc5f9e18039 100644 --- a/Documentation/arm64/kasan-offsets.sh +++ b/Documentation/arm64/kasan-offsets.sh @@ -1,12 +1,11 @@ #!/bin/sh # Print out the KASAN_SHADOW_OFFSETS required to place the KASAN SHADOW -# start address at the mid-point of the kernel VA space +# start address at the top of the linear region print_kasan_offset () { printf "%02d\t" $1 printf "0x%08x00000000\n" $(( (0xffffffff & (-1 << ($1 - 1 - 32))) \ - + (1 << ($1 - 32 - $2)) \ - (1 << (64 - 32 - $2)) )) } diff --git a/Documentation/arm64/memory.rst b/Documentation/arm64/memory.rst index cf03b3290800..ee51eb66a578 100644 --- a/Documentation/arm64/memory.rst +++ b/Documentation/arm64/memory.rst @@ -32,10 +32,10 @@ AArch64 Linux memory layout with 4KB pages + 4 levels (48-bit):: ----------------------------------------------------------------------- 0000000000000000 0000ffffffffffff 256TB user ffff000000000000 ffff7fffffffffff 128TB kernel logical memory map - ffff800000000000 ffff9fffffffffff 32TB kasan shadow region - ffffa00000000000 ffffa00007ffffff 128MB bpf jit region - ffffa00008000000 ffffa0000fffffff 128MB modules - ffffa00010000000 fffffdffbffeffff ~93TB vmalloc +[ ffff600000000000 ffff7fffffffffff ] 32TB [ kasan shadow region ] + ffff800000000000 ffff800007ffffff 128MB bpf jit region + ffff800008000000 ffff80000fffffff 128MB modules + ffff800010000000 fffffdffbffeffff 125TB vmalloc fffffdffbfff0000 fffffdfffe5f8fff ~998MB [guard region] fffffdfffe5f9000 fffffdfffe9fffff 4124KB fixed mappings fffffdfffea00000 fffffdfffebfffff 2MB [guard region] @@ -50,12 +50,11 @@ AArch64 Linux memory layout with 64KB pages + 3 levels (52-bit with HW support): Start End Size Use ----------------------------------------------------------------------- 0000000000000000 000fffffffffffff 4PB user - fff0000000000000 fff7ffffffffffff 2PB kernel logical memory map - fff8000000000000 fffd9fffffffffff 1440TB [gap] - fffda00000000000 ffff9fffffffffff 512TB kasan shadow region - ffffa00000000000 ffffa00007ffffff 128MB bpf jit region - ffffa00008000000 ffffa0000fffffff 128MB modules - ffffa00010000000 fffff81ffffeffff ~88TB vmalloc + fff0000000000000 ffff7fffffffffff ~4PB kernel logical memory map +[ fffd800000000000 ffff7fffffffffff ] 512TB [ kasan shadow region ] + ffff800000000000 ffff800007ffffff 128MB bpf jit region + ffff800008000000 ffff80000fffffff 128MB modules + ffff800010000000 fffff81ffffeffff 120TB vmalloc fffff81fffff0000 fffffc1ffe58ffff ~3TB [guard region] fffffc1ffe590000 fffffc1ffe9fffff 4544KB fixed mappings fffffc1ffea00000 fffffc1ffebfffff 2MB [guard region] -- cgit From 8c96400d6a39be763130a5c493647c57726f7013 Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Thu, 8 Oct 2020 17:36:01 +0200 Subject: arm64: mm: make vmemmap region a projection of the linear region Now that we have reverted the introduction of the vmemmap struct page pointer and the separate physvirt_offset, we can simplify things further, and place the vmemmap region in the VA space in such a way that virtual to page translations and vice versa can be implemented using a single arithmetic shift. One happy coincidence resulting from this is that the 48-bit/4k and 52-bit/64k configurations (which are assumed to be the two most prevalent) end up with the same placement of the vmemmap region. In a subsequent patch, we will take advantage of this, and unify the memory maps even more. Signed-off-by: Ard Biesheuvel Reviewed-by: Steve Capper Link: https://lore.kernel.org/r/20201008153602.9467-4-ardb@kernel.org Signed-off-by: Catalin Marinas --- Documentation/arm64/memory.rst | 30 +++++++++++++++--------------- 1 file changed, 15 insertions(+), 15 deletions(-) (limited to 'Documentation/arm64') diff --git a/Documentation/arm64/memory.rst b/Documentation/arm64/memory.rst index ee51eb66a578..476edb6015b2 100644 --- a/Documentation/arm64/memory.rst +++ b/Documentation/arm64/memory.rst @@ -35,14 +35,14 @@ AArch64 Linux memory layout with 4KB pages + 4 levels (48-bit):: [ ffff600000000000 ffff7fffffffffff ] 32TB [ kasan shadow region ] ffff800000000000 ffff800007ffffff 128MB bpf jit region ffff800008000000 ffff80000fffffff 128MB modules - ffff800010000000 fffffdffbffeffff 125TB vmalloc - fffffdffbfff0000 fffffdfffe5f8fff ~998MB [guard region] - fffffdfffe5f9000 fffffdfffe9fffff 4124KB fixed mappings - fffffdfffea00000 fffffdfffebfffff 2MB [guard region] - fffffdfffec00000 fffffdffffbfffff 16MB PCI I/O space - fffffdffffc00000 fffffdffffdfffff 2MB [guard region] - fffffdffffe00000 ffffffffffdfffff 2TB vmemmap - ffffffffffe00000 ffffffffffffffff 2MB [guard region] + ffff800010000000 fffffbffbffeffff 123TB vmalloc + fffffbffbfff0000 fffffbfffe7f8fff ~998MB [guard region] + fffffbfffe7f9000 fffffbfffebfffff 4124KB fixed mappings + fffffbfffec00000 fffffbfffedfffff 2MB [guard region] + fffffbfffee00000 fffffbffffdfffff 16MB PCI I/O space + fffffbffffe00000 fffffbffffffffff 2MB [guard region] + fffffc0000000000 fffffdffffffffff 2TB vmemmap + fffffe0000000000 ffffffffffffffff 2TB [guard region] AArch64 Linux memory layout with 64KB pages + 3 levels (52-bit with HW support):: @@ -55,13 +55,13 @@ AArch64 Linux memory layout with 64KB pages + 3 levels (52-bit with HW support): ffff800000000000 ffff800007ffffff 128MB bpf jit region ffff800008000000 ffff80000fffffff 128MB modules ffff800010000000 fffff81ffffeffff 120TB vmalloc - fffff81fffff0000 fffffc1ffe58ffff ~3TB [guard region] - fffffc1ffe590000 fffffc1ffe9fffff 4544KB fixed mappings - fffffc1ffea00000 fffffc1ffebfffff 2MB [guard region] - fffffc1ffec00000 fffffc1fffbfffff 16MB PCI I/O space - fffffc1fffc00000 fffffc1fffdfffff 2MB [guard region] - fffffc1fffe00000 ffffffffffdfffff 3968GB vmemmap - ffffffffffe00000 ffffffffffffffff 2MB [guard region] + fffff81fffff0000 fffffbfffe38ffff ~3TB [guard region] + fffffbfffe390000 fffffbfffebfffff 4544KB fixed mappings + fffffbfffec00000 fffffbfffedfffff 2MB [guard region] + fffffbfffee00000 fffffbffffdfffff 16MB PCI I/O space + fffffbffffe00000 fffffbffffffffff 2MB [guard region] + fffffc0000000000 ffffffdfffffffff ~4TB vmemmap + ffffffe000000000 ffffffffffffffff 128GB [guard region] Translation table lookup with 4KB pages:: -- cgit From 9ad7c6d5e75b160c9ce5775db610d964af45b83f Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Thu, 8 Oct 2020 17:36:02 +0200 Subject: arm64: mm: tidy up top of kernel VA space Tidy up the way the top of the kernel VA space is organized, by mirroring the 256 MB region we have below the vmalloc space, and populating it top down with the PCI I/O space, some guard regions, and the fixmap region. The latter region is itself populated top down, and today only covers about 4 MB, and so 224 MB is ample, and no guard region is therefore required. The resulting layout is identical between 48-bit/4k and 52-bit/64k configurations. Signed-off-by: Ard Biesheuvel Reviewed-by: Steve Capper Link: https://lore.kernel.org/r/20201008153602.9467-5-ardb@kernel.org Signed-off-by: Catalin Marinas --- Documentation/arm64/memory.rst | 22 ++++++++++------------ 1 file changed, 10 insertions(+), 12 deletions(-) (limited to 'Documentation/arm64') diff --git a/Documentation/arm64/memory.rst b/Documentation/arm64/memory.rst index 476edb6015b2..3d62604fa7bd 100644 --- a/Documentation/arm64/memory.rst +++ b/Documentation/arm64/memory.rst @@ -35,12 +35,11 @@ AArch64 Linux memory layout with 4KB pages + 4 levels (48-bit):: [ ffff600000000000 ffff7fffffffffff ] 32TB [ kasan shadow region ] ffff800000000000 ffff800007ffffff 128MB bpf jit region ffff800008000000 ffff80000fffffff 128MB modules - ffff800010000000 fffffbffbffeffff 123TB vmalloc - fffffbffbfff0000 fffffbfffe7f8fff ~998MB [guard region] - fffffbfffe7f9000 fffffbfffebfffff 4124KB fixed mappings - fffffbfffec00000 fffffbfffedfffff 2MB [guard region] - fffffbfffee00000 fffffbffffdfffff 16MB PCI I/O space - fffffbffffe00000 fffffbffffffffff 2MB [guard region] + ffff800010000000 fffffbffefffffff 124TB vmalloc + fffffbfff0000000 fffffbfffdffffff 224MB fixed mappings (top down) + fffffbfffe000000 fffffbfffe7fffff 8MB [guard region] + fffffbfffe800000 fffffbffff7fffff 16MB PCI I/O space + fffffbffff800000 fffffbffffffffff 8MB [guard region] fffffc0000000000 fffffdffffffffff 2TB vmemmap fffffe0000000000 ffffffffffffffff 2TB [guard region] @@ -54,12 +53,11 @@ AArch64 Linux memory layout with 64KB pages + 3 levels (52-bit with HW support): [ fffd800000000000 ffff7fffffffffff ] 512TB [ kasan shadow region ] ffff800000000000 ffff800007ffffff 128MB bpf jit region ffff800008000000 ffff80000fffffff 128MB modules - ffff800010000000 fffff81ffffeffff 120TB vmalloc - fffff81fffff0000 fffffbfffe38ffff ~3TB [guard region] - fffffbfffe390000 fffffbfffebfffff 4544KB fixed mappings - fffffbfffec00000 fffffbfffedfffff 2MB [guard region] - fffffbfffee00000 fffffbffffdfffff 16MB PCI I/O space - fffffbffffe00000 fffffbffffffffff 2MB [guard region] + ffff800010000000 fffffbffefffffff 124TB vmalloc + fffffbfff0000000 fffffbfffdffffff 224MB fixed mappings (top down) + fffffbfffe000000 fffffbfffe7fffff 8MB [guard region] + fffffbfffe800000 fffffbffff7fffff 16MB PCI I/O space + fffffbffff800000 fffffbffffffffff 8MB [guard region] fffffc0000000000 ffffffdfffffffff ~4TB vmemmap ffffffe000000000 ffffffffffffffff 128GB [guard region] -- cgit From 68af6d2483dbd0385317bc87a338b155be75eeb6 Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Tue, 10 Nov 2020 14:08:51 +0100 Subject: Documentation/arm64: fix RST layout of memory.rst Stephen reports that commit f4693c2716b3 ("arm64: mm: extend linear region for 52-bit VA configurations") triggers the following warnings when building the htmldocs make target of today's linux-next: Documentation/arm64/memory.rst:35: WARNING: Literal block ends without a blank line; unexpected unindent. Documentation/arm64/memory.rst:53: WARNING: Literal block ends without a blank line; unexpected unindent. Let's tweak the memory layout table to work around this. Reported-by: Stephen Rothwell Signed-off-by: Ard Biesheuvel Fixes: f4693c2716b3 ("arm64: mm: extend linear region for 52-bit VA configurations") Link: https://lore.kernel.org/r/20201110130851.15751-1-ardb@kernel.org Signed-off-by: Catalin Marinas --- Documentation/arm64/memory.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'Documentation/arm64') diff --git a/Documentation/arm64/memory.rst b/Documentation/arm64/memory.rst index 3d62604fa7bd..e7522e5c8322 100644 --- a/Documentation/arm64/memory.rst +++ b/Documentation/arm64/memory.rst @@ -32,7 +32,7 @@ AArch64 Linux memory layout with 4KB pages + 4 levels (48-bit):: ----------------------------------------------------------------------- 0000000000000000 0000ffffffffffff 256TB user ffff000000000000 ffff7fffffffffff 128TB kernel logical memory map -[ ffff600000000000 ffff7fffffffffff ] 32TB [ kasan shadow region ] + [ffff600000000000 ffff7fffffffffff] 32TB [kasan shadow region] ffff800000000000 ffff800007ffffff 128MB bpf jit region ffff800008000000 ffff80000fffffff 128MB modules ffff800010000000 fffffbffefffffff 124TB vmalloc @@ -50,7 +50,7 @@ AArch64 Linux memory layout with 64KB pages + 3 levels (52-bit with HW support): ----------------------------------------------------------------------- 0000000000000000 000fffffffffffff 4PB user fff0000000000000 ffff7fffffffffff ~4PB kernel logical memory map -[ fffd800000000000 ffff7fffffffffff ] 512TB [ kasan shadow region ] + [fffd800000000000 ffff7fffffffffff] 512TB [kasan shadow region] ffff800000000000 ffff800007ffffff 128MB bpf jit region ffff800008000000 ffff80000fffffff 128MB modules ffff800010000000 fffffbffefffffff 124TB vmalloc -- cgit From dceec3ff78076757311d92a388d50d0251fb7dbb Mon Sep 17 00:00:00 2001 From: Peter Collingbourne Date: Fri, 20 Nov 2020 12:33:46 -0800 Subject: arm64: expose FAR_EL1 tag bits in siginfo The kernel currently clears the tag bits (i.e. bits 56-63) in the fault address exposed via siginfo.si_addr and sigcontext.fault_address. However, the tag bits may be needed by tools in order to accurately diagnose memory errors, such as HWASan [1] or future tools based on the Memory Tagging Extension (MTE). Expose these bits via the arch_untagged_si_addr mechanism, so that they are only exposed to signal handlers with the SA_EXPOSE_TAGBITS flag set. [1] http://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html Signed-off-by: Peter Collingbourne Reviewed-by: Catalin Marinas Link: https://linux-review.googlesource.com/id/Ia8876bad8c798e0a32df7c2ce1256c4771c81446 Link: https://lore.kernel.org/r/0010296597784267472fa13b39f8238d87a72cf8.1605904350.git.pcc@google.com Signed-off-by: Catalin Marinas --- Documentation/arm64/tagged-pointers.rst | 25 +++++++++++++++++++------ 1 file changed, 19 insertions(+), 6 deletions(-) (limited to 'Documentation/arm64') diff --git a/Documentation/arm64/tagged-pointers.rst b/Documentation/arm64/tagged-pointers.rst index eab4323609b9..19d284b70384 100644 --- a/Documentation/arm64/tagged-pointers.rst +++ b/Documentation/arm64/tagged-pointers.rst @@ -53,12 +53,25 @@ visibility. Preserving tags --------------- -Non-zero tags are not preserved when delivering signals. This means that -signal handlers in applications making use of tags cannot rely on the -tag information for user virtual addresses being maintained for fields -inside siginfo_t. One exception to this rule is for signals raised in -response to watchpoint debug exceptions, where the tag information will -be preserved. +When delivering signals, non-zero tags are not preserved in +siginfo.si_addr unless the flag SA_EXPOSE_TAGBITS was set in +sigaction.sa_flags when the signal handler was installed. This means +that signal handlers in applications making use of tags cannot rely +on the tag information for user virtual addresses being maintained +in these fields unless the flag was set. + +Due to architecture limitations, bits 63:60 of the fault address +are not preserved in response to synchronous tag check faults +(SEGV_MTESERR) even if SA_EXPOSE_TAGBITS was set. Applications should +treat the values of these bits as undefined in order to accommodate +future architecture revisions which may preserve the bits. + +For signals raised in response to watchpoint debug exceptions, the +tag information will be preserved regardless of the SA_EXPOSE_TAGBITS +flag setting. + +Non-zero tags are never preserved in sigcontext.fault_address +regardless of the SA_EXPOSE_TAGBITS flag setting. The architecture prevents the use of a tagged PC, so the upper byte will be set to a sign-extension of bit 55 on exception return. -- cgit