Age | Commit message | Author |
|
* for-next/mm:
arm64: mm: always map fixmap at page granularity
arm64: mm: move fixmap code to its own file
arm64: add FIXADDR_TOT_{START,SIZE}
Revert "Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()""
arm: uaccess: Remove memcpy_page_flushcache()
mm,kfence: decouple kfence from page granularity mapping judgement
|
|
After removing the protection functions for the crashkernel memory region,
there's no need to map the crashkernel region with page granularity during
linear mapping.
With this change, the system can make use of block or section mappings
in the linear region, which greatly improves performance during system
boot-up and runtime.
Signed-off-by: Baoquan He <bhe@redhat.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com>
Link: https://lore.kernel.org/r/20230407011507.17572-3-bhe@redhat.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Over time, arm64's mm/mmu.c has become increasingly large and painful to
navigate. Move the fixmap code to its own file where it can be understood in
isolation.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Link: https://lore.kernel.org/r/20230406152759.4164229-3-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Currently arm64's FIXADDR_{START,SIZE} definitions only cover the
runtime fixmap slots (and not the boot-time fixmap slots), but the code
for creating the fixmap assumes that these definitions cover the entire
fixmap range. This means that the ptdump boundaries are reported in a
misleading way, missing the VA region of the runtime slots. In theory
this could also cause the fixmap creation to go wrong if the boot-time
fixmap slots end up spilling into a separate PMD entry, though luckily
this is not currently the case in any configuration.
While it seems like we could extend FIXADDR_{START,SIZE} to cover the
entire fixmap area, core code relies upon these *only* covering the
runtime slots. For example, fix_to_virt() and virt_to_fix() try to
reject manipulation of the boot-time slots based upon
FIXADDR_{START,SIZE}, while __fix_to_virt() and __virt_to_fix() can
handle any fixmap slot.
This patch follows the lead of x86 in commit:
55f49fcb879fbeeb ("x86/mm: Fix overlap of i386 CPU_ENTRY_AREA with FIX_BTMAP")
... and adds new FIXADDR_TOT_{START,SIZE} definitions which cover the
entire fixmap area, using these for the fixmap creation and ptdump code.
As the boot-time fixmap slots are now rejected by fix_to_virt(),
the early_fixmap_init() code is changed to consistently use
__fix_to_virt(), as it already does in a few cases.
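For reference, a minimal sketch of the kind of definitions this adds, next
to the existing runtime-only ones (assuming arm64's FIXADDR_TOP and the
fixed_addresses enum; the exact form in the patch may differ):

  /* Existing: runtime (permanent) slots only. */
  #define FIXADDR_SIZE      (__end_of_permanent_fixed_addresses << PAGE_SHIFT)
  #define FIXADDR_START     (FIXADDR_TOP - FIXADDR_SIZE)

  /* New: cover the boot-time (FIX_BTMAP_*) slots as well. */
  #define FIXADDR_TOT_SIZE  (__end_of_fixed_addresses << PAGE_SHIFT)
  #define FIXADDR_TOT_START (FIXADDR_TOP - FIXADDR_TOT_SIZE)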
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Link: https://lore.kernel.org/r/20230406152759.4164229-2-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
KFENCE only needs its pool to be mapped at page granularity if it is
initialised early. The previous check was overly conservative. In [1], Mark
suggested to "just map the KFENCE region a page granularity". So decouple
it from that check and do page granularity mapping for the KFENCE pool
only. Note that late init of the KFENCE pool still requires page
granularity mapping.
Page granularity mapping in theory costs more memory (2M per 1GB) on the
arm64 platform. As tested on QEMU (emulated 1GB RAM) with gki_defconfig,
also turning off rodata protection:
Before:
[root@liebao ]# cat /proc/meminfo
MemTotal: 999484 kB
After:
[root@liebao ]# cat /proc/meminfo
MemTotal: 1001480 kB
To implement this, also relocate the KFENCE pool allocation to before the
linear mapping is set up: arm64_kfence_alloc_pool() allocates the phys
addr, and __kfence_pool is set after the linear mapping is set up.
LINK: [1] https://lore.kernel.org/linux-arm-kernel/Y+IsdrvDNILA59UN@FVFF77S0Q05N/
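A rough sketch of the early pool reservation described above
(arm64_kfence_alloc_pool comes from the text; the kfence_early_init flag,
the second helper's name and the rest of the bodies are assumptions):

  static phys_addr_t early_kfence_pool __initdata;  /* hypothetical holder */

  static void __init arm64_kfence_alloc_pool(void)
  {
          /* Late-initialised pools still need a page-granular linear map. */
          if (!kfence_early_init)
                  return;

          /* Reserve the pool before the linear map is created... */
          early_kfence_pool = memblock_phys_alloc(KFENCE_POOL_SIZE, PAGE_SIZE);
          if (!early_kfence_pool)
                  pr_err("failed to allocate KFENCE pool\n");
  }

  /* ...and only publish __kfence_pool once the linear map is up: */
  static void __init arm64_kfence_map_pool(void)   /* hypothetical name */
  {
          if (early_kfence_pool)
                  __kfence_pool = phys_to_virt(early_kfence_pool);
  }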
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Zhenhua Huang <quic_zhenhuah@quicinc.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Marco Elver <elver@google.com>
Link: https://lore.kernel.org/r/1679066974-690-1-git-send-email-quic_zhenhuah@quicinc.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
- Support for arm64 SME 2 and 2.1. SME2 introduces a new 512-bit
architectural register (ZT0, for the look-up table feature) that
Linux needs to save/restore
- Include TPIDR2 in the signal context and add the corresponding
kselftests
- Perf updates: Arm SPEv1.2 support, HiSilicon uncore PMU updates, ACPI
support to the Marvell DDR and TAD PMU drivers, reset DTM_PMU_CONFIG
(ARM CMN) at probe time
- Support for DYNAMIC_FTRACE_WITH_CALL_OPS on arm64
- Permit EFI boot with MMU and caches on. Instead of cleaning the
entire loaded kernel image to the PoC and disabling the MMU and
caches before branching to the kernel bare metal entry point, leave
the MMU and caches enabled and rely on EFI's cacheable 1:1 mapping of
all of system RAM to populate the initial page tables
- Expose the AArch32 (compat) ELF_HWCAP features to user in an arm64
kernel (the arm32 kernel only defines the values)
- Harden the arm64 shadow call stack pointer handling: stash the shadow
stack pointer in the task struct on interrupt, load it directly from
this structure
- Signal handling cleanups to remove redundant validation of size
information and avoid reading the same data from userspace twice
- Refactor the hwcap macros to make use of the automatically generated
ID registers. This should make writing new hwcaps less error-prone
- Further arm64 sysreg conversion and some fixes
- arm64 kselftest fixes and improvements
- Pointer authentication cleanups: don't sign leaf functions, unify
asm-arch manipulation
- Pseudo-NMI code generation optimisations
- Minor fixes for SME and TPIDR2 handling
- Miscellaneous updates: ARCH_FORCE_MAX_ORDER is now selectable,
replace strtobool() with kstrtobool() in the cpufeature.c code, apply
dynamic shadow call stack in two passes, intercept pfn changes in
set_pte_at() without the required break-before-make sequence, attempt
to dump all instructions on unhandled kernel faults
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (130 commits)
arm64: fix .idmap.text assertion for large kernels
kselftest/arm64: Don't require FA64 for streaming SVE+ZA tests
kselftest/arm64: Copy whole EXTRA context
arm64: kprobes: Drop ID map text from kprobes blacklist
perf: arm_spe: Print the version of SPE detected
perf: arm_spe: Add support for SPEv1.2 inverted event filtering
perf: Add perf_event_attr::config3
arm64/sme: Fix __finalise_el2 SMEver check
drivers/perf: fsl_imx8_ddr_perf: Remove set-but-not-used variable
arm64/signal: Only read new data when parsing the ZT context
arm64/signal: Only read new data when parsing the ZA context
arm64/signal: Only read new data when parsing the SVE context
arm64/signal: Avoid rereading context frame sizes
arm64/signal: Make interface for restore_fpsimd_context() consistent
arm64/signal: Remove redundant size validation from parse_user_sigframe()
arm64/signal: Don't redundantly verify FPSIMD magic
arm64/cpufeature: Use helper macros to specify hwcaps
arm64/cpufeature: Always use symbolic name for feature value in hwcaps
arm64/sysreg: Initial unsigned annotations for ID registers
arm64/sysreg: Initial annotation of signed ID registers
...
|
|
Changing the pfn of a mapped user page table entry without first going
through the break-before-make (BBM) procedure is unsafe. This just updates
set_pte_at() to intercept such changes, via an updated
pgattr_change_is_safe(). This new check happens via
__check_racy_pte_update(), which has now been renamed to
__check_safe_pte_update().
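A trimmed sketch of where the new check lands, per the text above (the pfn
comparison in pgattr_change_is_safe(); the surrounding attribute checks are
elided):

  static bool pgattr_change_is_safe(u64 old, u64 new)
  {
          /* Creating or tearing down a mapping is always safe. */
          if (!pte_valid(__pte(old)) || !pte_valid(__pte(new)))
                  return true;

          /* A live entry's pfn must not change without break-before-make. */
          if (pte_pfn(__pte(old)) != pte_pfn(__pte(new)))
                  return false;

          /* ...existing permission/attribute checks follow... */
          return true;
  }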
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20230130121457.1607675-1-anshuman.khandual@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
If a Cortex-A715 CPU sees a page mapping's permissions change from
executable to non-executable, it may corrupt the ESR_ELx and FAR_ELx
registers on the next instruction abort caused by a permission fault.
Only user space makes the executable to non-executable permission
transition, via the mprotect() system call, which calls the
ptep_modify_prot_start() and ptep_modify_prot_commit() helpers while
changing the page mapping. The platform code can override these helpers
via __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION.
Work around the problem by doing a break-before-make TLB invalidation for
all executable user space mappings that go through the mprotect() system
call. This overrides ptep_modify_prot_start() and
ptep_modify_prot_commit() by defining
__HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION on the platform, giving an
opportunity to intercept user space exec mappings and do the necessary
TLB invalidation. Similar interception is also implemented for HugeTLB.
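A sketch of the override described above (helper names from the message;
the erratum gating and the HugeTLB variant are elided):

  pte_t ptep_modify_prot_start(struct vm_area_struct *vma, unsigned long addr,
                               pte_t *ptep)
  {
          /*
           * Break-before-make for executable user mappings: clear the entry
           * and invalidate the TLB, so an exec -> non-exec transition never
           * happens on a live entry on an affected CPU.
           */
          if (pte_user_exec(READ_ONCE(*ptep)))
                  return ptep_clear_flush(vma, addr, ptep);

          return ptep_get_and_clear(vma->vm_mm, addr, ptep);
  }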
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20230102061651.34745-1-anshuman.khandual@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
- Fix Kconfig dependencies to re-allow the enabling of function graph
tracer and shadow call stacks at the same time.
- Revert the workaround for CPU erratum #2645198 since the CONFIG_
guards were incorrect and the code has therefore not seen any real
exposure in -next.
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
Revert "arm64: errata: Workaround possible Cortex-A715 [ESR|FAR]_ELx corruption"
ftrace: Allow WITH_ARGS flavour of graph tracer with shadow call stack
|
|
This reverts commit 44ecda71fd8a70185c270f5914ac563827fe1d4c.
All versions of this patch on the mailing list, including the version
that ended up getting merged, have portions of code guarded by the
non-existent CONFIG_ARM64_WORKAROUND_2645198 option. Although Anshuman
says he tested the code with some additional debug changes [1], I'm
hesitant to fix the CONFIG option and light up a bunch of code right
before I (and others) disappear for the end of year holidays, during
which time we won't be around to deal with any fallout.
So revert the change for now. We can bring back a fixed, tested version
for a later -rc when folks are thinking about things other than trees
and turkeys.
[1] https://lore.kernel.org/r/b6f61241-e436-5db1-1053-3b441080b8d6@arm.com
Reported-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Link: https://lore.kernel.org/r/20221215094811.23188-1-lukas.bulwahn@gmail.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- More userfaultfd work from Peter Xu
- Several convert-to-folios series from Sidhartha Kumar and Huang Ying
- Some filemap cleanups from Vishal Moola
- David Hildenbrand added the ability to selftest anon memory COW
handling
- Some cpuset simplifications from Liu Shixin
- Addition of vmalloc tracing support by Uladzislau Rezki
- Some pagecache folioifications and simplifications from Matthew
Wilcox
- A pagemap cleanup from Kefeng Wang: we have VM_ACCESS_FLAGS, so use
it
- Miguel Ojeda contributed some cleanups for our use of the
__no_sanitize_thread__ gcc keyword.
This series should have been in the non-MM tree, my bad
- Naoya Horiguchi improved the interaction between memory poisoning and
memory section removal for huge pages
- DAMON cleanups and tuneups from SeongJae Park
- Tony Luck fixed the handling of COW faults against poisoned pages
- Peter Xu utilized the PTE marker code for handling swapin errors
- Hugh Dickins reworked compound page mapcount handling, simplifying it
and making it more efficient
- Removal of the autonuma savedwrite infrastructure from Nadav Amit and
David Hildenbrand
- zram support for multiple compression streams from Sergey Senozhatsky
- David Hildenbrand reworked the GUP code's R/O long-term pinning so
that drivers no longer need to use the FOLL_FORCE workaround which
didn't work very well anyway
- Mel Gorman altered the page allocator so that local IRQs can remain
enabled during per-cpu page allocations
- Vishal Moola removed the try_to_release_page() wrapper
- Stefan Roesch added some per-BDI sysfs tunables which are used to
prevent network block devices from dirtying excessive amounts of
pagecache
- David Hildenbrand did some cleanup and repair work on KSM COW
breaking
- Nhat Pham and Johannes Weiner have implemented writeback in zswap's
zsmalloc backend
- Brian Foster has fixed a longstanding corner-case oddity in
file[map]_write_and_wait_range()
- sparse-vmemmap changes for MIPS, LoongArch and NIOS2 from Feiyang
Chen
- Shiyang Ruan has done some work on fsdax, to make its reflink mode
work better under xfstests. Better, but still not perfect
- Christoph Hellwig has removed the .writepage() method from several
filesystems. They only need .writepages()
- Yosry Ahmed wrote a series which fixes the memcg reclaim target
beancounting
- David Hildenbrand has fixed some of our MM selftests for 32-bit
machines
- Many singleton patches, as usual
* tag 'mm-stable-2022-12-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (313 commits)
mm/hugetlb: set head flag before setting compound_order in __prep_compound_gigantic_folio
mm: mmu_gather: allow more than one batch of delayed rmaps
mm: fix typo in struct pglist_data code comment
kmsan: fix memcpy tests
mm: add cond_resched() in swapin_walk_pmd_entry()
mm: do not show fs mm pc for VM_LOCKONFAULT pages
selftests/vm: ksm_functional_tests: fixes for 32bit
selftests/vm: cow: fix compile warning on 32bit
selftests/vm: madv_populate: fix missing MADV_POPULATE_(READ|WRITE) definitions
mm/gup_test: fix PIN_LONGTERM_TEST_READ with highmem
mm,thp,rmap: fix races between updates of subpages_mapcount
mm: memcg: fix swapcached stat accounting
mm: add nodes= arg to memory.reclaim
mm: disable top-tier fallback to reclaim on proactive reclaim
selftests: cgroup: make sure reclaim target memcg is unprotected
selftests: cgroup: refactor proactive reclaim code to reclaim_until()
mm: memcg: fix stale protection of reclaim target memcg
mm/mmap: properly unaccount memory on mas_preallocate() failure
omfs: remove ->writepage
jfs: remove ->writepage
...
|
|
Generalise vmemmap_populate_hugepages() so ARM64 & X86 & LoongArch can
share its implementation.
Link: https://lkml.kernel.org/r/20221027125253.3458989-4-chenhuacai@loongson.cn
Signed-off-by: Feiyang Chen <chenfeiyang@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: Min Zhou <zhoumin@loongson.cn>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Philippe Mathieu-Daudé <philmd@linaro.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Xuefeng Li <lixuefeng@loongson.cn>
Cc: Xuerui Wang <kernel@xen0n.name>
Cc: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
* for-next/trivial:
arm64: alternatives: add __init/__initconst to some functions/variables
arm64/asm: Remove unused assembler DAIF save/restore macros
arm64/kpti: Move DAIF masking to C code
Revert "arm64/mm: Drop redundant BUG_ON(!pgtable_alloc)"
arm64/mm: Drop unused restore_ttbr1
arm64: alternatives: make apply_alternatives_vdso() static
arm64/mm: Drop idmap_pg_end[] declaration
arm64/mm: Drop redundant BUG_ON(!pgtable_alloc)
arm64: make is_ttbrX_addr() noinstr-safe
arm64/signal: Document our convention for choosing magic numbers
arm64: atomics: lse: remove stale dependency on JUMP_LABEL
arm64: paravirt: remove conduit check in has_pv_steal_clock
arm64: entry: Fix typo
arm64/booting: Add missing colon to FA64 entry
arm64/mm: Drop ARM64_KERNEL_USES_PMD_MAPS
arm64/asm: Remove unused enable_da macro
|
|
This reverts commit 9ed2b4616d4e846ece2a04cb5007ce1d1bd9e3f3.
Nathan reports early boot failures bisected to this change which look
related to the kPTI nG repainting. In any case, consolidating the
BUG_ON()s to a single location needs more thought, so revert the change
until this is figured out properly.
Link: https://lore.kernel.org/r/Y3pS5fdZ3MdLZ00t@dev-arch.thelio-3990X
Reported-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
|
|
If a Cortex-A715 CPU sees a page mapping's permissions change from
executable to non-executable, it may corrupt the ESR_ELx and FAR_ELx
registers on the next instruction abort caused by a permission fault.
Only user space makes the executable to non-executable permission
transition, via the mprotect() system call, which calls the
ptep_modify_prot_start() and ptep_modify_prot_commit() helpers while
changing the page mapping. The platform code can override these helpers
via __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION.
Work around the problem by doing a break-before-make TLB invalidation for
all executable user space mappings that go through the mprotect() system
call. This overrides ptep_modify_prot_start() and
ptep_modify_prot_commit() by defining
__HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION on the platform, giving an
opportunity to intercept user space exec mappings and do the necessary
TLB invalidation. Similar interception is also implemented for HugeTLB.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20221116140915.356601-3-anshuman.khandual@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
__create_pgd_mapping_locked() expects a page allocator to be used while
mapping a virtual range. This page allocator function propagates down the
call chain while building intermediate levels in the page table. The passed
page allocator is a necessary ingredient for building the page table, but
its presence can be asserted just once at the very beginning rather than in
all the downstream functions. This consolidates the BUG_ON(!pgtable_alloc)
checks in a single place, i.e. __create_pgd_mapping_locked().
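A minimal sketch of the consolidation described above (signature as in
mainline at the time; the table-walking body is elided):

  static void __create_pgd_mapping_locked(pgd_t *pgdir, phys_addr_t phys,
                                          unsigned long virt, phys_addr_t size,
                                          pgprot_t prot,
                                          phys_addr_t (*pgtable_alloc)(int),
                                          int flags)
  {
          /* Assert the allocator once, at the top of the walk... */
          BUG_ON(!pgtable_alloc);

          /* ...instead of repeating the check in alloc_init_pud(),
           * alloc_init_cont_pmd() and alloc_init_cont_pte() further down. */
  }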
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20221118053102.500216-1-anshuman.khandual@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Most architectures (except arm64/x86/sparc) simply return 1 for
kern_addr_valid(), which is only used in read_kcore(); that in turn calls
copy_from_kernel_nofault(), which already checks whether the address is a
valid kernel address. So, as there is no need for kern_addr_valid(), let's
remove it.
Link: https://lkml.kernel.org/r/20221018074014.185687-1-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
Acked-by: Heiko Carstens <hca@linux.ibm.com> [s390]
Acked-by: Christoph Hellwig <hch@lst.de>
Acked-by: Helge Deller <deller@gmx.de> [parisc]
Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc]
Acked-by: Guo Ren <guoren@kernel.org> [csky]
Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64]
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: <aou@eecs.berkeley.edu>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Chris Zankel <chris@zankel.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@rivosinc.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Xuerui Wang <kernel@xen0n.name>
Cc: Yoshinori Sato <ysato@users.osdn.me>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Currently ARM64_KERNEL_USES_PMD_MAPS is an unnecessary abstraction. Kernel
mapping at the PMD (aka huge page, aka block) level is only applicable with
4K base pages, which makes it 2MB aligned, a necessary requirement for the
linear mapping and the physical memory start address. This can be easily
achieved by directly checking against the base page size itself. Drop the
redundant ARM64_KERNEL_USES_PMD_MAPS macro.
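The gist of the replacement, as a sketch (call site in vmemmap_populate()
assumed):

  /* was: if (!ARM64_KERNEL_USES_PMD_MAPS) */
  if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES))
          return vmemmap_populate_basepages(start, end, node, altmap);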
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20221108034406.2950071-1-anshuman.khandual@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
- arm64 perf: DDR PMU driver for Alibaba's T-Head Yitian 710 SoC, SVE
vector granule register added to the user regs together with SVE perf
extensions documentation.
- SVE updates: add HWCAP for SVE EBF16, update the SVE ABI
documentation to match the actual kernel behaviour (zeroing the
registers on syscall rather than "zeroed or preserved" previously).
- More conversions to automatic system registers generation.
- vDSO: use self-synchronising virtual counter access in gettimeofday()
if the architecture supports it.
- arm64 stacktrace cleanups and improvements.
- arm64 atomics improvements: always inline assembly, remove LL/SC
trampolines.
- Improve the reporting of EL1 exceptions: rework BTI and FPAC
exception handling, better EL1 undefs reporting.
- Cortex-A510 erratum 2658417: remove BF16 support due to incorrect
result.
- arm64 defconfig updates: build CoreSight as a module, enable options
necessary for docker, memory hotplug/hotremove, enable all PMUs
provided by Arm.
- arm64 ptrace() support for TPIDR2_EL0 (register provided with the SME
extensions).
- arm64 ftrace updates/fixes: fix module PLTs with mcount, remove
unused function.
- kselftest updates for arm64: simple HWCAP validation, FP stress test
improvements, validation of ZA regs in signal handlers, include
larger SVE and SME vector lengths in signal tests, various cleanups.
- arm64 alternatives (code patching) improvements to robustness and
consistency: replace cpucap static branches with equivalent
alternatives, associate callback alternatives with a cpucap.
- Miscellaneous updates: optimise kprobe performance of patching
single-step slots, simplify uaccess_mask_ptr(), move MTE registers
initialisation to C, support huge vmalloc() mappings, run softirqs on
the per-CPU IRQ stack, compat (arm32) misalignment fixups for
multiword accesses.
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (126 commits)
arm64: alternatives: Use vdso/bits.h instead of linux/bits.h
arm64/kprobe: Optimize the performance of patching single-step slot
arm64: defconfig: Add Coresight as module
kselftest/arm64: Handle EINTR while reading data from children
kselftest/arm64: Flag fp-stress as exiting when we begin finishing up
kselftest/arm64: Don't repeat termination handler for fp-stress
ARM64: reloc_test: add __init/__exit annotations to module init/exit funcs
arm64/mm: fold check for KFENCE into can_set_direct_map()
arm64: ftrace: fix module PLTs with mcount
arm64: module: Remove unused plt_entry_is_initialized()
arm64: module: Make plt_equals_entry() static
arm64: fix the build with binutils 2.27
kselftest/arm64: Don't enable v8.5 for MTE selftest builds
arm64: uaccess: simplify uaccess_mask_ptr()
arm64: asm/perf_regs.h: Avoid C++-style comment in UAPI header
kselftest/arm64: Fix typo in hwcap check
arm64: mte: move register initialization to C
arm64: mm: handle ARM64_KERNEL_USES_PMD_MAPS in vmemmap_populate()
arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()
arm64/sve: Add Perf extensions documentation
...
|
|
* for-next/misc:
: Miscellaneous patches
arm64/kprobe: Optimize the performance of patching single-step slot
ARM64: reloc_test: add __init/__exit annotations to module init/exit funcs
arm64/mm: fold check for KFENCE into can_set_direct_map()
arm64: uaccess: simplify uaccess_mask_ptr()
arm64: mte: move register initialization to C
arm64: mm: handle ARM64_KERNEL_USES_PMD_MAPS in vmemmap_populate()
arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()
arm64: support huge vmalloc mappings
arm64: spectre: increase parameters that can be used to turn off bhb mitigation individually
arm64: run softirqs on the per-CPU IRQ stack
arm64: compat: Implement misalignment fixups for multiword loads
|
|
KFENCE requires the linear map to be mapped at page granularity, so that
it is possible to protect/unprotect single pages, just like with
rodata_full and DEBUG_PAGEALLOC.
Instead of repeating
can_set_direct_map() || IS_ENABLED(CONFIG_KFENCE)
make can_set_direct_map() handle the KFENCE case.
This also prevents potential false positives in kernel_page_present(),
which may return true for a non-present page if CONFIG_KFENCE is enabled.
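The resulting helper, roughly (pre-existing conditions plus the new KFENCE
case):

  bool can_set_direct_map(void)
  {
          /*
           * rodata_full, DEBUG_PAGEALLOC and KFENCE all need the linear map
           * at page granularity so single pages can be protected/unprotected.
           */
          return rodata_full || debug_pagealloc_enabled() ||
                 IS_ENABLED(CONFIG_KFENCE);
  }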
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20220921074841.382615-1-rppt@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Directly check ARM64_KERNEL_USES_PMD_MAPS to choose between base-page and
PMD-level huge-page mapping in vmemmap_populate(), to simplify the code a
bit.
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Link: https://lore.kernel.org/r/20220920014951.196191-1-wangkefeng.wang@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Since commit:
47546a1912fc4a03 ("arm64: mm: install KPTI nG mappings with MMU enabled")
... when building with CONFIG_DEBUG_ATOMIC_SLEEP=y and booting under
QEMU TCG with '-cpu max', there's a boot-time splat:
| BUG: sleeping function called from invalid context at kernel/locking/mutex.c:580
| in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 15, name: migration/0
| preempt_count: 1, expected: 0
| RCU nest depth: 0, expected: 0
| no locks held by migration/0/15.
| irq event stamp: 28
| hardirqs last enabled at (27): [<ffff8000091ed180>] _raw_spin_unlock_irq+0x3c/0x7c
| hardirqs last disabled at (28): [<ffff8000081b8d74>] multi_cpu_stop+0x150/0x18c
| softirqs last enabled at (0): [<ffff80000809a314>] copy_process+0x594/0x1964
| softirqs last disabled at (0): [<0000000000000000>] 0x0
| CPU: 0 PID: 15 Comm: migration/0 Not tainted 6.0.0-rc3-00002-g419b42ff7eef #3
| Hardware name: linux,dummy-virt (DT)
| Stopper: multi_cpu_stop+0x0/0x18c <- stop_cpus.constprop.0+0xa0/0xfc
| Call trace:
| dump_backtrace.part.0+0xd0/0xe0
| show_stack+0x1c/0x5c
| dump_stack_lvl+0x88/0xb4
| dump_stack+0x1c/0x38
| __might_resched+0x180/0x230
| __might_sleep+0x4c/0xa0
| __mutex_lock+0x5c/0x450
| mutex_lock_nested+0x30/0x40
| create_kpti_ng_temp_pgd+0x4fc/0x6d0
| kpti_install_ng_mappings+0x2b8/0x3b0
| cpu_enable_non_boot_scope_capabilities+0x7c/0xd0
| multi_cpu_stop+0xa0/0x18c
| cpu_stopper_thread+0x88/0x11c
| smpboot_thread_fn+0x1ec/0x290
| kthread+0x118/0x120
| ret_from_fork+0x10/0x20
Since commit:
ee017ee353506fce ("arm64/mm: avoid fixmap race condition when create pud mapping")
... once the kernel leaves the SYSTEM_BOOTING state, the fixmap pagetable
entries are protected by the fixmap_lock mutex.
The new KPTI rewrite code uses __create_pgd_mapping() to create a
temporary pagetable. This happens in atomic context, after secondary
CPUs are brought up and the kernel has left the SYSTEM_BOOTING state.
Hence we try to acquire a mutex in atomic context, which is generally
unsound (though benign in this case as the mutex should be free and all
other CPUs are quiescent).
This patch avoids the issue by pulling the mutex out of alloc_init_pud()
and acquiring it at a higher level in the pagetable manipulation code.
This allows it to be used without locking where one CPU is known to be
in exclusive control of the machine, even after having left the
SYSTEM_BOOTING state.
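A sketch of the restructuring described above (function names per the
commit; argument lists and locking details trimmed):

  static DEFINE_MUTEX(fixmap_lock);

  /* Regular callers take the mutex here... */
  static void __create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys,
                                   unsigned long virt, phys_addr_t size,
                                   pgprot_t prot,
                                   phys_addr_t (*pgtable_alloc)(int), int flags)
  {
          mutex_lock(&fixmap_lock);
          __create_pgd_mapping_locked(pgdir, phys, virt, size, prot,
                                      pgtable_alloc, flags);
          mutex_unlock(&fixmap_lock);
  }

  /* ...while the KPTI repainting code, running in atomic context with the
   * other CPUs quiescent, calls __create_pgd_mapping_locked() directly. */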
Fixes: 47546a1912fc ("arm64: mm: install KPTI nG mappings with MMU enabled")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220920134731.1625740-1-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Our standard is to include the _EL1 in the constant names for registers,
but we did not do that for ID_AA64PFR1_EL1. Update it to do so in
preparation for conversion to automatic generation. No functional change.
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Kristina Martsenko <kristina.martsenko@arm.com>
Link: https://lore.kernel.org/r/20220905225425.1871461-8-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
On arm64, "rodata=full" has been suppored (but not documented) since
commit:
c55191e96caa9d78 ("arm64: mm: apply r/o permissions of VM areas to its linear alias as well")
As it's necessary to determine the rodata configuration early during
boot, arm64 has an early_param() handler for this, whereas init/main.c
has a __setup() handler which is run later.
Unfortunately, this split meant that since commit:
f9a40b0890658330 ("init/main.c: return 1 from handled __setup() functions")
... passing "rodata=full" would result in a spurious warning from the
__setup() handler (though RO permissions would be configured
appropriately).
Further, "rodata=full" has been broken since commit:
0d6ea3ac94ca77c5 ("lib/kstrtox.c: add "false"/"true" support to kstrtobool()")
... which caused strtobool() to parse "full" as false (in addition to
many other values not documented for the "rodata=" kernel parameter).
This patch fixes this breakage by:
* Moving the core parameter parser to an early_param(), such that it
is available early.
* Adding an (optional) arch hook which arm64 can use to parse "full".
* Updating the documentation to mention that "full" is valid for arm64.
* Having the core parameter parser handle "on" and "off" explicitly,
such that any undocumented values (e.g. typos such as "ful") are
reported as errors rather than being silently accepted.
Note that __setup() and early_param() have opposite conventions for
their return values, where __setup() uses 1 to indicate a parameter was
handled and early_param() uses 0 to indicate a parameter was handled.
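To illustrate the two conventions (hypothetical minimal handlers, not the
actual parser):

  /* __setup() handlers return 1 when the parameter was handled... */
  static int __init rodata_setup(char *arg)
  {
          return 1;
  }
  __setup("rodata=", rodata_setup);

  /* ...whereas early_param() handlers return 0 when it was handled. */
  static int __init early_rodata(char *arg)
  {
          return 0;
  }
  early_param("rodata", early_rodata);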
Fixes: f9a40b089065 ("init/main.c: return 1 from handled __setup() functions")
Fixes: 0d6ea3ac94ca ("lib/kstrtox.c: add "false"/"true" support to kstrtobool()")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jagdish Gediya <jvgediya@linux.ibm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220817154022.3974645-1-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
* for-next/boot: (34 commits)
arm64: fix KASAN_INLINE
arm64: Add an override for ID_AA64SMFR0_EL1.FA64
arm64: Add the arm64.nosve command line option
arm64: Add the arm64.nosme command line option
arm64: Expose a __check_override primitive for oddball features
arm64: Allow the idreg override to deal with variable field width
arm64: Factor out checking of a feature against the override into a macro
arm64: Allow sticky E2H when entering EL1
arm64: Save state of HCR_EL2.E2H before switch to EL1
arm64: Rename the VHE switch to "finalise_el2"
arm64: mm: fix booting with 52-bit address space
arm64: head: remove __PHYS_OFFSET
arm64: lds: use PROVIDE instead of conditional definitions
arm64: setup: drop early FDT pointer helpers
arm64: head: avoid relocating the kernel twice for KASLR
arm64: kaslr: defer initialization to initcall where permitted
arm64: head: record CPU boot mode after enabling the MMU
arm64: head: populate kernel page tables with MMU and caches on
arm64: head: factor out TTBR1 assignment into a macro
arm64: idreg-override: use early FDT mapping in ID map
...
|
|
* for-next/misc:
arm64/mm: use GENMASK_ULL for TTBR_BADDR_MASK_52
arm64: numa: Don't check node against MAX_NUMNODES
arm64: mm: Remove assembly DMA cache maintenance wrappers
arm64/mm: Define defer_reserve_crashkernel()
arm64: fix oops in concurrently setting insn_emulation sysctls
arm64: Do not forget syscall when starting a new thread.
arm64: boot: add zstd support
|
|
Crash kernel memory reservation gets deferred when either the
CONFIG_ZONE_DMA or CONFIG_ZONE_DMA32 config is enabled on the platform.
This deferral also impacts overall linear mapping creation, including the
crash kernel itself. Just encapsulate this deferral check in a new helper
for better clarity.
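The helper is essentially a wrapper around the two config checks; a sketch:

  static inline bool defer_reserve_crashkernel(void)
  {
          /* Reservation is deferred to mem_init() when DMA zones are in use. */
          return IS_ENABLED(CONFIG_ZONE_DMA) || IS_ENABLED(CONFIG_ZONE_DMA32);
  }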
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20220705062556.1845734-1-anshuman.khandual@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
In order to avoid having to touch memory with the MMU and caches
disabled, and therefore having to invalidate it from the caches
explicitly, just defer storing the value until after the MMU has been
turned on, unless we are giving up with an error.
While at it, move the associated variable definitions into C code.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220624150651.1358849-19-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
As a first step towards avoiding the need to create, tear down and
recreate the kernel virtual mapping with MMU and caches disabled, start
by expanding the ID map so it covers the page tables as well as all
executable code. This will allow us to populate the page tables with the
MMU and caches on, and call KASLR init code before setting up the
virtual mapping.
Since this ID map is only needed at boot, create it as a temporary set
of page tables, and populate the permanent ID map after enabling the MMU
and caches. While at it, switch to read-only attributes where possible,
as writable permissions are only needed for the initial kernel
page tables. Note that on 4k granule configurations, the permanent ID
map will now be reduced to a single page rather than a 2M block mapping.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220624150651.1358849-13-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
In preparation for changing the way we initialize the permanent ID map,
update cpu_replace_ttbr1() so we can use it with the initial ID map as
well.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220624150651.1358849-11-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The assignment of idmap_ptrs_per_pgd lacks any cache invalidation, even
though it is updated with the MMU and caches disabled. However, we never
bother to read the value again except in the very next instruction, and
so we can just drop the variable entirely.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20220624150651.1358849-5-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Setting idmap_t0sz involves fiddling with the caches if done with the
MMU off. Since we will be creating an initial ID map with the MMU and
caches off, and the permanent ID map with the MMU and caches on, let's
move this assignment of idmap_t0sz out of the startup code, and replace
it with a macro that simply issues the three instructions needed to
calculate the value wherever it is needed before the MMU is turned on.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220624150651.1358849-4-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Currently, we only support 52-bit virtual addressing on 64k pages
configurations, and in all other cases, vabits_actual is guaranteed to
equal VA_BITS (== VA_BITS_MIN). So get rid of the variable entirely in
that case.
While at it, move the assignment out of the asm entry code - it has no
need to be there.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220624150651.1358849-3-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
This variable definition does not need to be in head.S so move it out.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20220624150651.1358849-2-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Get rid of some clunky open coded arithmetic on section addresses, by
emitting the trampoline data variables into a separate, dedicated r/o
data section, and putting it at the next page boundary. This way, we can
access the literals via a single LDR instruction.
While at it, get rid of other, implicit literals, and use ADRP/ADD or
MOVZ/MOVK sequences, as appropriate. Note that the latter are only
supported for CONFIG_RELOCATABLE=n (which is usually the case if
CONFIG_RANDOMIZE_BASE=n), so update the CPP conditionals to reflect
this.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220622161010.3845775-1-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
In cases where we unmap the kernel while running in user space, we rely
on ASIDs to distinguish the minimal trampoline from the full kernel
mapping, and this means we must use non-global attributes for those
mappings, to ensure they are scoped by ASID and will not hit in the TLB
inadvertently.
We only do this when needed, as this is generally more costly in terms
of TLB pressure, and so we boot without these non-global attributes, and
apply them to all existing kernel mappings once all CPUs are up and we
know whether or not the non-global attributes are needed. At this point,
we cannot simply unmap and remap the entire address space, so we have to
update all existing block and page descriptors in place.
Currently, we go through a lot of trouble to perform these updates with
the MMU and caches off, to avoid violating break before make (BBM) rules
imposed by the architecture. Since we make changes to page tables that
are not covered by the ID map, we gain access to those descriptors by
disabling translations altogether. This means that the stores to memory
are issued with device attributes, and require extra care in terms of
coherency, which is costly. We also rely on the ID map to access a
shared flag, which requires the ID map to be executable and writable at
the same time, which is another thing we'd prefer to avoid.
So let's switch to an approach where we replace the kernel mapping with
a minimal mapping of a few pages that can be used for a minimal, ad-hoc
fixmap that we can use to map each page table in turn as we traverse the
hierarchy.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220609174320.4035379-3-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Pull folio updates from Matthew Wilcox:
- Rewrite how munlock works to massively reduce the contention on
i_mmap_rwsem (Hugh Dickins):
https://lore.kernel.org/linux-mm/8e4356d-9622-a7f0-b2c-f116b5f2efea@google.com/
- Sort out the page refcount mess for ZONE_DEVICE pages (Christoph
Hellwig):
https://lore.kernel.org/linux-mm/20220210072828.2930359-1-hch@lst.de/
- Convert GUP to use folios and make pincount available for order-1
pages. (Matthew Wilcox)
- Convert a few more truncation functions to use folios (Matthew
Wilcox)
- Convert page_vma_mapped_walk to use PFNs instead of pages (Matthew
Wilcox)
- Convert rmap_walk to use folios (Matthew Wilcox)
- Convert most of shrink_page_list() to use a folio (Matthew Wilcox)
- Add support for creating large folios in readahead (Matthew Wilcox)
* tag 'folio-5.18c' of git://git.infradead.org/users/willy/pagecache: (114 commits)
mm/damon: minor cleanup for damon_pa_young
selftests/vm/transhuge-stress: Support file-backed PMD folios
mm/filemap: Support VM_HUGEPAGE for file mappings
mm/readahead: Switch to page_cache_ra_order
mm/readahead: Align file mappings for non-DAX
mm/readahead: Add large folio readahead
mm: Support arbitrary THP sizes
mm: Make large folios depend on THP
mm: Fix READ_ONLY_THP warning
mm/filemap: Allow large folios to be added to the page cache
mm: Turn can_split_huge_page() into can_split_folio()
mm/vmscan: Convert pageout() to take a folio
mm/vmscan: Turn page_check_references() into folio_check_references()
mm/vmscan: Account large folios correctly
mm/vmscan: Optimise shrink_page_list for non-PMD-sized folios
mm/vmscan: Free non-shmem folios without splitting them
mm/rmap: Constify the rmap_walk_control argument
mm/rmap: Convert rmap_walk() to take a folio
mm: Turn page_anon_vma() into folio_anon_vma()
mm/rmap: Turn page_lock_anon_vma_read() into folio_lock_anon_vma_read()
...
|
|
Merge in the latest Spectre mess to fix up conflicts with what was
already queued for 5.18 when the embargo finally lifted.
* for-next/spectre-bhb: (21 commits)
arm64: Do not include __READ_ONCE() block in assembly files
arm64: proton-pack: Include unprivileged eBPF status in Spectre v2 mitigation reporting
arm64: Use the clearbhb instruction in mitigations
KVM: arm64: Allow SMCCC_ARCH_WORKAROUND_3 to be discovered and migrated
arm64: Mitigate spectre style branch history side channels
arm64: proton-pack: Report Spectre-BHB vulnerabilities as part of Spectre-v2
arm64: Add percpu vectors for EL1
arm64: entry: Add macro for reading symbol addresses from the trampoline
arm64: entry: Add vectors that have the bhb mitigation sequences
arm64: entry: Add non-kpti __bp_harden_el1_vectors for mitigations
arm64: entry: Allow the trampoline text to occupy multiple pages
arm64: entry: Make the kpti trampoline's kpti sequence optional
arm64: entry: Move trampoline macros out of ifdef'd section
arm64: entry: Don't assume tramp_vectors is the start of the vectors
arm64: entry: Allow tramp_alias to access symbols after the 4K boundary
arm64: entry: Move the trampoline data page before the text page
arm64: entry: Free up another register on kpti's tramp_exit path
arm64: entry: Make the trampoline cleanup optional
KVM: arm64: Allow indirect vectors to be used without SPECTRE_V3A
arm64: spectre: Rename spectre_v4_patch_fw_mitigation_conduit
...
|
|
* for-next/mm:
Documentation: vmcoreinfo: Fix htmldocs warning
arm64/mm: Drop use_1G_block()
arm64: avoid flushing icache multiple times on contiguous HugeTLB
arm64: crash_core: Export MODULES, VMALLOC, and VMEMMAP ranges
arm64/hugetlb: Define __hugetlb_valid_size()
arm64/mm: avoid fixmap race condition when create pud mapping
arm64/mm: Consolidate TCR_EL1 fields
|
|
The following patches resulted in deferring crash kernel reservation to
mem_init(), mainly aimed at platforms with DMA memory zones (no IOMMU),
in particular Raspberry Pi 4.
commit 1a8e1cef7603 ("arm64: use both ZONE_DMA and ZONE_DMA32")
commit 8424ecdde7df ("arm64: mm: Set ZONE_DMA size based on devicetree's dma-ranges")
commit 0a30c53573b0 ("arm64: mm: Move reserve_crashkernel() into mem_init()")
commit 2687275a5843 ("arm64: Force NO_BLOCK_MAPPINGS if crashkernel reservation is required")
The above changes introduced a boot slowdown due to linear map creation for
all the memory banks with NO_BLOCK_MAPPINGS, see discussion [1]. The
proposed changes restore crash kernel reservation to the earlier behaviour,
thus avoiding the slow boot, particularly for platforms with an IOMMU (no
DMA memory zones).
Tested the changes to confirm there is no ~150ms boot slowdown on our SoC
with an IOMMU and 8GB memory. Also tested with ZONE_DMA and/or ZONE_DMA32
configs to confirm no regression in the deferring scheme of crash kernel
memory reservation. In both cases the kernel crash dump was successfully
collected.
[1] https://lore.kernel.org/all/9436d033-579b-55fa-9b00-6f4b661c2dd7@linux.microsoft.com/
Signed-off-by: Vijay Balakrishna <vijayb@linux.microsoft.com>
Cc: stable@vger.kernel.org
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Link: https://lore.kernel.org/r/1646242689-20744-1-git-send-email-vijayb@linux.microsoft.com
[will: Add #ifdef CONFIG_KEXEC_CORE guards to fix 'crashk_res' references in allnoconfig build]
Signed-off-by: Will Deacon <will@kernel.org>
|
|
pud_sect_supported() already checks for PUD-level block mapping support,
i.e. on ARM64_4K_PAGES configs. Hence pud_sect_supported(), along with some
other required alignment checks, can help completely drop use_1G_block().
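A sketch of the resulting condition in alloc_init_pud() (flag and helper
names as in mainline; exact form assumed):

  /* Use a 1GB block mapping only if supported, aligned and permitted. */
  if (pud_sect_supported() &&
      ((addr | next | phys) & ~PUD_MASK) == 0 &&
      (flags & NO_BLOCK_MAPPINGS) == 0)
          pud_set_huge(pudp, phys, prot);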
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/1644988012-25455-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Move the check for the actual pgmap types that need the free at refcount
one behavior into the out of line helper, and thus avoid the need to
pull memremap.h into mm.h.
Link: https://lkml.kernel.org/r/20220210072828.2930359-7-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Tested-by: "Sierra Guiza, Alejandro (Alex)" <alex.sierra@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Chaitanya Kulkarni <kch@nvidia.com>
Cc: Karol Herbst <kherbst@redhat.com>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
|
|
Adding a second set of vectors to .entry.tramp.text will make it
larger than a single 4K page.
Allow the trampoline text to occupy up to three pages by adding two
more fixmap slots. Previous changes to tramp_valias allowed it to reach
beyond a single page.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
|
|
The 'fixmap' is a global resource and is used recursively while creating
pud mappings, leading to a potential race condition in the presence of a
concurrent call to alloc_init_pud():
kernel_init thread virtio-mem workqueue thread
================== ===========================
alloc_init_pud(...) alloc_init_pud(...)
pudp = pud_set_fixmap_offset(...) pudp = pud_set_fixmap_offset(...)
READ_ONCE(*pudp)
pud_clear_fixmap(...)
READ_ONCE(*pudp) // CRASH!
As the kernel may sleep while creating a pud mapping, introduce a mutex to
serialise use of the fixmap entries by alloc_init_pud(). However, there is
no need for locking during the early boot stage, and it doesn't work well
with KASLR enabled during early boot. So, only take the lock when
system_state is not SYSTEM_BOOTING.
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Fixes: f4710445458c ("arm64: mm: use fixmap when creating page tables")
Link: https://lore.kernel.org/r/20220201114400.56885-1-jianyong.wu@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
- Fix double-evaluation of 'pte' macro argument when using 52-bit PAs
- Fix signedness of some MTE prctl PR_* constants
- Fix kmemleak memory usage by skipping early pgtable allocations
- Fix printing of CPU feature register strings
- Remove redundant -nostdlib linker flag for vDSO binaries
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: pgtable: make __pte_to_phys/__phys_to_pte_val inline functions
arm64: Track no early_pgtable_alloc() for kmemleak
arm64: mte: change PR_MTE_TCF_NONE back into an unsigned long
arm64: vdso: remove -nostdlib compiler flag
arm64: arm64_ftr_reg->name may not be a human-readable string
|
|
After switching the page size from 64KB to 4KB on several arm64 servers
here, kmemleak started to run out of its early memory pool due to a huge
number of early_pgtable_alloc() calls:
kmemleak_alloc_phys()
memblock_alloc_range_nid()
memblock_phys_alloc_range()
early_pgtable_alloc()
init_pmd()
alloc_init_pud()
__create_pgd_mapping()
__map_memblock()
paging_init()
setup_arch()
start_kernel()
Increasing the default value of DEBUG_KMEMLEAK_MEM_POOL_SIZE by 4 times
won't be enough for a server with 200GB+ memory. There isn't much of
interest in checking memory leaks for those early page tables, and those
early memory mappings should not reference other memory. Hence, there are
no kmemleak false positives, and we can safely skip tracking those early
allocations from kmemleak like we did in commit fed84c785270
("mm/memblock.c: skip kmemleak for kasan_init()"), without needing to
introduce complications to automatically scale the value depending on the
runtime memory size, etc. After the patch, the default value of
DEBUG_KMEMLEAK_MEM_POOL_SIZE becomes sufficient again.
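A trimmed sketch of the resulting allocation (MEMBLOCK_ALLOC_NOLEAKTRACE is
the hint this series relies on; the rest of the function body is
assumed/elided):

  static phys_addr_t __init early_pgtable_alloc(int shift)
  {
          phys_addr_t phys;

          /* Ask memblock not to register this range with kmemleak. */
          phys = memblock_phys_alloc_range(PAGE_SIZE, PAGE_SIZE, 0,
                                           MEMBLOCK_ALLOC_NOLEAKTRACE);
          if (!phys)
                  panic("Failed to allocate page table page\n");

          return phys;
  }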
Signed-off-by: Qian Cai <quic_qiancai@quicinc.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Link: https://lore.kernel.org/r/20211105150509.7826-1-quic_qiancai@quicinc.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Merge misc updates from Andrew Morton:
"257 patches.
Subsystems affected by this patch series: scripts, ocfs2, vfs, and
mm (slab-generic, slab, slub, kconfig, dax, kasan, debug, pagecache,
gup, swap, memcg, pagemap, mprotect, mremap, iomap, tracing, vmalloc,
pagealloc, memory-failure, hugetlb, userfaultfd, vmscan, tools,
memblock, oom-kill, hugetlbfs, migration, thp, readahead, nommu, ksm,
vmstat, madvise, memory-hotplug, rmap, zsmalloc, highmem, zram,
cleanups, kfence, and damon)"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (257 commits)
mm/damon: remove return value from before_terminate callback
mm/damon: fix a few spelling mistakes in comments and a pr_debug message
mm/damon: simplify stop mechanism
Docs/admin-guide/mm/pagemap: wordsmith page flags descriptions
Docs/admin-guide/mm/damon/start: simplify the content
Docs/admin-guide/mm/damon/start: fix a wrong link
Docs/admin-guide/mm/damon/start: fix wrong example commands
mm/damon/dbgfs: add adaptive_targets list check before enable monitor_on
mm/damon: remove unnecessary variable initialization
Documentation/admin-guide/mm/damon: add a document for DAMON_RECLAIM
mm/damon: introduce DAMON-based Reclamation (DAMON_RECLAIM)
selftests/damon: support watermarks
mm/damon/dbgfs: support watermarks
mm/damon/schemes: activate schemes based on a watermarks mechanism
tools/selftests/damon: update for regions prioritization of schemes
mm/damon/dbgfs: support prioritization weights
mm/damon/vaddr,paddr: support pageout prioritization
mm/damon/schemes: prioritize regions within the quotas
mm/damon/selftests: support schemes quotas
mm/damon/dbgfs: support quotas of schemes
...
|
|
Since memblock_free() operates on a physical range, make its name
reflect it and rename it to memblock_phys_free(), so it will be a
logical counterpart to memblock_phys_alloc().
The callers are updated with the below semantic patch:
@@
expression addr;
expression size;
@@
- memblock_free(addr, size);
+ memblock_phys_free(addr, size);
Link: https://lkml.kernel.org/r/20210930185031.18648-6-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Juergen Gross <jgross@suse.com>
Cc: Shahab Vahedi <Shahab.Vahedi@synopsys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
After new memory blocks have been hotplugged, max_pfn and max_low_pfn need
updating to reflect the new PFNs being hot-added to the system.
Without this patch, debug-related functions that use max_pfn, such as
get_max_dump_pfn() or read_page_owner(), will not work with any page in
memory that is hot-added after boot.
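The substance of the fix is two assignments after a successful hotplug add;
a minimal sketch (hypothetical helper name, hook point assumed to be arm64's
arch_add_memory()):

  static void update_max_pfns(u64 start, u64 size)
  {
          /* Keep the PFN limits covering the newly hot-added range. */
          max_pfn = PFN_UP(start + size);
          max_low_pfn = max_pfn;
  }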
Fixes: 4ab215061554 ("arm64: Add memory hotplug support")
Signed-off-by: Sudarshan Rajagopalan <quic_sudaraja@quicinc.com>
Signed-off-by: Chris Goldsworthy <quic_cgoldswo@quicinc.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Georgi Djakov <quic_c_gdjako@quicinc.com>
Tested-by: Georgi Djakov <quic_c_gdjako@quicinc.com>
Link: https://lore.kernel.org/r/a51a27ee7be66024b5ce626310d673f24107bcb8.1632853776.git.quic_cgoldswo@quicinc.com
Signed-off-by: Will Deacon <will@kernel.org>
|