summaryrefslogtreecommitdiff
path: root/lib/bitmap-str.c
diff options
context:
space:
mode:
authorDev Jain <dev.jain@arm.com>2025-09-22 12:11:26 +0530
committerWill Deacon <will@kernel.org>2025-09-22 11:53:24 +0100
commitfa93b45fd397e25265ff618de26dd5c74ee403d3 (patch)
treed203a10ba17f9a566161df4d6080343d8542c278 /lib/bitmap-str.c
parent3df6979d222b8638a60aa6921b73cb5f3e4f5dcb (diff)
arm64: Enable vmalloc-huge with ptdump
Our goal is to move towards enabling vmalloc-huge by default on arm64 so as to reduce TLB pressure. Therefore, we need a way to analyze the portion of block mappings in vmalloc space we can get on a production system; this can be done through ptdump, but currently we disable vmalloc-huge if CONFIG_PTDUMP_DEBUGFS is on. The reason is that lazy freeing of kernel pagetables via vmap_try_huge_pxd() may race with ptdump, so ptdump may dereference a bogus address. To solve this, we need to synchronize ptdump_walk() and ptdump_check_wx() with pud_free_pmd_page() and pmd_free_pte_page(). Since this race is very unlikely to happen in practice, we do not want to penalize the vmalloc pagetable tearing path by taking the init_mm mmap_lock. Therefore, we use static keys. ptdump_walk() and ptdump_check_wx() are the pagetable walkers; they will enable the static key - upon observing that, the vmalloc pagetable tearing path will get patched in with an mmap_read_lock/unlock sequence. A combination of the patched-in mmap_read_lock/unlock, the acquire semantics of static_branch_inc(), and the barriers in __flush_tlb_kernel_pgtable() ensures that ptdump will never get a hold on the address of a freed PMD or PTE table. We can verify the correctness of the algorithm via the following litmus test (thanks to James Houghton and Will Deacon): AArch64 ptdump Variant=Ifetch { uint64_t pud=0xa110c; uint64_t pmd; 0:X0=label:"P1:L0"; 0:X1=instr:"NOP"; 0:X2=lock; 0:X3=pud; 0:X4=pmd; 1:X1=0xdead; 1:X2=lock; 1:X3=pud; 1:X4=pmd; } P0 | P1 ; (* static_key_enable *) | (* pud_free_pmd_page *) ; STR W1, [X0] | LDR X9, [X3] ; DC CVAU,X0 | STR XZR, [X3] ; DSB ISH | DSB ISH ; IC IVAU,X0 | ISB ; DSB ISH | ; ISB | (* static key *) ; | L0: ; (* mmap_lock *) | B out1 ; Lwlock: | ; MOV W7, #1 | (* mmap_lock *) ; SWPA W7, W8, [X2] | Lrlock: ; | MOV W7, #1 ; | SWPA W7, W8, [X2] ; (* walk pgtable *) | ; LDR X9, [X3] | (* mmap_unlock *) ; CBZ X9, out0 | STLR WZR, [X2] ; EOR X10, X9, X9 | ; LDR X11, [X4, X10] | out1: ; | EOR X10, X9, X9 ; out0: | STR X1, [X4, X10] ; exists (0:X8=0 /\ 1:X8=0 /\ (* Lock acquisitions succeed *) 0:X9=0xa110c /\ (* P0 sees the valid PUD ...*) 0:X11=0xdead) (* ... but the freed PMD *) For an approximate written proof of why this algorithm works, please read the code comment in [1], which is now removed for the sake of simplicity. mm-selftests pass. No issues were observed while parallelly running test_vmalloc.sh (which stresses the vmalloc subsystem), and cat /sys/kernel/debug/{kernel_page_tables, check_wx_pages} in a loop. Link: https://lore.kernel.org/all/20250723161827.15802-1-dev.jain@arm.com/ [1] Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Dev Jain <dev.jain@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
Diffstat (limited to 'lib/bitmap-str.c')
0 files changed, 0 insertions, 0 deletions