summaryrefslogtreecommitdiff
path: root/arch/riscv/Kconfig
AgeCommit message (Collapse)Author
2023-12-12kexec: drop dependency on ARCH_SUPPORTS_KEXEC from CRASH_DUMPIgnat Korchagin
In commit f8ff23429c62 ("kernel/Kconfig.kexec: drop select of KEXEC for CRASH_DUMP") we tried to fix a config regression, where CONFIG_CRASH_DUMP required CONFIG_KEXEC. However, it was not enough at least for arm64 platforms. While further testing the patch with our arm64 config I noticed that CONFIG_CRASH_DUMP is unavailable in menuconfig. This is because CONFIG_CRASH_DUMP still depends on the new CONFIG_ARCH_SUPPORTS_KEXEC introduced in commit 91506f7e5d21 ("arm64/kexec: refactor for kernel/Kconfig.kexec") and on arm64 CONFIG_ARCH_SUPPORTS_KEXEC requires CONFIG_PM_SLEEP_SMP=y, which in turn requires either CONFIG_SUSPEND=y or CONFIG_HIBERNATION=y neither of which are set in our config. Given that we already established that CONFIG_KEXEC (which is a switch for kexec system call itself) is not required for CONFIG_CRASH_DUMP drop CONFIG_ARCH_SUPPORTS_KEXEC dependency as well. The arm64 kernel builds just fine with CONFIG_CRASH_DUMP=y and with both CONFIG_KEXEC=n and CONFIG_KEXEC_FILE=n after f8ff23429c62 ("kernel/Kconfig.kexec: drop select of KEXEC for CRASH_DUMP") and this patch are applied given that the necessary shared bits are included via CONFIG_KEXEC_CORE dependency. [bhe@redhat.com: don't export some symbols when CONFIG_MMU=n] Link: https://lkml.kernel.org/r/ZW03ODUKGGhP1ZGU@MiWiFi-R3L-srv [bhe@redhat.com: riscv, kexec: fix dependency of two items] Link: https://lkml.kernel.org/r/ZW04G/SKnhbE5mnX@MiWiFi-R3L-srv Link: https://lkml.kernel.org/r/20231129220409.55006-1-ignat@cloudflare.com Fixes: 91506f7e5d21 ("arm64/kexec: refactor for kernel/Kconfig.kexec") Signed-off-by: Ignat Korchagin <ignat@cloudflare.com> Signed-off-by: Baoquan He <bhe@redhat.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: <stable@vger.kernel.org> # 6.6+: f8ff234: kernel/Kconfig.kexec: drop select of KEXEC for CRASH_DUMP Cc: <stable@vger.kernel.org> # 6.6+ Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-11-10Merge tag 'riscv-for-linus-6.7-mw2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull more RISC-V updates from Palmer Dabbelt: - Support for handling misaligned accesses in S-mode - Probing for misaligned access support is now properly cached and handled in parallel - PTDUMP now reflects the SW reserved bits, as well as the PBMT and NAPOT extensions - Performance improvements for TLB flushing - Support for many new relocations in the module loader - Various bug fixes and cleanups * tag 'riscv-for-linus-6.7-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (51 commits) riscv: Optimize bitops with Zbb extension riscv: Rearrange hwcap.h and cpufeature.h drivers: perf: Do not broadcast to other cpus when starting a counter drivers: perf: Check find_first_bit() return value of: property: Add fw_devlink support for msi-parent RISC-V: Don't fail in riscv_of_parent_hartid() for disabled HARTs riscv: Fix set_memory_XX() and set_direct_map_XX() by splitting huge linear mappings riscv: Don't use PGD entries for the linear mapping RISC-V: Probe misaligned access speed in parallel RISC-V: Remove __init on unaligned_emulation_finish() RISC-V: Show accurate per-hart isa in /proc/cpuinfo RISC-V: Don't rely on positional structure initialization riscv: Add tests for riscv module loading riscv: Add remaining module relocations riscv: Avoid unaligned access when relocating modules riscv: split cache ops out of dma-noncoherent.c riscv: Improve flush_tlb_kernel_range() riscv: Make __flush_tlb_range() loop over pte instead of flushing the whole tlb riscv: Improve flush_tlb_range() for hugetlb pages riscv: Improve tlb_flush() ...
2023-11-08Merge tag 'riscv-for-linus-6.7-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V updates from Palmer Dabbelt: - Support for cbo.zero in userspace - Support for CBOs on ACPI-based systems - A handful of improvements for the T-Head cache flushing ops - Support for software shadow call stacks - Various cleanups and fixes * tag 'riscv-for-linus-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (31 commits) RISC-V: hwprobe: Fix vDSO SIGSEGV riscv: configs: defconfig: Enable configs required for RZ/Five SoC riscv: errata: prefix T-Head mnemonics with th. riscv: put interrupt entries into .irqentry.text riscv: mm: Update the comment of CONFIG_PAGE_OFFSET riscv: Using TOOLCHAIN_HAS_ZIHINTPAUSE marco replace zihintpause riscv/mm: Fix the comment for swap pte format RISC-V: clarify the QEMU workaround in ISA parser riscv: correct pt_level name via pgtable_l5/4_enabled RISC-V: Provide pgtable_l5_enabled on rv32 clocksource: timer-riscv: Increase rating of clock_event_device for Sstc clocksource: timer-riscv: Don't enable/disable timer interrupt lkdtm: Fix CFI_BACKWARD on RISC-V riscv: Use separate IRQ shadow call stacks riscv: Implement Shadow Call Stack riscv: Move global pointer loading to a macro riscv: Deduplicate IRQ stack switching riscv: VMAP_STACK overflow detection thread-safe RISC-V: cacheflush: Initialize CBO variables on ACPI systems RISC-V: ACPI: RHCT: Add function to get CBO block sizes ...
2023-11-06riscv: select ARCH_PROC_KCORE_TEXTAndreas Schwab
This adds a separate segment for kernel text in /proc/kcore, which has a different address than the direct linear map. Signed-off-by: Andreas Schwab <schwab@suse.de> Link: https://lore.kernel.org/r/mvmh6m758ao.fsf@suse.de Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-11-05Merge patch series "Add support to handle misaligned accesses in S-mode"Palmer Dabbelt
Clément Léger <cleger@rivosinc.com> says: Since commit 61cadb9 ("Provide new description of misaligned load/store behavior compatible with privileged architecture.") in the RISC-V ISA manual, it is stated that misaligned load/store might not be supported. However, the RISC-V kernel uABI describes that misaligned accesses are supported. In order to support that, this series adds support for S-mode handling of misaligned accesses as well support for prctl(PR_UNALIGN). Handling misaligned access in kernel allows for a finer grain control of the misaligned accesses behavior, and thanks to the prctl() call, can allow disabling misaligned access emulation to generate SIGBUS. User space can then optimize its software by removing such access based on SIGBUS generation. This series is useful when using a SBI implementation that does not handle misaligned traps as well as detecting misaligned accesses generated by userspace application using the prctrl(PR_SET_UNALIGN) feature. This series can be tested using the spike simulator[1] and a modified openSBI version[2] which allows to always delegate misaligned load/store to S-mode. A test[3] that exercise various instructions/registers can be executed to verify the unaligned access support. [1] https://github.com/riscv-software-src/riscv-isa-sim [2] https://github.com/rivosinc/opensbi/tree/dev/cleger/no_misaligned [3] https://github.com/clementleger/unaligned_test * b4-shazam-merge: riscv: add support for PR_SET_UNALIGN and PR_GET_UNALIGN riscv: report misaligned accesses emulation to hwprobe riscv: annotate check_unaligned_access_boot_cpu() with __init riscv: add support for sysctl unaligned_enabled control riscv: add floating point insn support to misaligned access emulation riscv: report perf event for misaligned fault riscv: add support for misaligned trap handling in S-mode riscv: remove unused functions in traps_misaligned.c Link: https://lore.kernel.org/r/20231004151405.521596-1-cleger@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-11-02Merge tag 'mm-nonmm-stable-2023-11-02-14-08' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: "As usual, lots of singleton and doubleton patches all over the tree and there's little I can say which isn't in the individual changelogs. The lengthier patch series are - 'kdump: use generic functions to simplify crashkernel reservation in arch', from Baoquan He. This is mainly cleanups and consolidation of the 'crashkernel=' kernel parameter handling - After much discussion, David Laight's 'minmax: Relax type checks in min() and max()' is here. Hopefully reduces some typecasting and the use of min_t() and max_t() - A group of patches from Oleg Nesterov which clean up and slightly fix our handling of reads from /proc/PID/task/... and which remove task_struct.thread_group" * tag 'mm-nonmm-stable-2023-11-02-14-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (64 commits) scripts/gdb/vmalloc: disable on no-MMU scripts/gdb: fix usage of MOD_TEXT not defined when CONFIG_MODULES=n .mailmap: add address mapping for Tomeu Vizoso mailmap: update email address for Claudiu Beznea tools/testing/selftests/mm/run_vmtests.sh: lower the ptrace permissions .mailmap: map Benjamin Poirier's address scripts/gdb: add lx_current support for riscv ocfs2: fix a spelling typo in comment proc: test ProtectionKey in proc-empty-vm test proc: fix proc-empty-vm test with vsyscall fs/proc/base.c: remove unneeded semicolon do_io_accounting: use sig->stats_lock do_io_accounting: use __for_each_thread() ocfs2: replace BUG_ON() at ocfs2_num_free_extents() with ocfs2_error() ocfs2: fix a typo in a comment scripts/show_delta: add __main__ judgement before main code treewide: mark stuff as __ro_after_init fs: ocfs2: check status values proc: test /proc/${pid}/statm compiler.h: move __is_constexpr() to compiler.h ...
2023-11-02Merge patch series "riscv: SCS support"Palmer Dabbelt
Sami Tolvanen <samitolvanen@google.com> says: This series adds Shadow Call Stack (SCS) support for RISC-V. SCS uses compiler instrumentation to store return addresses in a separate shadow stack to protect them against accidental or malicious overwrites. More information about SCS can be found here: https://clang.llvm.org/docs/ShadowCallStack.html Patch 1 is from Deepak, and it simplifies VMAP_STACK overflow handling by adding support for accessing per-CPU variables directly in assembly. The patch is included in this series to make IRQ stack switching cleaner with SCS, and I've simply rebased it and fixed a couple of minor issues. Patch 2 uses this functionality to clean up the stack switching by moving duplicate code into a single function. On RISC-V, the compiler uses the gp register for storing the current shadow call stack pointer, which is incompatible with global pointer relaxation. Patch 3 moves global pointer loading into a macro that can be easily disabled with SCS. Patch 4 implements SCS register loading and switching, and allows the feature to be enabled, and patch 5 adds separate per-CPU IRQ shadow call stacks when CONFIG_IRQ_STACKS is enabled. Patch 6 fixes the backward-edge CFI test in lkdtm for RISC-V. Note that this series requires Clang 17. Earlier Clang versions support SCS on RISC-V, but use the x18 register instead of gp, which isn't ideal. gcc has SCS support for arm64, but I'm not aware of plans to support RISC-V. Once the Zicfiss extension is ratified, it's probably preferable to use hardware-backed shadow stacks instead of SCS on hardware that supports the extension, and we may want to consider implementing CONFIG_DYNAMIC_SCS to patch between the implementation at runtime (similarly to the arm64 implementation, which switches to SCS when hardware PAC support isn't available). * b4-shazam-merge: lkdtm: Fix CFI_BACKWARD on RISC-V riscv: Use separate IRQ shadow call stacks riscv: Implement Shadow Call Stack riscv: Move global pointer loading to a macro riscv: Deduplicate IRQ stack switching riscv: VMAP_STACK overflow detection thread-safe Link: https://lore.kernel.org/r/20230927224757.1154247-8-samitolvanen@google.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-11-01riscv: add support for sysctl unaligned_enabled controlClément Léger
This sysctl tuning option allows the user to disable misaligned access handling globally on the system. This will also be used by misaligned detection code to temporarily disable misaligned access handling. Signed-off-by: Clément Léger <cleger@rivosinc.com> Reviewed-by: Björn Töpel <bjorn@rivosinc.com> Link: https://lore.kernel.org/r/20231004151405.521596-6-cleger@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-11-01riscv: add support for misaligned trap handling in S-modeClément Léger
Misalignment trap handling is only supported for M-mode and uses direct accesses to user memory. In S-mode, when handling usermode fault, this requires to use the get_user()/put_user() accessors. Implement load_u8(), store_u8() and get_insn() using these accessors for userspace and direct text access for kernel. Signed-off-by: Clément Léger <cleger@rivosinc.com> Reviewed-by: Björn Töpel <bjorn@rivosinc.com> Link: https://lore.kernel.org/r/20231004151405.521596-3-cleger@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-10-27riscv: Implement Shadow Call StackSami Tolvanen
Implement CONFIG_SHADOW_CALL_STACK for RISC-V. When enabled, the compiler injects instructions to all non-leaf C functions to store the return address to the shadow stack and unconditionally load it again before returning, which makes it harder to corrupt the return address through a stack overflow, for example. The active shadow call stack pointer is stored in the gp register, which makes SCS incompatible with gp relaxation. Use --no-relax-gp to ensure gp relaxation is disabled and disable global pointer loading. Add SCS pointers to struct thread_info, implement SCS initialization, and task switching Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Tested-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/20230927224757.1154247-12-samitolvanen@google.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-10-26RISC-V: ACPI: Enhance acpi_os_ioremap with MMIO remappingSunil V L
Enhance the acpi_os_ioremap() to support opregions in MMIO space. Also, have strict checks using EFI memory map to allow remapping the RAM similar to arm64. Signed-off-by: Sunil V L <sunilvl@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Acked-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20231018124007.1306159-2-sunilvl@ventanamicro.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-10-26riscv: only select DMA_DIRECT_REMAP from RISCV_ISA_ZICBOM and ERRATA_THEAD_PBMTChristoph Hellwig
RISCV_DMA_NONCOHERENT is also used for whacky non-standard non-coherent ops that use different hooks in dma-direct. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Tested-by: Samuel Holland <samuel.holland@sifive.com> Link: https://lore.kernel.org/r/20231018052654.50074-3-hch@lst.de Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2023-10-26riscv: RISCV_NONSTANDARD_CACHE_OPS shouldn't depend on RISCV_DMA_NONCOHERENTChristoph Hellwig
RISCV_NONSTANDARD_CACHE_OPS is also used for the pmem cache maintenance helpers, which are built into the kernel unconditionally. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20231018052654.50074-2-hch@lst.de Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2023-10-04riscv: kdump: use generic interface to simplify crashkernel reservationBaoquan He
With the help of newly changed function parse_crashkernel() and generic reserve_crashkernel_generic(), crashkernel reservation can be simplified by steps: 1) Add a new header file <asm/crash_core.h>, and define CRASH_ALIGN, CRASH_ADDR_LOW_MAX, CRASH_ADDR_HIGH_MAX and DEFAULT_CRASH_KERNEL_LOW_SIZE in <asm/crash_core.h>; 2) Add arch_reserve_crashkernel() to call parse_crashkernel() and reserve_crashkernel_generic(); 3) Add ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION Kconfig in arch/riscv/Kconfig. The old reserve_crashkernel_low() and reserve_crashkernel() can be removed. [chenjiahao16@huawei.com: fix crashkernel reserving problem on RISC-V] Link: https://lkml.kernel.org/r/20230925024333.730964-1-chenjiahao16@huawei.com Link: https://lkml.kernel.org/r/20230914033142.676708-9-bhe@redhat.com Signed-off-by: Baoquan He <bhe@redhat.com> Signed-off-by: Chen Jiahao <chenjiahao16@huawei.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chen Jiahao <chenjiahao16@huawei.com> Cc: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-08riscv: Kconfig: Select DMA_DIRECT_REMAP only if MMU is enabledLad Prabhakar
kernel/dma/mapping.c has its use of pgprot_dmacoherent() inside an #ifdef CONFIG_MMU block. kernel/dma/pool.c has its use of pgprot_dmacoherent() inside an #ifdef CONFIG_DMA_DIRECT_REMAP block. So select DMA_DIRECT_REMAP only if MMU is enabled for RISCV_DMA_NONCOHERENT config. This avoids users to explicitly select MMU. Suggested-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Link: https://lore.kernel.org/r/20230901105111.311200-1-prabhakar.mahadev-lad.rj@bp.renesas.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-09-08Merge patch series "riscv: Introduce KASLR"Palmer Dabbelt
Alexandre Ghiti <alexghiti@rivosinc.com> says: The following KASLR implementation allows to randomize the kernel mapping: - virtually: we expect the bootloader to provide a seed in the device-tree - physically: only implemented in the EFI stub, it relies on the firmware to provide a seed using EFI_RNG_PROTOCOL. arm64 has a similar implementation hence the patch 3 factorizes KASLR related functions for riscv to take advantage. The new virtual kernel location is limited by the early page table that only has one PUD and with the PMD alignment constraint, the kernel can only take < 512 positions. * b4-shazam-merge: riscv: libstub: Implement KASLR by using generic functions libstub: Fix compilation warning for rv32 arm64: libstub: Move KASLR handling functions to kaslr.c riscv: Dump out kernel offset information on panic riscv: Introduce virtual kernel mapping KASLR Link: https://lore.kernel.org/r/20230722123850.634544-1-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-09-08Merge patch series "Add non-coherent DMA support for AX45MP"Palmer Dabbelt
Prabhakar <prabhakar.csengg@gmail.com> says: From: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> non-coherent DMA support for AX45MP ==================================== On the Andes AX45MP core, cache coherency is a specification option so it may not be supported. In this case DMA will fail. To get around with this issue this patch series does the below: 1] Andes alternative ports is implemented as errata which checks if the IOCP is missing and only then applies to CMO errata. One vendor specific SBI EXT (ANDES_SBI_EXT_IOCP_SW_WORKAROUND) is implemented as part of errata. Below are the configs which Andes port provides (and are selected by RZ/Five): - ERRATA_ANDES - ERRATA_ANDES_CMO OpenSBI patch supporting ANDES_SBI_EXT_IOCP_SW_WORKAROUND SBI is now part v1.3 release. 2] Andes AX45MP core has a Programmable Physical Memory Attributes (PMA) block that allows dynamic adjustment of memory attributes in the runtime. It contains a configurable amount of PMA entries implemented as CSR registers to control the attributes of memory locations in interest. OpenSBI configures the PMA regions as required and creates a reserve memory node and propagates it to the higher boot stack. Currently OpenSBI (upstream) configures the required PMA region and passes this a shared DMA pool to Linux. reserved-memory { #address-cells = <2>; #size-cells = <2>; ranges; pma_resv0@58000000 { compatible = "shared-dma-pool"; reg = <0x0 0x58000000 0x0 0x08000000>; no-map; linux,dma-default; }; }; The above shared DMA pool gets appended to Linux DTB so the DMA memory requests go through this region. 3] We provide callbacks to synchronize specific content between memory and cache. 4] RZ/Five SoC selects the below configs - AX45MP_L2_CACHE - DMA_GLOBAL_POOL - ERRATA_ANDES - ERRATA_ANDES_CMO ----------x---------------------x--------------------x---------------x---- * b4-shazam-merge: soc: renesas: Kconfig: Select the required configs for RZ/Five SoC cache: Add L2 cache management for Andes AX45MP RISC-V core dt-bindings: cache: andestech,ax45mp-cache: Add DT binding documentation for L2 cache controller riscv: mm: dma-noncoherent: nonstandard cache operations support riscv: errata: Add Andes alternative ports riscv: asm: vendorid_list: Add Andes Technology to the vendors list Link: https://lore.kernel.org/r/20230818135723.80612-1-prabhakar.mahadev-lad.rj@bp.renesas.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-09-05riscv: Introduce virtual kernel mapping KASLRAlexandre Ghiti
KASLR implementation relies on a relocatable kernel so that we can move the kernel mapping. The seed needed to virtually move the kernel is taken from the device tree, so we rely on the bootloader to provide a correct seed. Zkr could be used unconditionnally instead if implemented, but that's for another patch. Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Tested-by: Conor Dooley <conor.dooley@microchip.com> Tested-by: Song Shuai <songshuaishuai@tinylab.org> Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Tested-by: Sami Tolvanen <samitolvanen@google.com> Link: https://lore.kernel.org/r/20230722123850.634544-2-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-09-01riscv: mm: dma-noncoherent: nonstandard cache operations supportLad Prabhakar
Introduce support for nonstandard noncoherent systems in the RISC-V architecture. It enables function pointer support to handle cache management in such systems. This patch adds a new configuration option called "RISCV_NONSTANDARD_CACHE_OPS." This option is a boolean flag that depends on "RISCV_DMA_NONCOHERENT" and enables the function pointer support for cache management in nonstandard noncoherent systems. Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Tested-by: Conor Dooley <conor.dooley@microchip.com> # tyre-kicking on a d1 Reviewed-by: Emil Renner Berthing <emil.renner.berthing@canonical.com> Tested-by: Emil Renner Berthing <emil.renner.berthing@canonical.com> # Link: https://lore.kernel.org/r/20230818135723.80612-4-prabhakar.mahadev-lad.rj@bp.renesas.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-09-01Merge tag 'riscv-for-linus-6.6-mw1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V updates from Palmer Dabbelt: - Support for the new "riscv,isa-extensions" and "riscv,isa-base" device tree interfaces for probing extensions - Support for userspace access to the performance counters - Support for more instructions in kprobes - Crash kernels can be allocated above 4GiB - Support for KCFI - Support for ELFs in !MMU configurations - ARCH_KMALLOC_MINALIGN has been reduced to 8 - mmap() defaults to sv48-sized addresses, with longer addresses hidden behind a hint (similar to Arm and Intel) - Also various fixes and cleanups * tag 'riscv-for-linus-6.6-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (51 commits) lib/Kconfig.debug: Restrict DEBUG_INFO_SPLIT for RISC-V riscv: support PREEMPT_DYNAMIC with static keys riscv: Move create_tmp_mapping() to init sections riscv: Mark KASAN tmp* page tables variables as static riscv: mm: use bitmap_zero() API riscv: enable DEBUG_FORCE_FUNCTION_ALIGN_64B riscv: remove redundant mv instructions RISC-V: mm: Document mmap changes RISC-V: mm: Update pgtable comment documentation RISC-V: mm: Add tests for RISC-V mm RISC-V: mm: Restrict address space for sv39,sv48,sv57 riscv: enable DMA_BOUNCE_UNALIGNED_KMALLOC for !dma_coherent riscv: allow kmalloc() caches aligned to the smallest value riscv: support the elf-fdpic binfmt loader binfmt_elf_fdpic: support 64-bit systems riscv: Allow CONFIG_CFI_CLANG to be selected riscv/purgatory: Disable CFI riscv: Add CFI error handling riscv: Add ftrace_stub_graph riscv: Add types to indirectly called assembly functions ...
2023-08-31Merge patch series "riscv: Reduce ARCH_KMALLOC_MINALIGN to 8"Palmer Dabbelt
Jisheng Zhang <jszhang@kernel.org> says: Currently, riscv defines ARCH_DMA_MINALIGN as L1_CACHE_BYTES, I.E 64Bytes, if CONFIG_RISCV_DMA_NONCOHERENT=y. To support unified kernel Image, usually we have to enable CONFIG_RISCV_DMA_NONCOHERENT, thus it brings some bad effects to coherent platforms: Firstly, it wastes memory, kmalloc-96, kmalloc-32, kmalloc-16 and kmalloc-8 slab caches don't exist any more, they are replaced with either kmalloc-128 or kmalloc-64. Secondly, larger than necessary kmalloc aligned allocations results in unnecessary cache/TLB pressure. This issue also exists on arm64 platforms. From last year, Catalin tried to solve this issue by decoupling ARCH_KMALLOC_MINALIGN from ARCH_DMA_MINALIGN, limiting kmalloc() minimum alignment to dma_get_cache_alignment() and replacing ARCH_KMALLOC_MINALIGN usage in various drivers with ARCH_DMA_MINALIGN etc.[1] One fact we can make use of for riscv: if the CPU doesn't support ZICBOM or T-HEAD CMO, we know the platform is coherent. Based on Catalin's work and above fact, we can easily solve the kmalloc align issue for riscv: we can override dma_get_cache_alignment(), then let it return ARCH_DMA_MINALIGN at the beginning and return 1 once we know the underlying HW neither supports ZICBOM nor supports T-HEAD CMO. So what about if the CPU supports ZICBOM or T-HEAD CMO, but all the devices are dma coherent? Well, we use ARCH_DMA_MINALIGN as the kmalloc minimum alignment, nothing changed in this case. This case can be improved in the future once we see such platforms in mainline. After this patch, a simple test of booting to a small buildroot rootfs on qemu shows: kmalloc-96 5041 5041 96 ... kmalloc-64 9606 9606 64 ... kmalloc-32 5128 5128 32 ... kmalloc-16 7682 7682 16 ... kmalloc-8 10246 10246 8 ... So we save about 1268KB memory. The saving will be much larger in normal OS env on real HW platforms. patch1 allows kmalloc() caches aligned to the smallest value. patch2 enables DMA_BOUNCE_UNALIGNED_KMALLOC. After this series: As for coherent platforms, kmalloc-{8,16,32,96} caches come back on coherent both RV32 and RV64 platforms, I.E !ZICBOM and !THEAD_CMO. As for noncoherent RV32 platforms, nothing changed. As for noncoherent RV64 platforms, I.E either ZICBOM or THEAD_CMO, the above kmalloc caches also come back if > 4GB memory or users pass "swiotlb=mmnn,force" to force swiotlb creation if <= 4GB memory. How much mmnn should be depends on the specific platform, it needs to be tried and tested all possible usage case on the specific hardware. For example, I can use the minimal I/O TLB slabs on Sipeed M1S Dock. * b4-shazam-merge: riscv: enable DMA_BOUNCE_UNALIGNED_KMALLOC for !dma_coherent riscv: allow kmalloc() caches aligned to the smallest value Link: https://lore.kernel.org/linux-arm-kernel/20230524171904.3967031-1-catalin.marinas@arm.com/ [1] Link: https://lore.kernel.org/r/20230718152214.2907-1-jszhang@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-31riscv: support PREEMPT_DYNAMIC with static keysJisheng Zhang
Currently, each architecture can support PREEMPT_DYNAMIC through either static calls or static keys. To support PREEMPT_DYNAMIC on riscv, we face three choices: 1. only add static calls support to riscv As Mark pointed out in commit 99cf983cc8bc ("sched/preempt: Add PREEMPT_DYNAMIC using static keys"), static keys "...should have slightly lower overhead than non-inline static calls, as this effectively inlines each trampoline into the start of its callee. This may avoid redundant work, and may integrate better with CFI schemes." So even we add static calls(without inline static calls) to riscv, static keys is still a better choice. 2. add static calls and inline static calls to riscv Per my understanding, inline static calls requires objtool support which is not easy. 3. use static keys While riscv doesn't have static calls support, it supports static keys perfectly. So this patch selects HAVE_PREEMPT_DYNAMIC_KEY to enable support for PREEMPT_DYNAMIC on riscv, so that the preemption model can be chosen at boot time. It also patches asm-generic/preempt.h, mainly to add __preempt_schedule() and __preempt_schedule_notrace() macros for PREEMPT_DYNAMIC case. Other architectures which use generic preempt.h can also benefit from this patch by simply selecting HAVE_PREEMPT_DYNAMIC_KEY to enable PREEMPT_DYNAMIC if they supports static keys. Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20230716164925.1858-1-jszhang@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-31Merge patch series "riscv: KCFI support"Palmer Dabbelt
Sami Tolvanen <samitolvanen@google.com> says: This series adds KCFI support for RISC-V. KCFI is a fine-grained forward-edge control-flow integrity scheme supported in Clang >=16, which ensures indirect calls in instrumented code can only branch to functions whose type matches the function pointer type, thus making code reuse attacks more difficult. Patch 1 implements a pt_regs based syscall wrapper to address function pointer type mismatches in syscall handling. Patches 2 and 3 annotate indirectly called assembly functions with CFI types. Patch 4 implements error handling for indirect call checks. Patch 5 disables CFI for arch/riscv/purgatory. Patch 6 finally allows CONFIG_CFI_CLANG to be enabled for RISC-V. Note that Clang 16 has a generic architecture-agnostic KCFI implementation, which does work with the kernel, but doesn't produce a stable code sequence for indirect call checks, which means potential failures just trap and won't result in informative error messages. Clang 17 includes a RISC-V specific back-end implementation for KCFI, which emits a predictable code sequence for the checks and a .kcfi_traps section with locations of the traps, which patch 5 uses to produce more useful errors. The type mismatch fixes and annotations in the first three patches also become necessary in future if the kernel decides to support fine-grained CFI implemented using the hardware landing pad feature proposed in the in-progress Zicfisslp extension. Once the specification is ratified and hardware support emerges, implementing runtime patching support that replaces KCFI instrumentation with Zicfisslp landing pads might also be feasible (similarly to KCFI to FineIBT patching on x86_64), allowing distributions to ship a unified kernel binary for all devices. * b4-shazam-merge: riscv: Allow CONFIG_CFI_CLANG to be selected riscv/purgatory: Disable CFI riscv: Add CFI error handling riscv: Add ftrace_stub_graph riscv: Add types to indirectly called assembly functions riscv: Implement syscall wrappers Link: https://lore.kernel.org/r/20230710183544.999540-8-samitolvanen@google.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-29Merge tag 'mm-nonmm-stable-2023-08-28-22-48' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - An extensive rework of kexec and crash Kconfig from Eric DeVolder ("refactor Kconfig to consolidate KEXEC and CRASH options") - kernel.h slimming work from Andy Shevchenko ("kernel.h: Split out a couple of macros to args.h") - gdb feature work from Kuan-Ying Lee ("Add GDB memory helper commands") - vsprintf inclusion rationalization from Andy Shevchenko ("lib/vsprintf: Rework header inclusions") - Switch the handling of kdump from a udev scheme to in-kernel handling, by Eric DeVolder ("crash: Kernel handling of CPU and memory hot un/plug") - Many singleton patches to various parts of the tree * tag 'mm-nonmm-stable-2023-08-28-22-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (81 commits) document while_each_thread(), change first_tid() to use for_each_thread() drivers/char/mem.c: shrink character device's devlist[] array x86/crash: optimize CPU changes crash: change crash_prepare_elf64_headers() to for_each_possible_cpu() crash: hotplug support for kexec_load() x86/crash: add x86 crash hotplug support crash: memory and CPU hotplug sysfs attributes kexec: exclude elfcorehdr from the segment digest crash: add generic infrastructure for crash hotplug support crash: move a few code bits to setup support of crash hotplug kstrtox: consistently use _tolower() kill do_each_thread() nilfs2: fix WARNING in mark_buffer_dirty due to discarded buffer reuse scripts/bloat-o-meter: count weak symbol sizes treewide: drop CONFIG_EMBEDDED lockdep: fix static memory detection even more lib/vsprintf: declare no_hash_pointers in sprintf.h lib/vsprintf: split out sprintf() and friends kernel/fork: stop playing lockless games for exe_file replacement adfs: delete unused "union adfs_dirtail" definition ...
2023-08-29Merge tag 'mm-stable-2023-08-28-18-26' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Some swap cleanups from Ma Wupeng ("fix WARN_ON in add_to_avail_list") - Peter Xu has a series (mm/gup: Unify hugetlb, speed up thp") which reduces the special-case code for handling hugetlb pages in GUP. It also speeds up GUP handling of transparent hugepages. - Peng Zhang provides some maple tree speedups ("Optimize the fast path of mas_store()"). - Sergey Senozhatsky has improved te performance of zsmalloc during compaction (zsmalloc: small compaction improvements"). - Domenico Cerasuolo has developed additional selftest code for zswap ("selftests: cgroup: add zswap test program"). - xu xin has doe some work on KSM's handling of zero pages. These changes are mainly to enable the user to better understand the effectiveness of KSM's treatment of zero pages ("ksm: support tracking KSM-placed zero-pages"). - Jeff Xu has fixes the behaviour of memfd's MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED sysctl ("mm/memfd: fix sysctl MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED"). - David Howells has fixed an fscache optimization ("mm, netfs, fscache: Stop read optimisation when folio removed from pagecache"). - Axel Rasmussen has given userfaultfd the ability to simulate memory poisoning ("add UFFDIO_POISON to simulate memory poisoning with UFFD"). - Miaohe Lin has contributed some routine maintenance work on the memory-failure code ("mm: memory-failure: remove unneeded PageHuge() check"). - Peng Zhang has contributed some maintenance work on the maple tree code ("Improve the validation for maple tree and some cleanup"). - Hugh Dickins has optimized the collapsing of shmem or file pages into THPs ("mm: free retracted page table by RCU"). - Jiaqi Yan has a patch series which permits us to use the healthy subpages within a hardware poisoned huge page for general purposes ("Improve hugetlbfs read on HWPOISON hugepages"). - Kemeng Shi has done some maintenance work on the pagetable-check code ("Remove unused parameters in page_table_check"). - More folioification work from Matthew Wilcox ("More filesystem folio conversions for 6.6"), ("Followup folio conversions for zswap"). And from ZhangPeng ("Convert several functions in page_io.c to use a folio"). - page_ext cleanups from Kemeng Shi ("minor cleanups for page_ext"). - Baoquan He has converted some architectures to use the GENERIC_IOREMAP ioremap()/iounmap() code ("mm: ioremap: Convert architectures to take GENERIC_IOREMAP way"). - Anshuman Khandual has optimized arm64 tlb shootdown ("arm64: support batched/deferred tlb shootdown during page reclamation/migration"). - Better maple tree lockdep checking from Liam Howlett ("More strict maple tree lockdep"). Liam also developed some efficiency improvements ("Reduce preallocations for maple tree"). - Cleanup and optimization to the secondary IOMMU TLB invalidation, from Alistair Popple ("Invalidate secondary IOMMU TLB on permission upgrade"). - Ryan Roberts fixes some arm64 MM selftest issues ("selftests/mm fixes for arm64"). - Kemeng Shi provides some maintenance work on the compaction code ("Two minor cleanups for compaction"). - Some reduction in mmap_lock pressure from Matthew Wilcox ("Handle most file-backed faults under the VMA lock"). - Aneesh Kumar contributes code to use the vmemmap optimization for DAX on ppc64, under some circumstances ("Add support for DAX vmemmap optimization for ppc64"). - page-ext cleanups from Kemeng Shi ("add page_ext_data to get client data in page_ext"), ("minor cleanups to page_ext header"). - Some zswap cleanups from Johannes Weiner ("mm: zswap: three cleanups"). - kmsan cleanups from ZhangPeng ("minor cleanups for kmsan"). - VMA handling cleanups from Kefeng Wang ("mm: convert to vma_is_initial_heap/stack()"). - DAMON feature work from SeongJae Park ("mm/damon/sysfs-schemes: implement DAMOS tried total bytes file"), ("Extend DAMOS filters for address ranges and DAMON monitoring targets"). - Compaction work from Kemeng Shi ("Fixes and cleanups to compaction"). - Liam Howlett has improved the maple tree node replacement code ("maple_tree: Change replacement strategy"). - ZhangPeng has a general code cleanup - use the K() macro more widely ("cleanup with helper macro K()"). - Aneesh Kumar brings memmap-on-memory to ppc64 ("Add support for memmap on memory feature on ppc64"). - pagealloc cleanups from Kemeng Shi ("Two minor cleanups for pcp list in page_alloc"), ("Two minor cleanups for get pageblock migratetype"). - Vishal Moola introduces a memory descriptor for page table tracking, "struct ptdesc" ("Split ptdesc from struct page"). - memfd selftest maintenance work from Aleksa Sarai ("memfd: cleanups for vm.memfd_noexec"). - MM include file rationalization from Hugh Dickins ("arch: include asm/cacheflush.h in asm/hugetlb.h"). - THP debug output fixes from Hugh Dickins ("mm,thp: fix sloppy text output"). - kmemleak improvements from Xiaolei Wang ("mm/kmemleak: use object_cache instead of kmemleak_initialized"). - More folio-related cleanups from Matthew Wilcox ("Remove _folio_dtor and _folio_order"). - A VMA locking scalability improvement from Suren Baghdasaryan ("Per-VMA lock support for swap and userfaults"). - pagetable handling cleanups from Matthew Wilcox ("New page table range API"). - A batch of swap/thp cleanups from David Hildenbrand ("mm/swap: stop using page->private on tail pages for THP_SWAP + cleanups"). - Cleanups and speedups to the hugetlb fault handling from Matthew Wilcox ("Change calling convention for ->huge_fault"). - Matthew Wilcox has also done some maintenance work on the MM subsystem documentation ("Improve mm documentation"). * tag 'mm-stable-2023-08-28-18-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (489 commits) maple_tree: shrink struct maple_tree maple_tree: clean up mas_wr_append() secretmem: convert page_is_secretmem() to folio_is_secretmem() nios2: fix flush_dcache_page() for usage from irq context hugetlb: add documentation for vma_kernel_pagesize() mm: add orphaned kernel-doc to the rst files. mm: fix clean_record_shared_mapping_range kernel-doc mm: fix get_mctgt_type() kernel-doc mm: fix kernel-doc warning from tlb_flush_rmaps() mm: remove enum page_entry_size mm: allow ->huge_fault() to be called without the mmap_lock held mm: move PMD_ORDER to pgtable.h mm: remove checks for pte_index memcg: remove duplication detection for mem_cgroup_uncharge_swap mm/huge_memory: work on folio->swap instead of page->private when splitting folio mm/swap: inline folio_set_swap_entry() and folio_swap_entry() mm/swap: use dedicated entry for swap in folio mm/swap: stop using page->private on tail pages for THP_SWAP selftests/mm: fix WARNING comparing pointer to 0 selftests: cgroup: fix test_kmem_memcg_deletion kernel mem check ...
2023-08-24riscv: Fix build errors using binutils2.37 toolchainsMingzheng Xing
When building the kernel with binutils 2.37 and GCC-11.1.0/GCC-11.2.0, the following error occurs: Assembler messages: Error: cannot find default versions of the ISA extension `zicsr' Error: cannot find default versions of the ISA extension `zifencei' The above error originated from this commit of binutils[0], which has been resolved and backported by GCC-12.1.0[1] and GCC-11.3.0[2]. So fix this by change the GCC version in CONFIG_TOOLCHAIN_NEEDS_OLD_ISA_SPEC to GCC-11.3.0. Link: https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=f0bae2552db1dd4f1995608fbf6648fcee4e9e0c [0] Link: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=ca2bbb88f999f4d3cc40e89bc1aba712505dd598 [1] Link: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=d29f5d6ab513c52fd872f532c492e35ae9fd6671 [2] Fixes: ca09f772ccca ("riscv: Handle zicsr/zifencei issue between gcc and binutils") Reported-by: Conor Dooley <conor.dooley@microchip.com> Cc: <stable@vger.kernel.org> Signed-off-by: Mingzheng Xing <xingmingzheng@iscas.ac.cn> Link: https://lore.kernel.org/r/20230824190852.45470-1-xingmingzheng@iscas.ac.cn Closes: https://lore.kernel.org/all/20230823-captive-abdomen-befd942a4a73@wendy/ Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Tested-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-23riscv: enable DMA_BOUNCE_UNALIGNED_KMALLOC for !dma_coherentJisheng Zhang
With the DMA bouncing of unaligned kmalloc() buffers now in place, enable it for riscv when RISCV_DMA_NONCOHERENT=y to allow the kmalloc-{8,16,32,96} caches. Since RV32 doesn't enable SWIOTLB yet, and I didn't see any dma noncoherent RV32 platforms in the mainline, so skip RV32 now by only enabling DMA_BOUNCE_UNALIGNED_KMALLOC if SWIOTLB is available. Once we see such requirement on RV32, we can enable it then. NOTE: we didn't force to create the swiotlb buffer even when the end of RAM is within the 32-bit physical address range. That's to say: For RV64 with > 4GB memory, the feature is enabled. For RV64 with <= 4GB memory, the feature isn't enabled by default. We rely on users to pass "swiotlb=mmnn,force" where mmnn is the Number of I/O TLB slabs, see kernel-parameters.txt for details. Tested on Sipeed Lichee Pi 4A with 8GB DDR and Sipeed M1S BL808 Dock board. Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20230718152214.2907-3-jszhang@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-23riscv: Allow CONFIG_CFI_CLANG to be selectedSami Tolvanen
Select ARCH_SUPPORTS_CFI_CLANG to allow CFI_CLANG to be selected on riscv. Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Link: https://lore.kernel.org/r/20230710183544.999540-14-samitolvanen@google.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-23riscv: Add CFI error handlingSami Tolvanen
With CONFIG_CFI_CLANG, the compiler injects a type preamble immediately before each function and a check to validate the target function type before indirect calls: ; type preamble .word <id> function: ... ; indirect call check lw t1, -4(a0) lui t2, <hi20> addiw t2, t2, <lo12> beq t1, t2, .Ltmp0 ebreak .Ltmp0: jarl a0 Implement error handling code for the ebreak traps emitted for the checks. This produces the following oops on a CFI failure (generated using lkdtm): [ 21.177245] CFI failure at lkdtm_indirect_call+0x22/0x32 [lkdtm] (target: lkdtm_increment_int+0x0/0x18 [lkdtm]; expected type: 0x3ad55aca) [ 21.178483] Kernel BUG [#1] [ 21.178671] Modules linked in: lkdtm [ 21.179037] CPU: 1 PID: 104 Comm: sh Not tainted 6.3.0-rc6-00037-g37d5ec6297ab #1 [ 21.179511] Hardware name: riscv-virtio,qemu (DT) [ 21.179818] epc : lkdtm_indirect_call+0x22/0x32 [lkdtm] [ 21.180106] ra : lkdtm_CFI_FORWARD_PROTO+0x48/0x7c [lkdtm] [ 21.180426] epc : ffffffff01387092 ra : ffffffff01386f14 sp : ff20000000453cf0 [ 21.180792] gp : ffffffff81308c38 tp : ff6000000243f080 t0 : ff20000000453b78 [ 21.181157] t1 : 000000003ad55aca t2 : 000000007e0c52a5 s0 : ff20000000453d00 [ 21.181506] s1 : 0000000000000001 a0 : ffffffff0138d170 a1 : ffffffff013870bc [ 21.181819] a2 : b5fea48dd89aa700 a3 : 0000000000000001 a4 : 0000000000000fff [ 21.182169] a5 : 0000000000000004 a6 : 00000000000000b7 a7 : 0000000000000000 [ 21.182591] s2 : ff20000000453e78 s3 : ffffffffffffffea s4 : 0000000000000012 [ 21.183001] s5 : ff600000023c7000 s6 : 0000000000000006 s7 : ffffffff013882a0 [ 21.183653] s8 : 0000000000000008 s9 : 0000000000000002 s10: ffffffff0138d878 [ 21.184245] s11: ffffffff0138d878 t3 : 0000000000000003 t4 : 0000000000000000 [ 21.184591] t5 : ffffffff8133df08 t6 : ffffffff8133df07 [ 21.184858] status: 0000000000000120 badaddr: 0000000000000000 cause: 0000000000000003 [ 21.185415] [<ffffffff01387092>] lkdtm_indirect_call+0x22/0x32 [lkdtm] [ 21.185772] [<ffffffff01386f14>] lkdtm_CFI_FORWARD_PROTO+0x48/0x7c [lkdtm] [ 21.186093] [<ffffffff01383552>] lkdtm_do_action+0x22/0x34 [lkdtm] [ 21.186445] [<ffffffff0138350c>] direct_entry+0x128/0x13a [lkdtm] [ 21.186817] [<ffffffff8033ed8c>] full_proxy_write+0x58/0xb2 [ 21.187352] [<ffffffff801d4fe8>] vfs_write+0x14c/0x33a [ 21.187644] [<ffffffff801d5328>] ksys_write+0x64/0xd4 [ 21.187832] [<ffffffff801d53a6>] sys_write+0xe/0x1a [ 21.188171] [<ffffffff80003996>] ret_from_syscall+0x0/0x2 [ 21.188595] Code: 0513 0f65 a303 ffc5 53b7 7e0c 839b 2a53 0363 0073 (9002) 9582 [ 21.189178] ---[ end trace 0000000000000000 ]--- [ 21.189590] Kernel panic - not syncing: Fatal exception Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> # ISA bits Tested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Link: https://lore.kernel.org/r/20230710183544.999540-12-samitolvanen@google.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-23riscv: Implement syscall wrappersSami Tolvanen
Commit f0bddf50586d ("riscv: entry: Convert to generic entry") moved syscall handling to C code, which exposed function pointer type mismatches that trip fine-grained forward-edge Control-Flow Integrity (CFI) checks as syscall handlers are all called through the same syscall_t pointer type. To fix the type mismatches, implement pt_regs based syscall wrappers similarly to x86 and arm64. This patch is based on arm64 syscall wrappers added in commit 4378a7d4be30 ("arm64: implement syscall wrappers"), where the main goal was to minimize the risk of userspace-controlled values being used under speculation. This may be a concern for riscv in future as well. Following other architectures, the syscall wrappers generate three functions for each syscall; __riscv_<compat_>sys_<name> takes a pt_regs pointer and extracts arguments from registers, __se_<compat_>sys_<name> is a sign-extension wrapper that casts the long arguments to the correct types for the real syscall implementation, which is named __do_<compat_>sys_<name>. Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Link: https://lore.kernel.org/r/20230710183544.999540-9-samitolvanen@google.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-23riscv: Require FRAME_POINTER for some configurationsBjörn Töpel
Some V configurations implicitly turn on '-fno-omit-frame-pointer', but leaving FRAME_POINTER disabled. This makes it hard to reason about the FRAME_POINTER config, and also triggers build failures introduced in by the commit in the Fixes: tag. Select FRAME_POINTER explicitly for these configurations. Fixes: ebc9cb03b21e ("riscv: stack: Fixup independent softirq stack for CONFIG_FRAME_POINTER=n") Signed-off-by: Björn Töpel <bjorn@rivosinc.com> Tested-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20230823082845.354839-1-bjorn@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-18kexec: rename ARCH_HAS_KEXEC_PURGATORYEric DeVolder
The Kconfig refactor to consolidate KEXEC and CRASH options utilized option names of the form ARCH_SUPPORTS_<option>. Thus rename the ARCH_HAS_KEXEC_PURGATORY to ARCH_SUPPORTS_KEXEC_PURGATORY to follow the same. Link: https://lkml.kernel.org/r/20230712161545.87870-15-eric.devolder@oracle.com Signed-off-by: Eric DeVolder <eric.devolder@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18riscv/kexec: refactor for kernel/Kconfig.kexecEric DeVolder
The kexec and crash kernel options are provided in the common kernel/Kconfig.kexec. Utilize the common options and provide the ARCH_SUPPORTS_ and ARCH_SELECTS_ entries to recreate the equivalent set of KEXEC and CRASH options. Link: https://lkml.kernel.org/r/20230712161545.87870-12-eric.devolder@oracle.com Signed-off-by: Eric DeVolder <eric.devolder@oracle.com> Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18mm/vmemmap optimization: split hugetlb and devdax vmemmap optimizationAneesh Kumar K.V
Arm disabled hugetlb vmemmap optimization [1] because hugetlb vmemmap optimization includes an update of both the permissions (writeable to read-only) and the output address (pfn) of the vmemmap ptes. That is not supported without unmapping of pte(marking it invalid) by some architectures. With DAX vmemmap optimization we don't require such pte updates and architectures can enable DAX vmemmap optimization while having hugetlb vmemmap optimization disabled. Hence split DAX optimization support into a different config. s390, loongarch and riscv don't have devdax support. So the DAX config is not enabled for them. With this change, arm64 should be able to select DAX optimization [1] commit 060a2c92d1b6 ("arm64: mm: hugetlb: Disable HUGETLB_PAGE_OPTIMIZE_VMEMMAP") Link: https://lkml.kernel.org/r/20230724190759.483013-8-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Joao Martins <joao.m.martins@oracle.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-16riscv: Handle zicsr/zifencei issue between gcc and binutilsMingzheng Xing
Binutils-2.38 and GCC-12.1.0 bumped[0][1] the default ISA spec to the newer 20191213 version which moves some instructions from the I extension to the Zicsr and Zifencei extensions. So if one of the binutils and GCC exceeds that version, we should explicitly specifying Zicsr and Zifencei via -march to cope with the new changes. but this only occurs when binutils >= 2.36 and GCC >= 11.1.0. It's a different story when binutils < 2.36. binutils-2.36 supports the Zifencei extension[2] and splits Zifencei and Zicsr from I[3]. GCC-11.1.0 is particular[4] because it add support Zicsr and Zifencei extension for -march. binutils-2.35 does not support the Zifencei extension, and does not need to specify Zicsr and Zifencei when working with GCC >= 12.1.0. To make our lives easier, let's relax the check to binutils >= 2.36 in CONFIG_TOOLCHAIN_NEEDS_EXPLICIT_ZICSR_ZIFENCEI. For the other two cases, where clang < 17 or GCC < 11.1.0, we will deal with them in CONFIG_TOOLCHAIN_NEEDS_OLD_ISA_SPEC. For more information, please refer to: commit 6df2a016c0c8 ("riscv: fix build with binutils 2.38") commit e89c2e815e76 ("riscv: Handle zicsr/zifencei issues between clang and binutils") Link: https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=aed44286efa8ae8717a77d94b51ac3614e2ca6dc [0] Link: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=98416dbb0a62579d4a7a4a76bab51b5b52fec2cd [1] Link: https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=5a1b31e1e1cee6e9f1c92abff59cdcfff0dddf30 [2] Link: https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=729a53530e86972d1143553a415db34e6e01d5d2 [3] Link: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=b03be74bad08c382da47e048007a78fa3fb4ef49 [4] Link: https://lore.kernel.org/all/20230308220842.1231003-1-conor@kernel.org Link: https://lore.kernel.org/all/20230223220546.52879-1-conor@kernel.org Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Acked-by: Guo Ren <guoren@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Mingzheng Xing <xingmingzheng@iscas.ac.cn> Link: https://lore.kernel.org/r/20230809165648.21071-1-xingmingzheng@iscas.ac.cn Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-07-25RISC-V: provide Kconfig & commandline options to control parsing "riscv,isa"Conor Dooley
As it says on the tin, provide Kconfig option to control parsing the "riscv,isa" devicetree property. If either option is used, the kernel will fall back to parsing "riscv,isa", where "riscv,isa-base" and "riscv,isa-extensions" are not present. The Kconfig options are set up so that the default kernel configuration will enable the fallback path, without needing the commandline option. Suggested-by: Andrew Jones <ajones@ventanamicro.com> Suggested-by: Palmer Dabbelt <palmer@rivosinc.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20230713-aviator-plausibly-a35662485c2c@wendy Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-07-07Merge tag 'riscv-for-linus-6.5-mw2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull more RISC-V updates from Palmer Dabbelt: - A bunch of fixes/cleanups from the first part of the merge window, mostly related to ACPI and vector as those were large - Some documentation improvements, mostly related to the new code - The "riscv,isa" DT key is deprecated - Support for link-time dead code elimination - Support for minor fault registration in userfaultd - A handful of cleanups around CMO alternatives * tag 'riscv-for-linus-6.5-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (23 commits) riscv: mm: mark noncoherent_supported as __ro_after_init riscv: mm: mark CBO relate initialization funcs as __init riscv: errata: thead: only set cbom size & noncoherent during boot riscv: Select HAVE_ARCH_USERFAULTFD_MINOR RISC-V: Document the ISA string parsing rules for ACPI risc-v: Fix order of IPI enablement vs RCU startup mm: riscv: fix an unsafe pte read in huge_pte_alloc() dt-bindings: riscv: deprecate riscv,isa RISC-V: drop error print from riscv_hartid_to_cpuid() riscv: Discard vector state on syscalls riscv: move memblock_allow_resize() after linear mapping is ready riscv: Enable ARCH_SUSPEND_POSSIBLE for s2idle riscv: vdso: include vdso/vsyscall.h for vdso_data selftests: Test RISC-V Vector's first-use handler riscv: vector: clear V-reg in the first-use trap riscv: vector: only enable interrupts in the first-use trap RISC-V: Fix up some vector state related build failures RISC-V: Document that V registers are clobbered on syscalls riscv: disable HAVE_LD_DEAD_CODE_DATA_ELIMINATION for LLD riscv: enable HAVE_LD_DEAD_CODE_DATA_ELIMINATION ...
2023-07-06riscv: Select HAVE_ARCH_USERFAULTFD_MINORSamuel Holland
This allocates the VM flag needed to support the userfaultfd minor fault functionality. Because the flag bit is >= bit 32, it can only be enabled for 64-bit kernels. See commit 7677f7fd8be7 ("userfaultfd: add minor fault registration mode") for more information. Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Link: https://lore.kernel.org/r/20230624060321.3401504-1-samuel.holland@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-07-04riscv: Enable ARCH_SUSPEND_POSSIBLE for s2idleSong Shuai
With this configuration opened, the basic platform-independent s2idle is provided by the sole "s2idle" string in `/sys/power/mem_sleep`. At the end of s2idle, harts will hit the `wfi` instruction or enter the SUSPENDED state through the sbi_cpuidle driver. The interrupt of possible wakeup devices will be kept to wake the system up. And platform-specific sleep states can be provided by future ACPI and SBI SUSP extension support. Signed-off-by: Song Shuai <songshuaishuai@tinylab.org> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20230529101524.322076-1-songshuaishuai@tinylab.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-07-01Merge patch series "riscv: enable HAVE_LD_DEAD_CODE_DATA_ELIMINATION"Palmer Dabbelt
Jisheng Zhang <jszhang@kernel.org> says: When trying to run linux with various opensource riscv core on resource limited FPGA platforms, for example, those FPGAs with less than 16MB SDRAM, I want to save mem as much as possible. One of the major technologies is kernel size optimizations, I found that riscv does not currently support HAVE_LD_DEAD_CODE_DATA_ELIMINATION, which passes -fdata-sections, -ffunction-sections to CFLAGS and passes the --gc-sections flag to the linker. This not only benefits my case on FPGA but also benefits defconfigs. Here are some notable improvements from enabling this with defconfigs: nommu_k210_defconfig: text data bss dec hex 1112009 410288 59837 1582134 182436 before 962838 376656 51285 1390779 1538bb after rv32_defconfig: text data bss dec hex 8804455 2816544 290577 11911576 b5c198 before 8692295 2779872 288977 11761144 b375f8 after defconfig: text data bss dec hex 9438267 3391332 485333 13314932 cb2b74 before 9285914 3350052 483349 13119315 c82f53 after patch1 and patch2 are clean ups. patch3 fixes a typo. patch4 finally enable HAVE_LD_DEAD_CODE_DATA_ELIMINATION for riscv. * b4-shazam-merge: riscv: disable HAVE_LD_DEAD_CODE_DATA_ELIMINATION for LLD riscv: enable HAVE_LD_DEAD_CODE_DATA_ELIMINATION vmlinux.lds.h: use correct .init.data.* section name riscv: vmlinux-xip.lds.S: remove .alternative section riscv: move options to keep entries sorted riscv: Fix orphan section warnings caused by kernel/pi Link: https://lore.kernel.org/r/20230523165502.2592-1-jszhang@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-30Merge tag 'trace-v6.5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing updates from Steven Rostedt: - Add new feature to have function graph tracer record the return value. Adds a new option: funcgraph-retval ; when set, will show the return value of a function in the function graph tracer. - Also add the option: funcgraph-retval-hex where if it is not set, and the return value is an error code, then it will return the decimal of the error code, otherwise it still reports the hex value. - Add the file /sys/kernel/tracing/osnoise/per_cpu/cpu<cpu>/timerlat_fd That when a application opens it, it becomes the task that the timer lat tracer traces. The application can also read this file to find out how it's being interrupted. - Add the file /sys/kernel/tracing/available_filter_functions_addrs that works just the same as available_filter_functions but also shows the addresses of the functions like kallsyms, except that it gives the address of where the fentry/mcount jump/nop is. This is used by BPF to make it easier to attach BPF programs to ftrace hooks. - Replace strlcpy with strscpy in the tracing boot code. * tag 'trace-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Fix warnings when building htmldocs for function graph retval riscv: ftrace: Enable HAVE_FUNCTION_GRAPH_RETVAL tracing/boot: Replace strlcpy with strscpy tracing/timerlat: Add user-space interface tracing/osnoise: Skip running osnoise if all instances are off tracing/osnoise: Switch from PF_NO_SETAFFINITY to migrate_disable ftrace: Show all functions with addresses in available_filter_functions_addrs selftests/ftrace: Add funcgraph-retval test case LoongArch: ftrace: Enable HAVE_FUNCTION_GRAPH_RETVAL x86/ftrace: Enable HAVE_FUNCTION_GRAPH_RETVAL arm64: ftrace: Enable HAVE_FUNCTION_GRAPH_RETVAL tracing: Add documentation for funcgraph-retval and funcgraph-retval-hex function_graph: Support recording and printing the return value of function fgraph: Add declaration of "struct fgraph_ret_regs"
2023-06-30Merge tag 'riscv-for-linus-6.5-mw1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V updates from Palmer Dabbelt: - Support for ACPI - Various cleanups to the ISA string parsing, including making them case-insensitive - Support for the vector extension - Support for independent irq/softirq stacks - Our CPU DT binding now has "unevaluatedProperties: false" * tag 'riscv-for-linus-6.5-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (78 commits) riscv: hibernate: remove WARN_ON in save_processor_state dt-bindings: riscv: cpus: switch to unevaluatedProperties: false dt-bindings: riscv: cpus: add a ref the common cpu schema riscv: stack: Add config of thread stack size riscv: stack: Support HAVE_SOFTIRQ_ON_OWN_STACK riscv: stack: Support HAVE_IRQ_EXIT_ON_IRQ_STACK RISC-V: always report presence of extensions formerly part of the base ISA dt-bindings: riscv: explicitly mention assumption of Zicntr & Zihpm support RISC-V: remove decrement/increment dance in ISA string parser RISC-V: rework comments in ISA string parser RISC-V: validate riscv,isa at boot, not during ISA string parsing RISC-V: split early & late of_node to hartid mapping RISC-V: simplify register width check in ISA string parsing perf: RISC-V: Limit the number of counters returned from SBI riscv: replace deprecated scall with ecall riscv: uprobes: Restore thread.bad_cause riscv: mm: try VMA lock-based page fault handling first riscv: mm: Pre-allocate PGD entries for vmalloc/modules area RISC-V: hwprobe: Expose Zba, Zbb, and Zbs RISC-V: Track ISA extensions per hart ...
2023-06-28Merge branch 'expand-stack'Linus Torvalds
This modifies our user mode stack expansion code to always take the mmap_lock for writing before modifying the VM layout. It's actually something we always technically should have done, but because we didn't strictly need it, we were being lazy ("opportunistic" sounds so much better, doesn't it?) about things, and had this hack in place where we would extend the stack vma in-place without doing the proper locking. And it worked fine. We just needed to change vm_start (or, in the case of grow-up stacks, vm_end) and together with some special ad-hoc locking using the anon_vma lock and the mm->page_table_lock, it all was fairly straightforward. That is, it was all fine until Ruihan Li pointed out that now that the vma layout uses the maple tree code, we *really* don't just change vm_start and vm_end any more, and the locking really is broken. Oops. It's not actually all _that_ horrible to fix this once and for all, and do proper locking, but it's a bit painful. We have basically three different cases of stack expansion, and they all work just a bit differently: - the common and obvious case is the page fault handling. It's actually fairly simple and straightforward, except for the fact that we have something like 24 different versions of it, and you end up in a maze of twisty little passages, all alike. - the simplest case is the execve() code that creates a new stack. There are no real locking concerns because it's all in a private new VM that hasn't been exposed to anybody, but lockdep still can end up unhappy if you get it wrong. - and finally, we have GUP and page pinning, which shouldn't really be expanding the stack in the first place, but in addition to execve() we also use it for ptrace(). And debuggers do want to possibly access memory under the stack pointer and thus need to be able to expand the stack as a special case. None of these cases are exactly complicated, but the page fault case in particular is just repeated slightly differently many many times. And ia64 in particular has a fairly complicated situation where you can have both a regular grow-down stack _and_ a special grow-up stack for the register backing store. So to make this slightly more manageable, the bulk of this series is to first create a helper function for the most common page fault case, and convert all the straightforward architectures to it. Thus the new 'lock_mm_and_find_vma()' helper function, which ends up being used by x86, arm, powerpc, mips, riscv, alpha, arc, csky, hexagon, loongarch, nios2, sh, sparc32, and xtensa. So we not only convert more than half the architectures, we now have more shared code and avoid some of those twisty little passages. And largely due to this common helper function, the full diffstat of this series ends up deleting more lines than it adds. That still leaves eight architectures (ia64, m68k, microblaze, openrisc, parisc, s390, sparc64 and um) that end up doing 'expand_stack()' manually because they are doing something slightly different from the normal pattern. Along with the couple of special cases in execve() and GUP. So there's a couple of patches that first create 'locked' helper versions of the stack expansion functions, so that there's a obvious path forward in the conversion. The execve() case is then actually pretty simple, and is a nice cleanup from our old "grow-up stackls are special, because at execve time even they grow down". The #ifdef CONFIG_STACK_GROWSUP in that code just goes away, because it's just more straightforward to write out the stack expansion there manually, instead od having get_user_pages_remote() do it for us in some situations but not others and have to worry about locking rules for GUP. And the final step is then to just convert the remaining odd cases to a new world order where 'expand_stack()' is called with the mmap_lock held for reading, but where it might drop it and upgrade it to a write, only to return with it held for reading (in the success case) or with it completely dropped (in the failure case). In the process, we remove all the stack expansion from GUP (where dropping the lock wouldn't be ok without special rules anyway), and add it in manually to __access_remote_vm() for ptrace(). Thanks to Adrian Glaubitz and Frank Scheiner who tested the ia64 cases. Everything else here felt pretty straightforward, but the ia64 rules for stack expansion are really quite odd and very different from everything else. Also thanks to Vegard Nossum who caught me getting one of those odd conditions entirely the wrong way around. Anyway, I think I want to actually move all the stack expansion code to a whole new file of its own, rather than have it split up between mm/mmap.c and mm/memory.c, but since this will have to be backported to the initial maple tree vma introduction anyway, I tried to keep the patches _fairly_ minimal. Also, while I don't think it's valid to expand the stack from GUP, the final patch in here is a "warn if some crazy GUP user wants to try to expand the stack" patch. That one will be reverted before the final release, but it's left to catch any odd cases during the merge window and release candidates. Reported-by: Ruihan Li <lrh2000@pku.edu.cn> * branch 'expand-stack': gup: add warning if some caller would seem to want stack expansion mm: always expand the stack with the mmap write lock held execve: expand new process stack manually ahead of time mm: make find_extend_vma() fail if write lock not held powerpc/mm: convert coprocessor fault to lock_mm_and_find_vma() mm/fault: convert remaining simple cases to lock_mm_and_find_vma() arm/mm: Convert to using lock_mm_and_find_vma() riscv/mm: Convert to using lock_mm_and_find_vma() mips/mm: Convert to using lock_mm_and_find_vma() powerpc/mm: Convert to using lock_mm_and_find_vma() arm64/mm: Convert to using lock_mm_and_find_vma() mm: make the page fault mmap locking killable mm: introduce new 'lock_mm_and_find_vma()' page fault helper
2023-06-26Merge tag 'smp-core-2023-06-26' of ↵Linus Torvalds
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull SMP updates from Thomas Gleixner: "A large update for SMP management: - Parallel CPU bringup The reason why people are interested in parallel bringup is to shorten the (kexec) reboot time of cloud servers to reduce the downtime of the VM tenants. The current fully serialized bringup does the following per AP: 1) Prepare callbacks (allocate, intialize, create threads) 2) Kick the AP alive (e.g. INIT/SIPI on x86) 3) Wait for the AP to report alive state 4) Let the AP continue through the atomic bringup 5) Let the AP run the threaded bringup to full online state There are two significant delays: #3 The time for an AP to report alive state in start_secondary() on x86 has been measured in the range between 350us and 3.5ms depending on vendor and CPU type, BIOS microcode size etc. #4 The atomic bringup does the microcode update. This has been measured to take up to ~8ms on the primary threads depending on the microcode patch size to apply. On a two socket SKL server with 56 cores (112 threads) the boot CPU spends on current mainline about 800ms busy waiting for the APs to come up and apply microcode. That's more than 80% of the actual onlining procedure. This can be reduced significantly by splitting the bringup mechanism into two parts: 1) Run the prepare callbacks and kick the AP alive for each AP which needs to be brought up. The APs wake up, do their firmware initialization and run the low level kernel startup code including microcode loading in parallel up to the first synchronization point. (#1 and #2 above) 2) Run the rest of the bringup code strictly serialized per CPU (#3 - #5 above) as it's done today. Parallelizing that stage of the CPU bringup might be possible in theory, but it's questionable whether required surgery would be justified for a pretty small gain. If the system is large enough the first AP is already waiting at the first synchronization point when the boot CPU finished the wake-up of the last AP. That reduces the AP bringup time on that SKL from ~800ms to ~80ms, i.e. by a factor ~10x. The actual gain varies wildly depending on the system, CPU, microcode patch size and other factors. There are some opportunities to reduce the overhead further, but that needs some deep surgery in the x86 CPU bringup code. For now this is only enabled on x86, but the core functionality obviously works for all SMP capable architectures. - Enhancements for SMP function call tracing so it is possible to locate the scheduling and the actual execution points. That allows to measure IPI delivery time precisely" * tag 'smp-core-2023-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip: (45 commits) trace,smp: Add tracepoints for scheduling remotelly called functions trace,smp: Add tracepoints around remotelly called functions MAINTAINERS: Add CPU HOTPLUG entry x86/smpboot: Fix the parallel bringup decision x86/realmode: Make stack lock work in trampoline_compat() x86/smp: Initialize cpu_primary_thread_mask late cpu/hotplug: Fix off by one in cpuhp_bringup_mask() x86/apic: Fix use of X{,2}APIC_ENABLE in asm with older binutils x86/smpboot/64: Implement arch_cpuhp_init_parallel_bringup() and enable it x86/smpboot: Support parallel startup of secondary CPUs x86/smpboot: Implement a bit spinlock to protect the realmode stack x86/apic: Save the APIC virtual base address cpu/hotplug: Allow "parallel" bringup up to CPUHP_BP_KICK_AP_STATE x86/apic: Provide cpu_primary_thread mask x86/smpboot: Enable split CPU startup cpu/hotplug: Provide a split up CPUHP_BRINGUP mechanism cpu/hotplug: Reset task stack state in _cpu_up() cpu/hotplug: Remove unused state functions riscv: Switch to hotplug core state synchronization parisc: Switch to hotplug core state synchronization ...
2023-06-25riscv: disable HAVE_LD_DEAD_CODE_DATA_ELIMINATION for LLDNick Desaulniers
Linking allyesconfig with ld.lld-17 with CONFIG_DEAD_CODE_ELIMINATION=y takes hours. Assuming this is a performance regression that can be fixed, tentatively disable this for now so that allyesconfig builds don't start timing out. If and when there's a fix to ld.lld, this can be converted to a version check instead so that users of older but still supported versions of ld.lld don't hurt themselves by enabling CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y. Link: https://github.com/ClangBuiltLinux/linux/issues/1881 Link: https://lore.kernel.org/linux-riscv/ZJXTwqZIkXLxXaSi@google.com/ Reported-by: Palmer Dabbelt <palmer@dabbelt.com> Suggested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-25riscv: enable HAVE_LD_DEAD_CODE_DATA_ELIMINATIONZhangjin Wu
Select CONFIG_HAVE_LD_DEAD_CODE_DATA_ELIMINATION for RISC-V, allowing the user to enable dead code elimination. In order for this to work, ensure that we keep the alternative table by annotating them with KEEP. This boots well on qemu with both rv32_defconfig & rv64 defconfig, but it only shrinks their builds by ~1%, a smaller config is thereforce customized to test this feature: | rv32 | rv64 --------|------------------------|--------------------- No DCE | 4460684 | 4893488 DCE | 3986716 | 4376400 Shrink | 473968 (~10.6%) | 517088 (~10.5%) The config used above only reserves necessary options to boot on qemu with serial console, more like the size-critical embedded scenes: - rv64 config: https://pastebin.com/crz82T0s - rv32 config: rv64 config + 32-bit.config Here is Jisheng's original commit-msg: When trying to run linux with various opensource riscv core on resource limited FPGA platforms, for example, those FPGAs with less than 16MB SDRAM, I want to save mem as much as possible. One of the major technologies is kernel size optimizations, I found that riscv does not currently support HAVE_LD_DEAD_CODE_DATA_ELIMINATION, which passes -fdata-sections, -ffunction-sections to CFLAGS and passes the --gc-sections flag to the linker. This not only benefits my case on FPGA but also benefits defconfigs. Here are some notable improvements from enabling this with defconfigs: nommu_k210_defconfig: text data bss dec hex 1112009 410288 59837 1582134 182436 before 962838 376656 51285 1390779 1538bb after rv32_defconfig: text data bss dec hex 8804455 2816544 290577 11911576 b5c198 before 8692295 2779872 288977 11761144 b375f8 after defconfig: text data bss dec hex 9438267 3391332 485333 13314932 cb2b74 before 9285914 3350052 483349 13119315 c82f53 after Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Co-developed-by: Jisheng Zhang <jszhang@kernel.org> Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Reviewed-by: Guo Ren <guoren@kernel.org> Tested-by: Bin Meng <bmeng@tinylab.org> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Tested-by: Nick Desaulniers <ndesaulniers@google.com> # build Link: https://lore.kernel.org/r/20230523165502.2592-5-jszhang@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-25riscv: move options to keep entries sortedJisheng Zhang
Recently, some commits break the entries order. Properly move their locations to keep entries sorted. Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Acked-by: Guo Ren <guoren@kernel.org> Tested-by: Nick Desaulniers <ndesaulniers@google.com> # build Link: https://lore.kernel.org/r/20230523165502.2592-2-jszhang@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-24riscv/mm: Convert to using lock_mm_and_find_vma()Ben Hutchings
Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-06-23Merge patch series "riscv: Add independent irq/softirq stacks support"Palmer Dabbelt
guoren@kernel.org <guoren@kernel.org> says: From: Guo Ren <guoren@linux.alibaba.com> This patch series adds independent irq/softirq stacks to decrease the press of the thread stack. Also, add a thread STACK_SIZE config for users to adjust the proper size during compile time. * b4-shazam-merge: riscv: stack: Add config of thread stack size riscv: stack: Support HAVE_SOFTIRQ_ON_OWN_STACK riscv: stack: Support HAVE_IRQ_EXIT_ON_IRQ_STACK Link: https://lore.kernel.org/r/20230614013018.2168426-1-guoren@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-22riscv: stack: Add config of thread stack sizeGuo Ren
The commit 0cac21b02ba5 ("riscv: use 16KB kernel stack on 64-bit") increases the thread size mandatory, but some scenarios, such as D1 with a small memory footprint, would suffer from that. After independent irq stack support, let's give users a choice to determine their custom stack size. Link: https://lore.kernel.org/linux-riscv/5f6e6c39-b846-4392-b468-02202404de28@www.fastmail.com/ Suggested-by: Arnd Bergmann <arnd@arndb.de> Tested-by: Jisheng Zhang <jszhang@kernel.org> Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Signed-off-by: Guo Ren <guoren@kernel.org> Link: https://lore.kernel.org/r/20230614013018.2168426-4-guoren@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>