Since commit 6b9f29b81b15 ("riscv: Enable pcpu page first chunk
allocator"), if NUMA is enabled, the page percpu allocator may be used
on very sparse configurations, or when requested on boot with
percpu_alloc=page.
In that case, percpu data gets put in the vmalloc area. However,
sbi_hsm_hart_start() needs the physical address of an sbi_hart_boot_data,
and simply assumes that __pa() will work. This causes the just-started
hart to immediately access an invalid address and hang.
Fortunately, struct sbi_hart_boot_data is not too large, so we can
simply allocate an array for boot_data statically, putting it in the
kernel image.
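A minimal user-space sketch of the approach described above follows. It assumes an illustrative NR_CPUS of 8 and a two-field layout for struct sbi_hart_boot_data (task and stack pointers, matching the ordered-booting description further down); the accessor boot_data_for_cpu() is hypothetical. The point it models is that a statically allocated array lives in the kernel image rather than in the vmalloc area, so taking the physical address of an element with __pa() is valid.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative build-time CPU limit; the real value comes from Kconfig. */
#define NR_CPUS 8

/* Assumed layout: the task and stack pointers a secondary hart needs. */
struct sbi_hart_boot_data {
	void *task_ptr;
	void *stack_ptr;
};

/*
 * One slot per logical CPU, allocated statically. In the kernel such an
 * array is part of the image, so __pa() on an element is well defined,
 * unlike a per-CPU variable that may be placed in the vmalloc area.
 */
static struct sbi_hart_boot_data boot_data[NR_CPUS];

/* The whole array costs only two pointers per CPU. */
static_assert(sizeof(boot_data) == NR_CPUS * 2 * sizeof(void *),
	      "two pointers per CPU");

/* Hypothetical accessor: the slot for cpuid, or NULL if out of range. */
static struct sbi_hart_boot_data *boot_data_for_cpu(unsigned int cpuid)
{
	if (cpuid >= NR_CPUS)
		return NULL;
	return &boot_data[cpuid];
}
```

The trade-off is a fixed NR_CPUS-sized array even on systems with fewer harts, which is acceptable because the structure is small.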
This fixes NUMA=y SMP boot on Sophgo SG2042.
To reproduce on QEMU: Set CONFIG_NUMA=y and CONFIG_DEBUG_VIRTUAL=y, then
run with:
qemu-system-riscv64 -M virt -smp 2 -nographic \
-kernel arch/riscv/boot/Image \
-append "percpu_alloc=page"
Kernel output:
[ 0.000000] Booting Linux on hartid 0
[ 0.000000] Linux version 6.16.0-rc1 (dram@sakuya) (riscv64-unknown-linux-gnu-gcc (GCC) 14.2.1 20250322, GNU ld (GNU Binutils) 2.44) #11 SMP Tue Jun 24 14:56:22 CST 2025
...
[ 0.000000] percpu: 28 4K pages/cpu s85784 r8192 d20712
...
[ 0.083192] smp: Bringing up secondary CPUs ...
[ 0.086722] ------------[ cut here ]------------
[ 0.086849] virt_to_phys used for non-linear address: (____ptrval____) (0xff2000000001d080)
[ 0.088001] WARNING: CPU: 0 PID: 1 at arch/riscv/mm/physaddr.c:14 __virt_to_phys+0xae/0xe8
[ 0.088376] Modules linked in:
[ 0.088656] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.16.0-rc1 #11 NONE
[ 0.088833] Hardware name: riscv-virtio,qemu (DT)
[ 0.088948] epc : __virt_to_phys+0xae/0xe8
[ 0.089001] ra : __virt_to_phys+0xae/0xe8
[ 0.089037] epc : ffffffff80021eaa ra : ffffffff80021eaa sp : ff2000000004bbc0
[ 0.089057] gp : ffffffff817f49c0 tp : ff60000001d60000 t0 : 5f6f745f74726976
[ 0.089076] t1 : 0000000000000076 t2 : 705f6f745f747269 s0 : ff2000000004bbe0
[ 0.089095] s1 : ff2000000001d080 a0 : 0000000000000000 a1 : 0000000000000000
[ 0.089113] a2 : 0000000000000000 a3 : 0000000000000000 a4 : 0000000000000000
[ 0.089131] a5 : 0000000000000000 a6 : 0000000000000000 a7 : 0000000000000000
[ 0.089155] s2 : ffffffff8130dc00 s3 : 0000000000000001 s4 : 0000000000000001
[ 0.089174] s5 : ffffffff8185eff8 s6 : ff2000007f1eb000 s7 : ffffffff8002a2ec
[ 0.089193] s8 : 0000000000000001 s9 : 0000000000000001 s10: 0000000000000000
[ 0.089211] s11: 0000000000000000 t3 : ffffffff8180a9f7 t4 : ffffffff8180a9f7
[ 0.089960] t5 : ffffffff8180a9f8 t6 : ff2000000004b9d8
[ 0.089984] status: 0000000200000120 badaddr: ffffffff80021eaa cause: 0000000000000003
[ 0.090101] [<ffffffff80021eaa>] __virt_to_phys+0xae/0xe8
[ 0.090228] [<ffffffff8001d796>] sbi_cpu_start+0x6e/0xe8
[ 0.090247] [<ffffffff8001a5da>] __cpu_up+0x1e/0x8c
[ 0.090260] [<ffffffff8002a32e>] bringup_cpu+0x42/0x258
[ 0.090277] [<ffffffff8002914c>] cpuhp_invoke_callback+0xe0/0x40c
[ 0.090292] [<ffffffff800294e0>] __cpuhp_invoke_callback_range+0x68/0xfc
[ 0.090320] [<ffffffff8002a96a>] _cpu_up+0x11a/0x244
[ 0.090334] [<ffffffff8002aae6>] cpu_up+0x52/0x90
[ 0.090384] [<ffffffff80c09350>] bringup_nonboot_cpus+0x78/0x118
[ 0.090411] [<ffffffff80c11060>] smp_init+0x34/0xb8
[ 0.090425] [<ffffffff80c01220>] kernel_init_freeable+0x148/0x2e4
[ 0.090442] [<ffffffff80b83802>] kernel_init+0x1e/0x14c
[ 0.090455] [<ffffffff800124ca>] ret_from_fork_kernel+0xe/0xf0
[ 0.090471] [<ffffffff80b8d9c2>] ret_from_fork_kernel_asm+0x16/0x18
[ 0.090560] ---[ end trace 0000000000000000 ]---
[ 1.179875] CPU1: failed to come online
[ 1.190324] smp: Brought up 1 node, 1 CPU
Cc: stable@vger.kernel.org
Reported-by: Han Gao <rabenda.cn@gmail.com>
Fixes: 6b9f29b81b15 ("riscv: Enable pcpu page first chunk allocator")
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Tested-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Vivian Wang <wangruikang@iscas.ac.cn>
Link: https://lore.kernel.org/r/20250624-riscv-hsm-boot-data-array-v1-1-50b5eeafbe61@iscas.ac.cn
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
|
|
The top of the kernel thread stack should be reserved for pt_regs. However,
this is not the case for the idle threads of the secondary boot harts:
their stacks overlap with their pt_regs, so both may get corrupted.
A similar issue has been fixed for the primary hart, see c7cdd96eca28
("riscv: prevent stack corruption by reserving task_pt_regs(p) early").
However, that fix was not propagated to the secondary harts. The problem
was noticed in some CPU hotplug tests with the V (vector) extension
enabled. The function smp_callin() stored several registers on the stack,
corrupting the top of the pt_regs structure, including the status field.
As a result, the kernel attempted to save or restore a nonexistent V
context.
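To illustrate the overlap, here is a user-space model. THREAD_SIZE, pt_regs_model (36 unsigned longs, roughly the RV64 pt_regs layout) and all function names are illustrative assumptions, not kernel code. Starting the idle thread's sp at the very top of the stack lets the first frame pushed by the new hart land inside pt_regs, whereas starting it at the pt_regs address (what task_pt_regs() would return, assumed here to be the fix's approach) leaves that block untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative value; the real THREAD_SIZE depends on the configuration. */
#define THREAD_SIZE 16384UL

/* Stand-in for struct pt_regs: epc, x1-x31, status, badaddr, cause, orig_a0. */
struct pt_regs_model {
	unsigned long regs[36];
};

static_assert(THREAD_SIZE > sizeof(struct pt_regs_model), "stack too small");

/* Start of the pt_regs block reserved at the top of the stack. */
static uintptr_t pt_regs_addr(uintptr_t stack_base)
{
	return stack_base + THREAD_SIZE - sizeof(struct pt_regs_model);
}

/* Buggy choice: the initial sp is the very top of the stack. */
static uintptr_t initial_sp_buggy(uintptr_t stack_base)
{
	return stack_base + THREAD_SIZE;
}

/* Fixed choice: the initial sp starts just below the reserved pt_regs. */
static uintptr_t initial_sp_fixed(uintptr_t stack_base)
{
	return pt_regs_addr(stack_base);
}

/* Would pushing n bytes below sp write into [pt_regs, top of stack)? */
static int push_clobbers_pt_regs(uintptr_t stack_base, uintptr_t sp, size_t n)
{
	uintptr_t lo = sp - n;
	uintptr_t regs_lo = pt_regs_addr(stack_base);
	uintptr_t top = stack_base + THREAD_SIZE;

	/* Half-open ranges [lo, sp) and [regs_lo, top) overlap iff each starts before the other ends. */
	return lo < top && regs_lo < sp;
}
```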
Fixes: 9a2451f18663 ("RISC-V: Avoid using per cpu array for ordered booting")
Fixes: 2875fe056156 ("RISC-V: Add cpu_ops and modify default booting method")
Signed-off-by: Sergey Matyukevich <sergey.matyukevich@syntacore.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20240523084327.2013211-1-geomatsi@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
The name field is not used anywhere. cpu_prepare and cpu_disable do
nothing and always return 0 where they are implemented.
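The sketch below shows what an operations table looks like once those members are dropped. The struct name cpu_operations, the remaining member names and their signatures are assumptions for illustration, and demo_ops is a hypothetical backend. Without a prepare hook, bringing up a CPU is a single indirect call.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct task_struct; /* opaque in this sketch */

/* Trimmed table: no name string, no cpu_prepare, no cpu_disable. */
struct cpu_operations {
	int (*cpu_start)(unsigned int cpuid, struct task_struct *tidle);
	void (*cpu_stop)(void);
	int (*cpu_is_stopped)(unsigned int cpuid);
};

/* Counts calls so the dispatch path can be observed. */
static int demo_start_calls;

/* Hypothetical backend that "starts" CPUs 0-3 and rejects the rest. */
static int demo_cpu_start(unsigned int cpuid, struct task_struct *tidle)
{
	(void)tidle;
	demo_start_calls++;
	return cpuid < 4 ? 0 : -EINVAL;
}

/* No cpu_stop/cpu_is_stopped: this backend cannot offline CPUs. */
static const struct cpu_operations demo_ops = {
	.cpu_start = demo_cpu_start,
};

static int bring_up_cpu(const struct cpu_operations *ops, unsigned int cpuid)
{
	if (!ops->cpu_start)
		return -ENODEV;
	return ops->cpu_start(cpuid, NULL);
}

static bool ops_support_hotplug(const struct cpu_operations *ops)
{
	return ops->cpu_stop != NULL;
}
```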
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20231121234736.3489608-3-samuel.holland@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
The hartid can be a 64-bit value on RV64 platforms. Change the type of
the hartid variable to unsigned long so that it can hold a 64-bit value
on RV64 platforms.
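A small sketch of the truncation this avoids, using a hypothetical hart ID with a bit above bit 31 set. The 32-bit store uses unsigned int (assumed 32 bits wide) so the conversion is well defined; on RV64 Linux, which is LP64, unsigned long is 64 bits and keeps the full value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical hart ID with a bit above bit 31 set, which RV64 allows. */
#define EXAMPLE_HARTID 0x100000001ULL

/* Old style: a 32-bit variable silently drops the upper bits. */
static unsigned int store_hartid_u32(uint64_t hartid)
{
	return (unsigned int)hartid;
}

/* New style: unsigned long holds the full value on LP64 targets. */
static unsigned long store_hartid_ulong(uint64_t hartid)
{
	return (unsigned long)hartid;
}
```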
Signed-off-by: Sunil V L <sunilvl@ventanamicro.com>
Reviewed-by: Heinrich Schuchardt <heinrich.schuchardt@canonical.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20220527051743.2829940-2-sunilvl@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
The per-CPU boot data is only used within cpu_ops_sbi.c, so it can be
declared static.
Fixes: 9a2451f18663 ("RISC-V: Avoid using per cpu array for ordered booting")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Add defines related to the SBI HSM suspend call, and update the HSM
state names as per the latest SBI specification.
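For reference, a sketch of such definitions. The numeric values follow the HSM chapter of the SBI specification (hart states 0-6, function IDs 0-3, and bit 31 of suspend_type selecting non-retentive suspend); the identifier names are modeled on the kernel's but should be treated as illustrative, and hsm_suspend_is_non_retentive() is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* HSM function IDs. */
enum sbi_ext_hsm_fid {
	SBI_EXT_HSM_HART_START = 0,
	SBI_EXT_HSM_HART_STOP = 1,
	SBI_EXT_HSM_HART_STATUS = 2,
	SBI_EXT_HSM_HART_SUSPEND = 3,
};

/* Hart states as returned by hart_get_status. */
enum sbi_hsm_hart_state {
	SBI_HSM_STATE_STARTED = 0,
	SBI_HSM_STATE_STOPPED = 1,
	SBI_HSM_STATE_START_PENDING = 2,
	SBI_HSM_STATE_STOP_PENDING = 3,
	SBI_HSM_STATE_SUSPENDED = 4,
	SBI_HSM_STATE_SUSPEND_PENDING = 5,
	SBI_HSM_STATE_RESUME_PENDING = 6,
};

/* suspend_type values: bit 31 set means the hart loses its register state. */
#define SBI_HSM_SUSP_NON_RET_BIT        0x80000000u
#define SBI_HSM_SUSPEND_RET_DEFAULT     0x00000000u
#define SBI_HSM_SUSPEND_NON_RET_DEFAULT 0x80000000u

static bool hsm_suspend_is_non_retentive(uint32_t suspend_type)
{
	return (suspend_type & SBI_HSM_SUSP_NON_RET_BIT) != 0;
}
```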
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
Currently, both the ordered booting and spinwait approaches use a per-CPU
array to update the stack and task pointers. This approach does not work
in the following cases:
1. NR_CPUS is configured to be less than the highest hart ID.
2. The platform has sparse hart IDs.
This issue can be fixed for ordered booting, because the booting CPU
brings up one CPU at a time using the SBI HSM extension, whose opaque
parameter has been unused until now.
Introduce a common secondary boot data structure that can store the stack
and task pointers. Secondary harts will use this data while booting up to
set up sp and tp.
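A user-space model of how the opaque parameter can carry the boot data. Per the SBI HSM specification, a started hart enters its start address with a0 = hartid and a1 = opaque; here fake_sbi_hart_start() is a hypothetical stand-in that simply calls the entry function. The kernel passes a physical address as opaque, while this model passes the pointer itself. Note that hart ID 1000 works without any array indexed by hart ID.

```c
#include <assert.h>
#include <stdint.h>

struct sbi_hart_boot_data {
	void *task_ptr;
	void *stack_ptr;
};

/* What the secondary hart ends up with in tp and sp, modeled as globals. */
static unsigned long booted_hartid;
static void *booted_tp;
static void *booted_sp;

typedef void (*hart_entry_fn)(unsigned long hartid, uintptr_t opaque);

/* Hypothetical firmware stand-in: "start" the hart by calling its entry. */
static void fake_sbi_hart_start(unsigned long hartid, hart_entry_fn entry,
				uintptr_t opaque)
{
	entry(hartid, opaque);
}

/* Secondary entry: read tp and sp from the boot data that opaque points to. */
static void secondary_entry(unsigned long hartid, uintptr_t opaque)
{
	const struct sbi_hart_boot_data *bdata =
		(const struct sbi_hart_boot_data *)opaque;

	booted_hartid = hartid;
	booted_tp = bdata->task_ptr;
	booted_sp = bdata->stack_ptr;
}

/* Boot CPU side: fill the boot data for one hart, then start it. */
static void start_secondary(unsigned long hartid, struct sbi_hart_boot_data *bdata,
			    void *task, void *stack)
{
	bdata->task_ptr = task;
	bdata->stack_ptr = stack;
	fake_sbi_hart_start(hartid, secondary_entry, (uintptr_t)bdata);
}
```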
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
This patch enables CPU hotplug support on RISC-V. It uses the SBI HSM
extension to online/offline any hart. As a result, harts are returned to
the firmware once they go offline. If the harts are brought online
afterwards, they re-enter the Linux kernel as if a secondary hart were
booting for the first time. All booting requirements are honored during
this process.
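The offline/online cycle can be modeled as a tiny state machine over a simplified subset of the HSM states. Everything below is a hypothetical sketch: fw_hart_stop() and fw_hart_start() stand in for the firmware calls, and reentries counts how often a hart comes back through the secondary boot path after being offlined.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_HARTS 8

/* Simplified subset of the HSM states; all harts start out running. */
enum hart_state { HART_STARTED, HART_STOPPED };

static enum hart_state hart_states[MODEL_NR_HARTS];
static int reentries[MODEL_NR_HARTS];

/* Offlining returns the hart to firmware. */
static void fw_hart_stop(unsigned int hart)
{
	hart_states[hart] = HART_STOPPED;
}

/* Onlining re-enters the kernel as if the hart were booting for the first time. */
static int fw_hart_start(unsigned int hart)
{
	if (hart_states[hart] != HART_STOPPED)
		return -1; /* error: hart is already running */
	hart_states[hart] = HART_STARTED;
	reentries[hart]++;
	return 0;
}

static void cpu_offline(unsigned int hart) { fw_hart_stop(hart); }
static int cpu_online(unsigned int hart) { return fw_hart_start(hart); }
static bool cpu_is_stopped(unsigned int hart) { return hart_states[hart] == HART_STOPPED; }
```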
Tested on both QEMU and the HighFive Unleashed board. Test results follow.
---------------------------------------------------
Offline cpu 2
---------------------------------------------------
$ echo 0 > /sys/devices/system/cpu/cpu2/online
[ 32.828684] CPU2: off
$ cat /proc/cpuinfo
processor : 0
hart : 0
isa : rv64imafdcsu
mmu : sv48

processor : 1
hart : 1
isa : rv64imafdcsu
mmu : sv48

processor : 3
hart : 3
isa : rv64imafdcsu
mmu : sv48

processor : 4
hart : 4
isa : rv64imafdcsu
mmu : sv48

processor : 5
hart : 5
isa : rv64imafdcsu
mmu : sv48

processor : 6
hart : 6
isa : rv64imafdcsu
mmu : sv48

processor : 7
hart : 7
isa : rv64imafdcsu
mmu : sv48
---------------------------------------------------
online cpu 2
---------------------------------------------------
$ echo 1 > /sys/devices/system/cpu/cpu2/online
$ cat /proc/cpuinfo
processor : 0
hart : 0
isa : rv64imafdcsu
mmu : sv48

processor : 1
hart : 1
isa : rv64imafdcsu
mmu : sv48

processor : 2
hart : 2
isa : rv64imafdcsu
mmu : sv48

processor : 3
hart : 3
isa : rv64imafdcsu
mmu : sv48

processor : 4
hart : 4
isa : rv64imafdcsu
mmu : sv48

processor : 5
hart : 5
isa : rv64imafdcsu
mmu : sv48

processor : 6
hart : 6
isa : rv64imafdcsu
mmu : sv48

processor : 7
hart : 7
isa : rv64imafdcsu
mmu : sv48
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
|
|
Currently, all harts have to jump to Linux on RISC-V. This complicates the
multi-stage boot process, as every transient stage also has to ensure that
all harts enter that stage and jump to Linux afterwards. It also obstructs
a clean kexec implementation.
The SBI HSM extension provides an alternative in which only a single hart
needs to boot and enter Linux. The booting hart can then bring up the
secondary harts one by one.
Add SBI HSM based cpu_ops that implement an ordered booting method on
RISC-V. This change is also backward compatible with older firmware that
does not implement the HSM extension: if a newer kernel is used with older
firmware, it will continue to use the default spinning boot method.
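A hypothetical model of the selection and ordered bring-up described above. select_boot_method() stands in for probing the firmware for HSM support, and the counters record whether each secondary hart was explicitly started from firmware (ordered booting) or released from the spinning loop (fallback). None of these names come from the kernel.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_HARTS 4

enum boot_method { BOOT_SPINWAIT, BOOT_SBI_HSM };

static unsigned int start_order[MODEL_NR_HARTS];
static unsigned int fw_start_calls; /* harts started via a modeled HSM hart_start */
static unsigned int spin_releases;  /* harts released from the spinwait loop */

/* Older firmware without HSM keeps the default spinning method. */
static enum boot_method select_boot_method(bool hsm_available)
{
	return hsm_available ? BOOT_SBI_HSM : BOOT_SPINWAIT;
}

/* The boot hart brings up each secondary hart, one at a time. */
static void bring_up_secondaries(enum boot_method method, unsigned int boot_hart)
{
	fw_start_calls = 0;
	spin_releases = 0;
	for (unsigned int hart = 0; hart < MODEL_NR_HARTS; hart++) {
		if (hart == boot_hart)
			continue;
		if (method == BOOT_SBI_HSM) {
			/* Hart was parked in firmware; start it explicitly. */
			start_order[fw_start_calls++] = hart;
		} else {
			/* Hart already entered Linux and is spinning; release it. */
			spin_releases++;
		}
	}
}
```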
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|