diff options
Diffstat (limited to 'Documentation/arch/riscv/cmodx.rst')
-rw-r--r-- | Documentation/arch/riscv/cmodx.rst | 130 |
1 files changed, 130 insertions, 0 deletions
diff --git a/Documentation/arch/riscv/cmodx.rst b/Documentation/arch/riscv/cmodx.rst new file mode 100644 index 000000000000..40ba53bed5df --- /dev/null +++ b/Documentation/arch/riscv/cmodx.rst @@ -0,0 +1,130 @@ +.. SPDX-License-Identifier: GPL-2.0 + +============================================================================== +Concurrent Modification and Execution of Instructions (CMODX) for RISC-V Linux +============================================================================== + +CMODX is a programming technique where a program executes instructions that were +modified by the program itself. Instruction storage and the instruction cache +(icache) are not guaranteed to be synchronized on RISC-V hardware. Therefore, the +program must enforce its own synchronization with the unprivileged fence.i +instruction. + +CMODX in the Kernel Space +------------------------- + +Dynamic ftrace +--------------------- + +Essentially, dynamic ftrace directs the control flow by inserting a function +call at each patchable function entry, and patches it dynamically at runtime to +enable or disable the redirection. In the case of RISC-V, 2 instructions, +AUIPC + JALR, are required to compose a function call. However, it is impossible +to patch 2 instructions and expect that a concurrent read-side executes them +without a race condition. This series makes atmoic code patching possible in +RISC-V ftrace. Kernel preemption makes things even worse as it allows the old +state to persist across the patching process with stop_machine(). + +In order to get rid of stop_machine() and run dynamic ftrace with full kernel +preemption, we partially initialize each patchable function entry at boot-time, +setting the first instruction to AUIPC, and the second to NOP. Now, atmoic +patching is possible because the kernel only has to update one instruction. +According to Ziccif, as long as an instruction is naturally aligned, the ISA +guarantee an atomic update. + +By fixing down the first instruction, AUIPC, the range of the ftrace trampoline +is limited to +-2K from the predetermined target, ftrace_caller, due to the lack +of immediate encoding space in RISC-V. To address the issue, we introduce +CALL_OPS, where an 8B naturally align metadata is added in front of each +pacthable function. The metadata is resolved at the first trampoline, then the +execution can be derect to another custom trampoline. + +CMODX in the User Space +----------------------- + +Though fence.i is an unprivileged instruction, the default Linux ABI prohibits +the use of fence.i in userspace applications. At any point the scheduler may +migrate a task onto a new hart. If migration occurs after the userspace +synchronized the icache and instruction storage with fence.i, the icache on the +new hart will no longer be clean. This is due to the behavior of fence.i only +affecting the hart that it is called on. Thus, the hart that the task has been +migrated to may not have synchronized instruction storage and icache. + +There are two ways to solve this problem: use the riscv_flush_icache() syscall, +or use the ``PR_RISCV_SET_ICACHE_FLUSH_CTX`` prctl() and emit fence.i in +userspace. The syscall performs a one-off icache flushing operation. The prctl +changes the Linux ABI to allow userspace to emit icache flushing operations. + +As an aside, "deferred" icache flushes can sometimes be triggered in the kernel. +At the time of writing, this only occurs during the riscv_flush_icache() syscall +and when the kernel uses copy_to_user_page(). These deferred flushes happen only +when the memory map being used by a hart changes. If the prctl() context caused +an icache flush, this deferred icache flush will be skipped as it is redundant. +Therefore, there will be no additional flush when using the riscv_flush_icache() +syscall inside of the prctl() context. + +prctl() Interface +--------------------- + +Call prctl() with ``PR_RISCV_SET_ICACHE_FLUSH_CTX`` as the first argument. The +remaining arguments will be delegated to the riscv_set_icache_flush_ctx +function detailed below. + +.. kernel-doc:: arch/riscv/mm/cacheflush.c + :identifiers: riscv_set_icache_flush_ctx + +Example usage: + +The following files are meant to be compiled and linked with each other. The +modify_instruction() function replaces an add with 0 with an add with one, +causing the instruction sequence in get_value() to change from returning a zero +to returning a one. + +cmodx.c:: + + #include <stdio.h> + #include <sys/prctl.h> + + extern int get_value(); + extern void modify_instruction(); + + int main() + { + int value = get_value(); + printf("Value before cmodx: %d\n", value); + + // Call prctl before first fence.i is called inside modify_instruction + prctl(PR_RISCV_SET_ICACHE_FLUSH_CTX, PR_RISCV_CTX_SW_FENCEI_ON, PR_RISCV_SCOPE_PER_PROCESS); + modify_instruction(); + // Call prctl after final fence.i is called in process + prctl(PR_RISCV_SET_ICACHE_FLUSH_CTX, PR_RISCV_CTX_SW_FENCEI_OFF, PR_RISCV_SCOPE_PER_PROCESS); + + value = get_value(); + printf("Value after cmodx: %d\n", value); + return 0; + } + +cmodx.S:: + + .option norvc + + .text + .global modify_instruction + modify_instruction: + lw a0, new_insn + lui a5,%hi(old_insn) + sw a0,%lo(old_insn)(a5) + fence.i + ret + + .section modifiable, "awx" + .global get_value + get_value: + li a0, 0 + old_insn: + addi a0, a0, 0 + ret + + .data + new_insn: + addi a0, a0, 1 |