diff options
Diffstat (limited to 'Documentation/bpf')
-rw-r--r-- | Documentation/bpf/kfuncs.rst | 19 | ||||
-rw-r--r-- | Documentation/bpf/verifier.rst | 264 |
2 files changed, 18 insertions, 265 deletions
diff --git a/Documentation/bpf/kfuncs.rst b/Documentation/bpf/kfuncs.rst index ae468b781d31..e38941370b90 100644 --- a/Documentation/bpf/kfuncs.rst +++ b/Documentation/bpf/kfuncs.rst @@ -335,9 +335,26 @@ consider doing refcnt != 0 check, especially when returning a KF_ACQUIRE pointer. Note as well that a KF_ACQUIRE kfunc that is KF_RCU should very likely also be KF_RET_NULL. +2.4.8 KF_RCU_PROTECTED flag +--------------------------- + +The KF_RCU_PROTECTED flag is used to indicate that the kfunc must be invoked in +an RCU critical section. This is assumed by default in non-sleepable programs, +and must be explicitly ensured by calling ``bpf_rcu_read_lock`` for sleepable +ones. + +If the kfunc returns a pointer value, this flag also enforces that the returned +pointer is RCU protected, and can only be used while the RCU critical section is +active. + +The flag is distinct from the ``KF_RCU`` flag, which only ensures that its +arguments are at least RCU protected pointers. This may transitively imply that +RCU protection is ensured, but it does not work in cases of kfuncs which require +RCU protection but do not take RCU protected arguments. + .. _KF_deprecated_flag: -2.4.8 KF_DEPRECATED flag +2.4.9 KF_DEPRECATED flag ------------------------ The KF_DEPRECATED flag is used for kfuncs which are scheduled to be diff --git a/Documentation/bpf/verifier.rst b/Documentation/bpf/verifier.rst index 95e6f80a407e..510d15bc697b 100644 --- a/Documentation/bpf/verifier.rst +++ b/Documentation/bpf/verifier.rst @@ -347,270 +347,6 @@ However, only the value of register ``r1`` is important to successfully finish verification. The goal of the liveness tracking algorithm is to spot this fact and figure out that both states are actually equivalent. -Data structures -~~~~~~~~~~~~~~~ - -Liveness is tracked using the following data structures:: - - enum bpf_reg_liveness { - REG_LIVE_NONE = 0, - REG_LIVE_READ32 = 0x1, - REG_LIVE_READ64 = 0x2, - REG_LIVE_READ = REG_LIVE_READ32 | REG_LIVE_READ64, - REG_LIVE_WRITTEN = 0x4, - REG_LIVE_DONE = 0x8, - }; - - struct bpf_reg_state { - ... - struct bpf_reg_state *parent; - ... - enum bpf_reg_liveness live; - ... - }; - - struct bpf_stack_state { - struct bpf_reg_state spilled_ptr; - ... - }; - - struct bpf_func_state { - struct bpf_reg_state regs[MAX_BPF_REG]; - ... - struct bpf_stack_state *stack; - } - - struct bpf_verifier_state { - struct bpf_func_state *frame[MAX_CALL_FRAMES]; - struct bpf_verifier_state *parent; - ... - } - -* ``REG_LIVE_NONE`` is an initial value assigned to ``->live`` fields upon new - verifier state creation; - -* ``REG_LIVE_WRITTEN`` means that the value of the register (or stack slot) is - defined by some instruction verified between this verifier state's parent and - verifier state itself; - -* ``REG_LIVE_READ{32,64}`` means that the value of the register (or stack slot) - is read by a some child state of this verifier state; - -* ``REG_LIVE_DONE`` is a marker used by ``clean_verifier_state()`` to avoid - processing same verifier state multiple times and for some sanity checks; - -* ``->live`` field values are formed by combining ``enum bpf_reg_liveness`` - values using bitwise or. - -Register parentage chains -~~~~~~~~~~~~~~~~~~~~~~~~~ - -In order to propagate information between parent and child states, a *register -parentage chain* is established. Each register or stack slot is linked to a -corresponding register or stack slot in its parent state via a ``->parent`` -pointer. This link is established upon state creation in ``is_state_visited()`` -and might be modified by ``set_callee_state()`` called from -``__check_func_call()``. - -The rules for correspondence between registers / stack slots are as follows: - -* For the current stack frame, registers and stack slots of the new state are - linked to the registers and stack slots of the parent state with the same - indices. - -* For the outer stack frames, only callee saved registers (r6-r9) and stack - slots are linked to the registers and stack slots of the parent state with the - same indices. - -* When function call is processed a new ``struct bpf_func_state`` instance is - allocated, it encapsulates a new set of registers and stack slots. For this - new frame, parent links for r6-r9 and stack slots are set to nil, parent links - for r1-r5 are set to match caller r1-r5 parent links. - -This could be illustrated by the following diagram (arrows stand for -``->parent`` pointers):: - - ... ; Frame #0, some instructions - --- checkpoint #0 --- - 1 : r6 = 42 ; Frame #0 - --- checkpoint #1 --- - 2 : call foo() ; Frame #0 - ... ; Frame #1, instructions from foo() - --- checkpoint #2 --- - ... ; Frame #1, instructions from foo() - --- checkpoint #3 --- - exit ; Frame #1, return from foo() - 3 : r1 = r6 ; Frame #0 <- current state - - +-------------------------------+-------------------------------+ - | Frame #0 | Frame #1 | - Checkpoint +-------------------------------+-------------------------------+ - #0 | r0 | r1-r5 | r6-r9 | fp-8 ... | - +-------------------------------+ - ^ ^ ^ ^ - | | | | - Checkpoint +-------------------------------+ - #1 | r0 | r1-r5 | r6-r9 | fp-8 ... | - +-------------------------------+ - ^ ^ ^ - |_______|_______|_______________ - | | | - nil nil | | | nil nil - | | | | | | | - Checkpoint +-------------------------------+-------------------------------+ - #2 | r0 | r1-r5 | r6-r9 | fp-8 ... | r0 | r1-r5 | r6-r9 | fp-8 ... | - +-------------------------------+-------------------------------+ - ^ ^ ^ ^ ^ - nil nil | | | | | - | | | | | | | - Checkpoint +-------------------------------+-------------------------------+ - #3 | r0 | r1-r5 | r6-r9 | fp-8 ... | r0 | r1-r5 | r6-r9 | fp-8 ... | - +-------------------------------+-------------------------------+ - ^ ^ - nil nil | | - | | | | - Current +-------------------------------+ - state | r0 | r1-r5 | r6-r9 | fp-8 ... | - +-------------------------------+ - \ - r6 read mark is propagated via these links - all the way up to checkpoint #1. - The checkpoint #1 contains a write mark for r6 - because of instruction (1), thus read propagation - does not reach checkpoint #0 (see section below). - -Liveness marks tracking -~~~~~~~~~~~~~~~~~~~~~~~ - -For each processed instruction, the verifier tracks read and written registers -and stack slots. The main idea of the algorithm is that read marks propagate -back along the state parentage chain until they hit a write mark, which 'screens -off' earlier states from the read. The information about reads is propagated by -function ``mark_reg_read()`` which could be summarized as follows:: - - mark_reg_read(struct bpf_reg_state *state, ...): - parent = state->parent - while parent: - if state->live & REG_LIVE_WRITTEN: - break - if parent->live & REG_LIVE_READ64: - break - parent->live |= REG_LIVE_READ64 - state = parent - parent = state->parent - -Notes: - -* The read marks are applied to the **parent** state while write marks are - applied to the **current** state. The write mark on a register or stack slot - means that it is updated by some instruction in the straight-line code leading - from the parent state to the current state. - -* Details about REG_LIVE_READ32 are omitted. - -* Function ``propagate_liveness()`` (see section :ref:`read_marks_for_cache_hits`) - might override the first parent link. Please refer to the comments in the - ``propagate_liveness()`` and ``mark_reg_read()`` source code for further - details. - -Because stack writes could have different sizes ``REG_LIVE_WRITTEN`` marks are -applied conservatively: stack slots are marked as written only if write size -corresponds to the size of the register, e.g. see function ``save_register_state()``. - -Consider the following example:: - - 0: (*u64)(r10 - 8) = 0 ; define 8 bytes of fp-8 - --- checkpoint #0 --- - 1: (*u32)(r10 - 8) = 1 ; redefine lower 4 bytes - 2: r1 = (*u32)(r10 - 8) ; read lower 4 bytes defined at (1) - 3: r2 = (*u32)(r10 - 4) ; read upper 4 bytes defined at (0) - -As stated above, the write at (1) does not count as ``REG_LIVE_WRITTEN``. Should -it be otherwise, the algorithm above wouldn't be able to propagate the read mark -from (3) to checkpoint #0. - -Once the ``BPF_EXIT`` instruction is reached ``update_branch_counts()`` is -called to update the ``->branches`` counter for each verifier state in a chain -of parent verifier states. When the ``->branches`` counter reaches zero the -verifier state becomes a valid entry in a set of cached verifier states. - -Each entry of the verifier states cache is post-processed by a function -``clean_live_states()``. This function marks all registers and stack slots -without ``REG_LIVE_READ{32,64}`` marks as ``NOT_INIT`` or ``STACK_INVALID``. -Registers/stack slots marked in this way are ignored in function ``stacksafe()`` -called from ``states_equal()`` when a state cache entry is considered for -equivalence with a current state. - -Now it is possible to explain how the example from the beginning of the section -works:: - - 0: call bpf_get_prandom_u32() - 1: r1 = 0 - 2: if r0 == 0 goto +1 - 3: r0 = 1 - --- checkpoint[0] --- - 4: r0 = r1 - 5: exit - -* At instruction #2 branching point is reached and state ``{ r0 == 0, r1 == 0, pc == 4 }`` - is pushed to states processing queue (pc stands for program counter). - -* At instruction #4: - - * ``checkpoint[0]`` states cache entry is created: ``{ r0 == 1, r1 == 0, pc == 4 }``; - * ``checkpoint[0].r0`` is marked as written; - * ``checkpoint[0].r1`` is marked as read; - -* At instruction #5 exit is reached and ``checkpoint[0]`` can now be processed - by ``clean_live_states()``. After this processing ``checkpoint[0].r1`` has a - read mark and all other registers and stack slots are marked as ``NOT_INIT`` - or ``STACK_INVALID`` - -* The state ``{ r0 == 0, r1 == 0, pc == 4 }`` is popped from the states queue - and is compared against a cached state ``{ r1 == 0, pc == 4 }``, the states - are considered equivalent. - -.. _read_marks_for_cache_hits: - -Read marks propagation for cache hits -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Another point is the handling of read marks when a previously verified state is -found in the states cache. Upon cache hit verifier must behave in the same way -as if the current state was verified to the program exit. This means that all -read marks, present on registers and stack slots of the cached state, must be -propagated over the parentage chain of the current state. Example below shows -why this is important. Function ``propagate_liveness()`` handles this case. - -Consider the following state parentage chain (S is a starting state, A-E are -derived states, -> arrows show which state is derived from which):: - - r1 read - <------------- A[r1] == 0 - C[r1] == 0 - S ---> A ---> B ---> exit E[r1] == 1 - | - ` ---> C ---> D - | - ` ---> E ^ - |___ suppose all these - ^ states are at insn #Y - | - suppose all these - states are at insn #X - -* Chain of states ``S -> A -> B -> exit`` is verified first. - -* While ``B -> exit`` is verified, register ``r1`` is read and this read mark is - propagated up to state ``A``. - -* When chain of states ``C -> D`` is verified the state ``D`` turns out to be - equivalent to state ``B``. - -* The read mark for ``r1`` has to be propagated to state ``C``, otherwise state - ``C`` might get mistakenly marked as equivalent to state ``E`` even though - values for register ``r1`` differ between ``C`` and ``E``. - Understanding eBPF verifier messages ==================================== |