summaryrefslogtreecommitdiff
path: root/Documentation/bpf
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/bpf')
-rw-r--r--Documentation/bpf/kfuncs.rst19
-rw-r--r--Documentation/bpf/verifier.rst264
2 files changed, 18 insertions, 265 deletions
diff --git a/Documentation/bpf/kfuncs.rst b/Documentation/bpf/kfuncs.rst
index ae468b781d31..e38941370b90 100644
--- a/Documentation/bpf/kfuncs.rst
+++ b/Documentation/bpf/kfuncs.rst
@@ -335,9 +335,26 @@ consider doing refcnt != 0 check, especially when returning a KF_ACQUIRE
pointer. Note as well that a KF_ACQUIRE kfunc that is KF_RCU should very likely
also be KF_RET_NULL.
+2.4.8 KF_RCU_PROTECTED flag
+---------------------------
+
+The KF_RCU_PROTECTED flag is used to indicate that the kfunc must be invoked in
+an RCU critical section. This is assumed by default in non-sleepable programs,
+and must be explicitly ensured by calling ``bpf_rcu_read_lock`` for sleepable
+ones.
+
+If the kfunc returns a pointer value, this flag also enforces that the returned
+pointer is RCU protected, and can only be used while the RCU critical section is
+active.
+
+The flag is distinct from the ``KF_RCU`` flag, which only ensures that its
+arguments are at least RCU protected pointers. This may transitively imply that
+RCU protection is ensured, but it does not work in cases of kfuncs which require
+RCU protection but do not take RCU protected arguments.
+
.. _KF_deprecated_flag:
-2.4.8 KF_DEPRECATED flag
+2.4.9 KF_DEPRECATED flag
------------------------
The KF_DEPRECATED flag is used for kfuncs which are scheduled to be
diff --git a/Documentation/bpf/verifier.rst b/Documentation/bpf/verifier.rst
index 95e6f80a407e..510d15bc697b 100644
--- a/Documentation/bpf/verifier.rst
+++ b/Documentation/bpf/verifier.rst
@@ -347,270 +347,6 @@ However, only the value of register ``r1`` is important to successfully finish
verification. The goal of the liveness tracking algorithm is to spot this fact
and figure out that both states are actually equivalent.
-Data structures
-~~~~~~~~~~~~~~~
-
-Liveness is tracked using the following data structures::
-
- enum bpf_reg_liveness {
- REG_LIVE_NONE = 0,
- REG_LIVE_READ32 = 0x1,
- REG_LIVE_READ64 = 0x2,
- REG_LIVE_READ = REG_LIVE_READ32 | REG_LIVE_READ64,
- REG_LIVE_WRITTEN = 0x4,
- REG_LIVE_DONE = 0x8,
- };
-
- struct bpf_reg_state {
- ...
- struct bpf_reg_state *parent;
- ...
- enum bpf_reg_liveness live;
- ...
- };
-
- struct bpf_stack_state {
- struct bpf_reg_state spilled_ptr;
- ...
- };
-
- struct bpf_func_state {
- struct bpf_reg_state regs[MAX_BPF_REG];
- ...
- struct bpf_stack_state *stack;
- }
-
- struct bpf_verifier_state {
- struct bpf_func_state *frame[MAX_CALL_FRAMES];
- struct bpf_verifier_state *parent;
- ...
- }
-
-* ``REG_LIVE_NONE`` is an initial value assigned to ``->live`` fields upon new
- verifier state creation;
-
-* ``REG_LIVE_WRITTEN`` means that the value of the register (or stack slot) is
- defined by some instruction verified between this verifier state's parent and
- verifier state itself;
-
-* ``REG_LIVE_READ{32,64}`` means that the value of the register (or stack slot)
- is read by a some child state of this verifier state;
-
-* ``REG_LIVE_DONE`` is a marker used by ``clean_verifier_state()`` to avoid
- processing same verifier state multiple times and for some sanity checks;
-
-* ``->live`` field values are formed by combining ``enum bpf_reg_liveness``
- values using bitwise or.
-
-Register parentage chains
-~~~~~~~~~~~~~~~~~~~~~~~~~
-
-In order to propagate information between parent and child states, a *register
-parentage chain* is established. Each register or stack slot is linked to a
-corresponding register or stack slot in its parent state via a ``->parent``
-pointer. This link is established upon state creation in ``is_state_visited()``
-and might be modified by ``set_callee_state()`` called from
-``__check_func_call()``.
-
-The rules for correspondence between registers / stack slots are as follows:
-
-* For the current stack frame, registers and stack slots of the new state are
- linked to the registers and stack slots of the parent state with the same
- indices.
-
-* For the outer stack frames, only callee saved registers (r6-r9) and stack
- slots are linked to the registers and stack slots of the parent state with the
- same indices.
-
-* When function call is processed a new ``struct bpf_func_state`` instance is
- allocated, it encapsulates a new set of registers and stack slots. For this
- new frame, parent links for r6-r9 and stack slots are set to nil, parent links
- for r1-r5 are set to match caller r1-r5 parent links.
-
-This could be illustrated by the following diagram (arrows stand for
-``->parent`` pointers)::
-
- ... ; Frame #0, some instructions
- --- checkpoint #0 ---
- 1 : r6 = 42 ; Frame #0
- --- checkpoint #1 ---
- 2 : call foo() ; Frame #0
- ... ; Frame #1, instructions from foo()
- --- checkpoint #2 ---
- ... ; Frame #1, instructions from foo()
- --- checkpoint #3 ---
- exit ; Frame #1, return from foo()
- 3 : r1 = r6 ; Frame #0 <- current state
-
- +-------------------------------+-------------------------------+
- | Frame #0 | Frame #1 |
- Checkpoint +-------------------------------+-------------------------------+
- #0 | r0 | r1-r5 | r6-r9 | fp-8 ... |
- +-------------------------------+
- ^ ^ ^ ^
- | | | |
- Checkpoint +-------------------------------+
- #1 | r0 | r1-r5 | r6-r9 | fp-8 ... |
- +-------------------------------+
- ^ ^ ^
- |_______|_______|_______________
- | | |
- nil nil | | | nil nil
- | | | | | | |
- Checkpoint +-------------------------------+-------------------------------+
- #2 | r0 | r1-r5 | r6-r9 | fp-8 ... | r0 | r1-r5 | r6-r9 | fp-8 ... |
- +-------------------------------+-------------------------------+
- ^ ^ ^ ^ ^
- nil nil | | | | |
- | | | | | | |
- Checkpoint +-------------------------------+-------------------------------+
- #3 | r0 | r1-r5 | r6-r9 | fp-8 ... | r0 | r1-r5 | r6-r9 | fp-8 ... |
- +-------------------------------+-------------------------------+
- ^ ^
- nil nil | |
- | | | |
- Current +-------------------------------+
- state | r0 | r1-r5 | r6-r9 | fp-8 ... |
- +-------------------------------+
- \
- r6 read mark is propagated via these links
- all the way up to checkpoint #1.
- The checkpoint #1 contains a write mark for r6
- because of instruction (1), thus read propagation
- does not reach checkpoint #0 (see section below).
-
-Liveness marks tracking
-~~~~~~~~~~~~~~~~~~~~~~~
-
-For each processed instruction, the verifier tracks read and written registers
-and stack slots. The main idea of the algorithm is that read marks propagate
-back along the state parentage chain until they hit a write mark, which 'screens
-off' earlier states from the read. The information about reads is propagated by
-function ``mark_reg_read()`` which could be summarized as follows::
-
- mark_reg_read(struct bpf_reg_state *state, ...):
- parent = state->parent
- while parent:
- if state->live & REG_LIVE_WRITTEN:
- break
- if parent->live & REG_LIVE_READ64:
- break
- parent->live |= REG_LIVE_READ64
- state = parent
- parent = state->parent
-
-Notes:
-
-* The read marks are applied to the **parent** state while write marks are
- applied to the **current** state. The write mark on a register or stack slot
- means that it is updated by some instruction in the straight-line code leading
- from the parent state to the current state.
-
-* Details about REG_LIVE_READ32 are omitted.
-
-* Function ``propagate_liveness()`` (see section :ref:`read_marks_for_cache_hits`)
- might override the first parent link. Please refer to the comments in the
- ``propagate_liveness()`` and ``mark_reg_read()`` source code for further
- details.
-
-Because stack writes could have different sizes ``REG_LIVE_WRITTEN`` marks are
-applied conservatively: stack slots are marked as written only if write size
-corresponds to the size of the register, e.g. see function ``save_register_state()``.
-
-Consider the following example::
-
- 0: (*u64)(r10 - 8) = 0 ; define 8 bytes of fp-8
- --- checkpoint #0 ---
- 1: (*u32)(r10 - 8) = 1 ; redefine lower 4 bytes
- 2: r1 = (*u32)(r10 - 8) ; read lower 4 bytes defined at (1)
- 3: r2 = (*u32)(r10 - 4) ; read upper 4 bytes defined at (0)
-
-As stated above, the write at (1) does not count as ``REG_LIVE_WRITTEN``. Should
-it be otherwise, the algorithm above wouldn't be able to propagate the read mark
-from (3) to checkpoint #0.
-
-Once the ``BPF_EXIT`` instruction is reached ``update_branch_counts()`` is
-called to update the ``->branches`` counter for each verifier state in a chain
-of parent verifier states. When the ``->branches`` counter reaches zero the
-verifier state becomes a valid entry in a set of cached verifier states.
-
-Each entry of the verifier states cache is post-processed by a function
-``clean_live_states()``. This function marks all registers and stack slots
-without ``REG_LIVE_READ{32,64}`` marks as ``NOT_INIT`` or ``STACK_INVALID``.
-Registers/stack slots marked in this way are ignored in function ``stacksafe()``
-called from ``states_equal()`` when a state cache entry is considered for
-equivalence with a current state.
-
-Now it is possible to explain how the example from the beginning of the section
-works::
-
- 0: call bpf_get_prandom_u32()
- 1: r1 = 0
- 2: if r0 == 0 goto +1
- 3: r0 = 1
- --- checkpoint[0] ---
- 4: r0 = r1
- 5: exit
-
-* At instruction #2 branching point is reached and state ``{ r0 == 0, r1 == 0, pc == 4 }``
- is pushed to states processing queue (pc stands for program counter).
-
-* At instruction #4:
-
- * ``checkpoint[0]`` states cache entry is created: ``{ r0 == 1, r1 == 0, pc == 4 }``;
- * ``checkpoint[0].r0`` is marked as written;
- * ``checkpoint[0].r1`` is marked as read;
-
-* At instruction #5 exit is reached and ``checkpoint[0]`` can now be processed
- by ``clean_live_states()``. After this processing ``checkpoint[0].r1`` has a
- read mark and all other registers and stack slots are marked as ``NOT_INIT``
- or ``STACK_INVALID``
-
-* The state ``{ r0 == 0, r1 == 0, pc == 4 }`` is popped from the states queue
- and is compared against a cached state ``{ r1 == 0, pc == 4 }``, the states
- are considered equivalent.
-
-.. _read_marks_for_cache_hits:
-
-Read marks propagation for cache hits
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-Another point is the handling of read marks when a previously verified state is
-found in the states cache. Upon cache hit verifier must behave in the same way
-as if the current state was verified to the program exit. This means that all
-read marks, present on registers and stack slots of the cached state, must be
-propagated over the parentage chain of the current state. Example below shows
-why this is important. Function ``propagate_liveness()`` handles this case.
-
-Consider the following state parentage chain (S is a starting state, A-E are
-derived states, -> arrows show which state is derived from which)::
-
- r1 read
- <------------- A[r1] == 0
- C[r1] == 0
- S ---> A ---> B ---> exit E[r1] == 1
- |
- ` ---> C ---> D
- |
- ` ---> E ^
- |___ suppose all these
- ^ states are at insn #Y
- |
- suppose all these
- states are at insn #X
-
-* Chain of states ``S -> A -> B -> exit`` is verified first.
-
-* While ``B -> exit`` is verified, register ``r1`` is read and this read mark is
- propagated up to state ``A``.
-
-* When chain of states ``C -> D`` is verified the state ``D`` turns out to be
- equivalent to state ``B``.
-
-* The read mark for ``r1`` has to be propagated to state ``C``, otherwise state
- ``C`` might get mistakenly marked as equivalent to state ``E`` even though
- values for register ``r1`` differ between ``C`` and ``E``.
-
Understanding eBPF verifier messages
====================================