summaryrefslogtreecommitdiff
path: root/Documentation/bpf/kfuncs.rst
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/bpf/kfuncs.rst')
-rw-r--r--Documentation/bpf/kfuncs.rst182
1 files changed, 83 insertions, 99 deletions
diff --git a/Documentation/bpf/kfuncs.rst b/Documentation/bpf/kfuncs.rst
index ca96ef3f6896..ea2516374d92 100644
--- a/Documentation/bpf/kfuncs.rst
+++ b/Documentation/bpf/kfuncs.rst
@@ -100,6 +100,23 @@ Hence, whenever a constant scalar argument is accepted by a kfunc which is not a
size parameter, and the value of the constant matters for program safety, __k
suffix should be used.
+2.2.2 __uninit Annotation
+-------------------------
+
+This annotation is used to indicate that the argument will be treated as
+uninitialized.
+
+An example is given below::
+
+ __bpf_kfunc int bpf_dynptr_from_skb(..., struct bpf_dynptr_kern *ptr__uninit)
+ {
+ ...
+ }
+
+Here, the dynptr will be treated as an uninitialized dynptr. Without this
+annotation, the verifier will reject the program if the dynptr passed in is
+not initialized.
+
.. _BPF_kfunc_nodef:
2.3 Using an existing kernel function
@@ -162,20 +179,12 @@ both are orthogonal to each other.
---------------------
The KF_RELEASE flag is used to indicate that the kfunc releases the pointer
-passed in to it. There can be only one referenced pointer that can be passed in.
-All copies of the pointer being released are invalidated as a result of invoking
-kfunc with this flag.
-
-2.4.4 KF_KPTR_GET flag
-----------------------
-
-The KF_KPTR_GET flag is used to indicate that the kfunc takes the first argument
-as a pointer to kptr, safely increments the refcount of the object it points to,
-and returns a reference to the user. The rest of the arguments may be normal
-arguments of a kfunc. The KF_KPTR_GET flag should be used in conjunction with
-KF_ACQUIRE and KF_RET_NULL flags.
+passed in to it. There can be only one referenced pointer that can be passed
+in. All copies of the pointer being released are invalidated as a result of
+invoking kfunc with this flag. KF_RELEASE kfuncs automatically receive the
+protection afforded by the KF_TRUSTED_ARGS flag described below.
-2.4.5 KF_TRUSTED_ARGS flag
+2.4.4 KF_TRUSTED_ARGS flag
--------------------------
The KF_TRUSTED_ARGS flag is used for kfuncs taking pointer arguments. It
@@ -187,7 +196,7 @@ exception described below).
There are two types of pointers to kernel objects which are considered "valid":
1. Pointers which are passed as tracepoint or struct_ops callback arguments.
-2. Pointers which were returned from a KF_ACQUIRE or KF_KPTR_GET kfunc.
+2. Pointers which were returned from a KF_ACQUIRE kfunc.
Pointers to non-BTF objects (e.g. scalar pointers) may also be passed to
KF_TRUSTED_ARGS kfuncs, and may have a non-zero offset.
@@ -214,13 +223,13 @@ In other words, you must:
2. Specify the type and name of the trusted nested field. This field must match
the field in the original type definition exactly.
-2.4.6 KF_SLEEPABLE flag
+2.4.5 KF_SLEEPABLE flag
-----------------------
The KF_SLEEPABLE flag is used for kfuncs that may sleep. Such kfuncs can only
be called by sleepable BPF programs (BPF_F_SLEEPABLE).
-2.4.7 KF_DESTRUCTIVE flag
+2.4.6 KF_DESTRUCTIVE flag
--------------------------
The KF_DESTRUCTIVE flag is used to indicate functions calling which is
@@ -229,18 +238,20 @@ rebooting or panicking. Due to this additional restrictions apply to these
calls. At the moment they only require CAP_SYS_BOOT capability, but more can be
added later.
-2.4.8 KF_RCU flag
+2.4.7 KF_RCU flag
-----------------
-The KF_RCU flag is used for kfuncs which have a rcu ptr as its argument.
-When used together with KF_ACQUIRE, it indicates the kfunc should have a
-single argument which must be a trusted argument or a MEM_RCU pointer.
-The argument may have reference count of 0 and the kfunc must take this
-into consideration.
+The KF_RCU flag is a weaker version of KF_TRUSTED_ARGS. The kfuncs marked with
+KF_RCU expect either PTR_TRUSTED or MEM_RCU arguments. The verifier guarantees
+that the objects are valid and there is no use-after-free. The pointers are not
+NULL, but the object's refcount could have reached zero. The kfuncs need to
+consider doing refcnt != 0 check, especially when returning a KF_ACQUIRE
+pointer. Note as well that a KF_ACQUIRE kfunc that is KF_RCU should very likely
+also be KF_RET_NULL.
.. _KF_deprecated_flag:
-2.4.9 KF_DEPRECATED flag
+2.4.8 KF_DEPRECATED flag
------------------------
The KF_DEPRECATED flag is used for kfuncs which are scheduled to be
@@ -451,13 +462,50 @@ struct_ops callback arg. For example:
struct task_struct *acquired;
acquired = bpf_task_acquire(task);
+ if (acquired)
+ /*
+ * In a typical program you'd do something like store
+ * the task in a map, and the map will automatically
+ * release it later. Here, we release it manually.
+ */
+ bpf_task_release(acquired);
+ return 0;
+ }
+
+
+References acquired on ``struct task_struct *`` objects are RCU protected.
+Therefore, when in an RCU read region, you can obtain a pointer to a task
+embedded in a map value without having to acquire a reference:
+
+.. code-block:: c
+
+ #define private(name) SEC(".data." #name) __hidden __attribute__((aligned(8)))
+ private(TASK) static struct task_struct *global;
+
+ /**
+ * A trivial example showing how to access a task stored
+ * in a map using RCU.
+ */
+ SEC("tp_btf/task_newtask")
+ int BPF_PROG(task_rcu_read_example, struct task_struct *task, u64 clone_flags)
+ {
+ struct task_struct *local_copy;
+
+ bpf_rcu_read_lock();
+ local_copy = global;
+ if (local_copy)
+ /*
+ * We could also pass local_copy to kfuncs or helper functions here,
+ * as we're guaranteed that local_copy will be valid until we exit
+ * the RCU read region below.
+ */
+ bpf_printk("Global task %s is valid", local_copy->comm);
+ else
+ bpf_printk("No global task found");
+ bpf_rcu_read_unlock();
+
+ /* At this point we can no longer reference local_copy. */
- /*
- * In a typical program you'd do something like store
- * the task in a map, and the map will automatically
- * release it later. Here, we release it manually.
- */
- bpf_task_release(acquired);
return 0;
}
@@ -515,80 +563,16 @@ bpf_task_release() respectively, so we won't provide examples for them.
----
-You may also acquire a reference to a ``struct cgroup`` kptr that's already
-stored in a map using bpf_cgroup_kptr_get():
+Other kfuncs available for interacting with ``struct cgroup *`` objects are
+bpf_cgroup_ancestor() and bpf_cgroup_from_id(), allowing callers to access
+the ancestor of a cgroup and find a cgroup by its ID, respectively. Both
+return a cgroup kptr.
.. kernel-doc:: kernel/bpf/helpers.c
- :identifiers: bpf_cgroup_kptr_get
-
-Here's an example of how it can be used:
-
-.. code-block:: c
-
- /* struct containing the struct task_struct kptr which is actually stored in the map. */
- struct __cgroups_kfunc_map_value {
- struct cgroup __kptr_ref * cgroup;
- };
-
- /* The map containing struct __cgroups_kfunc_map_value entries. */
- struct {
- __uint(type, BPF_MAP_TYPE_HASH);
- __type(key, int);
- __type(value, struct __cgroups_kfunc_map_value);
- __uint(max_entries, 1);
- } __cgroups_kfunc_map SEC(".maps");
-
- /* ... */
-
- /**
- * A simple example tracepoint program showing how a
- * struct cgroup kptr that is stored in a map can
- * be acquired using the bpf_cgroup_kptr_get() kfunc.
- */
- SEC("tp_btf/cgroup_mkdir")
- int BPF_PROG(cgroup_kptr_get_example, struct cgroup *cgrp, const char *path)
- {
- struct cgroup *kptr;
- struct __cgroups_kfunc_map_value *v;
- s32 id = cgrp->self.id;
-
- /* Assume a cgroup kptr was previously stored in the map. */
- v = bpf_map_lookup_elem(&__cgroups_kfunc_map, &id);
- if (!v)
- return -ENOENT;
-
- /* Acquire a reference to the cgroup kptr that's already stored in the map. */
- kptr = bpf_cgroup_kptr_get(&v->cgroup);
- if (!kptr)
- /* If no cgroup was present in the map, it's because
- * we're racing with another CPU that removed it with
- * bpf_kptr_xchg() between the bpf_map_lookup_elem()
- * above, and our call to bpf_cgroup_kptr_get().
- * bpf_cgroup_kptr_get() internally safely handles this
- * race, and will return NULL if the task is no longer
- * present in the map by the time we invoke the kfunc.
- */
- return -EBUSY;
-
- /* Free the reference we just took above. Note that the
- * original struct cgroup kptr is still in the map. It will
- * be freed either at a later time if another context deletes
- * it from the map, or automatically by the BPF subsystem if
- * it's still present when the map is destroyed.
- */
- bpf_cgroup_release(kptr);
-
- return 0;
- }
-
-----
-
-Another kfunc available for interacting with ``struct cgroup *`` objects is
-bpf_cgroup_ancestor(). This allows callers to access the ancestor of a cgroup,
-and return it as a cgroup kptr.
+ :identifiers: bpf_cgroup_ancestor
.. kernel-doc:: kernel/bpf/helpers.c
- :identifiers: bpf_cgroup_ancestor
+ :identifiers: bpf_cgroup_from_id
Eventually, BPF should be updated to allow this to happen with a normal memory
load in the program itself. This is currently not possible without more work in