author    Alexei Starovoitov <ast@kernel.org>  2024-10-24 10:26:00 -0700
committer Alexei Starovoitov <ast@kernel.org>  2024-10-24 10:26:00 -0700
commit    c6fb8030b4baa01c850f99fc6da051b1017edc46 (patch)
tree      160ed2ab568f0fc513ccda0dfa19b08dfd00f4a1 /kernel/bpf/btf.c
parent    39b8ab1519687054769bc07feb97821fc40f56e2 (diff)
parent    bd5879a6fe4be407bf36c212cd91ed1e4485a6f9 (diff)
Merge branch 'share-user-memory-to-bpf-program-through-task-storage-map'
Martin KaFai Lau says:
====================
Share user memory to BPF program through task storage map
From: Martin KaFai Lau <martin.lau@kernel.org>
This is v6 of the series. Starting from v5, it is a continuation
of the RFC v4.
Changes in v6:
1. In patch 1, reject t->size == 0 in btf_check_and_fixup_fields.
Reject a uptr pointing to an empty struct.
A test is added to patch 12 to test this case.
2. In patch 6, when checking whether the uptr struct spans across
pages, there was an off-by-one error in calculating the "end", such
that a uptr would be rejected in error if the object was located
exactly at the end of a page.
This is fixed by computing "end" as "start" + t->size - 1 (see the
sketch after this list).
A test is added to patch 9 to test this case.
3. In patch 6, check for PageHighMem(page) and return -EOPNOTSUPP;
the 32-bit arch JITs are missing other crucial bpf features (e.g. kfunc) anyway.
Patch 6's commit message has been updated to include this change.
4. The selftests are cleaned up so that a "struct user_data *dummy_data"
global ptr is used instead of a whole "struct user_data dummy_data"
object. It is still a hack to avoid generating a fwd btf type for the
uptr struct, but somewhat lighter than a full-blown global object.
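A minimal sketch of the page-span check fixed in change 2 (the shape
here is an assumption for illustration; the actual check is in patch 6):

static bool uptr_spans_pages(unsigned long start, u32 size)
{
	/* With "end = start + size", an object ending exactly at a page
	 * boundary would wrongly appear to span two pages; "size - 1"
	 * makes the comparison use the object's last byte instead of
	 * one past it.
	 */
	unsigned long end = start + size - 1;

	return (start & PAGE_MASK) != (end & PAGE_MASK);
}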
Changes in v5:
1. The original patch 1 and patch 2 are combined.
2. Patch 3, 4, and 5 are new. They get the bpf_local_storage
ready to handle the __uptr in the map_value.
3. Patch 6 is mostly new, so I reset the sob.
4. There are also some changes in the carried-over patches 1 and 2;
they are mentioned in the individual patches.
5. More tests are added.
The following is the original cover letter and the earlier change log.
The bpf prog example has been removed. Please find a similar
example in the selftests task_ls_uptr.c.
~~~~~~~~
Some BPF schedulers (sched_ext) need hints from user programs to do
a better job. For example, a scheduler can handle a task differently
if it knows the task is doing GC. So, we need an efficient way to
share this information between user programs and BPF programs, and
sharing memory between them is what this patchset does.
== REQUIREMENT ==
This patchset enables every task in every process to share a small
chunk of memory of its own with a BPF scheduler, so tasks can update
the hints without the expensive overhead of syscalls. It also ensures
that every task sees only the data/memory belonging to that task or
the task's process.
== DESIGN ==
This patchset enables BPF programs to embed uptrs (pointer fields
tagged with __uptr) in the values of task storage maps. A uptr field
can only be set by a user program, by updating the map element value
through a syscall. A uptr points to a block of memory allocated by the
user program updating the element value. The memory will be pinned to
ensure it stays in core memory and to avoid a page fault when the BPF
program accesses it.
Please see the selftest task_ls_uptr.c for an example; a minimal
sketch of the shape follows.
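In the sketch below, "struct user_data", "datamap", and the helper
names are illustrative assumptions, not the selftest's actual
declarations:

/* --- BPF side --- */
#include <vmlinux.h>
#include <bpf/bpf_helpers.h>

struct user_data {
	int hint;
};

struct value_type {
	struct user_data __uptr *udata;	/* __uptr comes from bpf_helpers.h */
};

struct {
	__uint(type, BPF_MAP_TYPE_TASK_STORAGE);
	__uint(map_flags, BPF_F_NO_PREALLOC);
	__type(key, int);
	__type(value, struct value_type);
} datamap SEC(".maps");

/* --- user side (separate compilation unit, sharing the structs) --- */
#include <bpf/bpf.h>

static struct user_data udata;	/* user-owned memory the uptr will target */

static int install_uptr(int map_fd, int task_fd)
{
	/* task_fd is a pidfd of the target task (e.g. from pidfd_open()) */
	struct value_type value = { .udata = &udata };

	return bpf_map_update_elem(map_fd, &task_fd, &value, 0);
}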
== MEMORY ==
In order to use memory efficiently, we don't want to pin a large
number of pages. To achieve that, user programs should gather the
memory blocks pointed to by uptrs into shared memory pages where
possible. This avoids pinning one page for each thread in a process;
instead, several threads can point their uptrs at the same page, just
with different offsets.
Although it is not strictly necessary, keeping the memory pointed to
by an uptr from crossing a page boundary avoids an additional mapping
in the kernel address space (see the sketch below).
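A hedged userspace sketch of that layout (names and sizes are
illustrative): carve one page into per-thread slots, so every thread's
uptr points into the same pinned page at a different offset and no
slot straddles a page boundary.

#include <stdlib.h>
#include <unistd.h>

struct user_data { long hint; };

#define SLOT_SZ 64	/* divides the page size, so no slot crosses a page */

static struct user_data *thread_slot(void *page, int thread_idx)
{
	return (struct user_data *)((char *)page + thread_idx * SLOT_SZ);
}

/* usage:
 *	long psz = sysconf(_SC_PAGESIZE);
 *	void *page = aligned_alloc(psz, psz);	// one shared, pinnable page
 *	value.udata = thread_slot(page, my_thread_idx);
 */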
== RESTRICT ==
The memory pointed to by a uptr must reside within one memory page;
spanning multiple pages is not supported at the moment.
Only task storage maps are supported at the moment.
The values of uptrs can only be updated by user programs through
syscalls.
bpf_map_lookup_elem() from userspace returns zeroed values for uptrs
to prevent leaking kernel information.
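A small usage sketch of that read-back behavior, continuing the
assumed names from the DESIGN sketch above:

#include <assert.h>

static void check_lookup(int map_fd, int task_fd)
{
	struct value_type out = {};

	if (!bpf_map_lookup_elem(map_fd, &task_fd, &out))
		assert(out.udata == NULL);	/* uptr fields read back zeroed */
}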
---
Changes from v3:
- Merge part 4 and 5 as the new part 4 in order to silence the CI
warning about unused functions.
Changes from v1:
- Rename BPF_KPTR_USER to BPF_UPTR.
- Restrict uptr to one page.
- Mark uptr with PTR_TO_MEM | PTR_MAY_BE_NULL and with the size of
the target type.
- Move uptr away from bpf_obj_memcpy() by introducing
bpf_obj_uptrcpy() and copy_map_uptr_locked().
- Remove the BPF_FROM_USER flag.
- Align the memory pointed to by an uptr in the test case. Remove the
uptr to mmapped memory.
Kui-Feng Lee (4):
bpf: Support __uptr type tag in BTF
bpf: Handle BPF_UPTR in verifier
libbpf: define __uptr.
selftests/bpf: Some basic __uptr tests
====================
Link: https://lore.kernel.org/r/20241023234759.860539-1-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Diffstat (limited to 'kernel/bpf/btf.c')
-rw-r--r--  kernel/bpf/btf.c  34
1 files changed, 29 insertions, 5 deletions
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 13dd1fa1d1b9..76cafff2d99c 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3334,7 +3334,7 @@ static int btf_find_struct(const struct btf *btf, const struct btf_type *t,
 }
 
 static int btf_find_kptr(const struct btf *btf, const struct btf_type *t,
-			 u32 off, int sz, struct btf_field_info *info)
+			 u32 off, int sz, struct btf_field_info *info, u32 field_mask)
 {
 	enum btf_field_type type;
 	u32 res_id;
@@ -3358,9 +3358,14 @@ static int btf_find_kptr(const struct btf *btf, const struct btf_type *t,
 		type = BPF_KPTR_REF;
 	else if (!strcmp("percpu_kptr", __btf_name_by_offset(btf, t->name_off)))
 		type = BPF_KPTR_PERCPU;
+	else if (!strcmp("uptr", __btf_name_by_offset(btf, t->name_off)))
+		type = BPF_UPTR;
 	else
 		return -EINVAL;
 
+	if (!(type & field_mask))
+		return BTF_FIELD_IGNORE;
+
 	/* Get the base type */
 	t = btf_type_skip_modifiers(btf, t->type, &res_id);
 	/* Only pointer to struct is allowed */
@@ -3502,7 +3507,7 @@ static int btf_get_field_type(const struct btf *btf, const struct btf_type *var_
 	field_mask_test_name(BPF_REFCOUNT, "bpf_refcount");
 
 	/* Only return BPF_KPTR when all other types with matchable names fail */
-	if (field_mask & BPF_KPTR && !__btf_type_is_struct(var_type)) {
+	if (field_mask & (BPF_KPTR | BPF_UPTR) && !__btf_type_is_struct(var_type)) {
 		type = BPF_KPTR_REF;
 		goto end;
 	}
@@ -3535,6 +3540,7 @@ static int btf_repeat_fields(struct btf_field_info *info,
 		case BPF_KPTR_UNREF:
 		case BPF_KPTR_REF:
 		case BPF_KPTR_PERCPU:
+		case BPF_UPTR:
 		case BPF_LIST_HEAD:
 		case BPF_RB_ROOT:
 			break;
@@ -3661,8 +3667,9 @@ static int btf_find_field_one(const struct btf *btf,
 	case BPF_KPTR_UNREF:
 	case BPF_KPTR_REF:
 	case BPF_KPTR_PERCPU:
+	case BPF_UPTR:
 		ret = btf_find_kptr(btf, var_type, off, sz,
-				    info_cnt ? &info[0] : &tmp);
+				    info_cnt ? &info[0] : &tmp, field_mask);
 		if (ret < 0)
 			return ret;
 		break;
@@ -3985,6 +3992,7 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type
 		case BPF_KPTR_UNREF:
 		case BPF_KPTR_REF:
 		case BPF_KPTR_PERCPU:
+		case BPF_UPTR:
 			ret = btf_parse_kptr(btf, &rec->fields[i], &info_arr[i]);
 			if (ret < 0)
 				goto end;
@@ -4044,12 +4052,28 @@ int btf_check_and_fixup_fields(const struct btf *btf, struct btf_record *rec)
 	 * Hence we only need to ensure that bpf_{list_head,rb_root} ownership
 	 * does not form cycles.
 	 */
-	if (IS_ERR_OR_NULL(rec) || !(rec->field_mask & BPF_GRAPH_ROOT))
+	if (IS_ERR_OR_NULL(rec) || !(rec->field_mask & (BPF_GRAPH_ROOT | BPF_UPTR)))
 		return 0;
 	for (i = 0; i < rec->cnt; i++) {
 		struct btf_struct_meta *meta;
+		const struct btf_type *t;
 		u32 btf_id;
 
+		if (rec->fields[i].type == BPF_UPTR) {
+			/* The uptr only supports pinning one page and cannot
+			 * point to a kernel struct
+			 */
+			if (btf_is_kernel(rec->fields[i].kptr.btf))
+				return -EINVAL;
+			t = btf_type_by_id(rec->fields[i].kptr.btf,
+					   rec->fields[i].kptr.btf_id);
+			if (!t->size)
+				return -EINVAL;
+			if (t->size > PAGE_SIZE)
+				return -E2BIG;
+			continue;
+		}
+
 		if (!(rec->fields[i].type & BPF_GRAPH_ROOT))
 			continue;
 		btf_id = rec->fields[i].graph_root.value_btf_id;
@@ -5560,7 +5584,7 @@ btf_parse_struct_metas(struct bpf_verifier_log *log, struct btf *btf)
 			goto free_aof;
 		}
 
-		ret = btf_find_kptr(btf, t, 0, 0, &tmp);
+		ret = btf_find_kptr(btf, t, 0, 0, &tmp, BPF_KPTR);
 		if (ret != BTF_FIELD_FOUND)
 			continue;
 