summaryrefslogtreecommitdiff
path: root/kernel/bpf
AgeCommit message (Collapse)Author
2018-12-05bpf: Improve the info.func_info and info.func_info_rec_size behaviorMartin KaFai Lau
1) When bpf_dump_raw_ok() == false and the kernel can provide >=1 func_info to the userspace, the current behavior is setting the info.func_info_cnt to 0 instead of setting info.func_info to 0. It is different from the behavior in jited_func_lens/nr_jited_func_lens, jited_ksyms/nr_jited_ksyms...etc. This patch fixes it. (i.e. set func_info to 0 instead of func_info_cnt to 0 when bpf_dump_raw_ok() == false). 2) When the userspace passed in info.func_info_cnt == 0, the kernel will set the expected func_info size back to the info.func_info_rec_size. It is a way for the userspace to learn the kernel expected func_info_rec_size introduced in commit 838e96904ff3 ("bpf: Introduce bpf_func_info"). An exception is the kernel expected size is not set when func_info is not available for a bpf_prog. This makes the returned info.func_info_rec_size has different values depending on the returned value of info.func_info_cnt. This patch sets the kernel expected size to info.func_info_rec_size independent of the info.func_info_cnt. 3) The current logic only rejects invalid func_info_rec_size if func_info_cnt is non zero. This patch also rejects invalid nonzero info.func_info_rec_size and not equal to the kernel expected size. 4) Set info.btf_id as long as prog->aux->btf != NULL. That will setup the later copy_to_user() codes look the same as others which then easier to understand and maintain. prog->aux->btf is not NULL only if prog->aux->func_info_cnt > 0. Breaking up info.btf_id from prog->aux->func_info_cnt is needed for the later line info patch anyway. A similar change is made to bpf_get_prog_name(). Fixes: 838e96904ff3 ("bpf: Introduce bpf_func_info") Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-12-05bpf: add __weak hook for allocating executable memoryArd Biesheuvel
By default, BPF uses module_alloc() to allocate executable memory, but this is not necessary on all arches and potentially undesirable on some of them. So break out the module_alloc() and module_memfree() calls into __weak functions to allow them to be overridden in arch code. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-12-04bpf: add per-insn complexity limitAlexei Starovoitov
malicious bpf program may try to force the verifier to remember a lot of distinct verifier states. Put a limit to number of per-insn 'struct bpf_verifier_state'. Note that hitting the limit doesn't reject the program. It potentially makes the verifier do more steps to analyze the program. It means that malicious programs will hit BPF_COMPLEXITY_LIMIT_INSNS sooner instead of spending cpu time walking long link list. The limit of BPF_COMPLEXITY_LIMIT_STATES==64 affects cilium progs with slight increase in number of "steps" it takes to successfully verify the programs: before after bpf_lb-DLB_L3.o 1940 1940 bpf_lb-DLB_L4.o 3089 3089 bpf_lb-DUNKNOWN.o 1065 1065 bpf_lxc-DDROP_ALL.o 28052 | 28162 bpf_lxc-DUNKNOWN.o 35487 | 35541 bpf_netdev.o 10864 10864 bpf_overlay.o 6643 6643 bpf_lcx_jit.o 38437 38437 But it also makes malicious program to be rejected in 0.4 seconds vs 6.5 Hence apply this limit to unprivileged programs only. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Edward Cree <ecree@solarflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-12-04bpf: improve verifier branch analysisAlexei Starovoitov
pathological bpf programs may try to force verifier to explode in the number of branch states: 20: (d5) if r1 s<= 0x24000028 goto pc+0 21: (b5) if r0 <= 0xe1fa20 goto pc+2 22: (d5) if r1 s<= 0x7e goto pc+0 23: (b5) if r0 <= 0xe880e000 goto pc+0 24: (c5) if r0 s< 0x2100ecf4 goto pc+0 25: (d5) if r1 s<= 0xe880e000 goto pc+1 26: (c5) if r0 s< 0xf4041810 goto pc+0 27: (d5) if r1 s<= 0x1e007e goto pc+0 28: (b5) if r0 <= 0xe86be000 goto pc+0 29: (07) r0 += 16614 30: (c5) if r0 s< 0x6d0020da goto pc+0 31: (35) if r0 >= 0x2100ecf4 goto pc+0 Teach verifier to recognize always taken and always not taken branches. This analysis is already done for == and != comparison. Expand it to all other branches. It also helps real bpf programs to be verified faster: before after bpf_lb-DLB_L3.o 2003 1940 bpf_lb-DLB_L4.o 3173 3089 bpf_lb-DUNKNOWN.o 1080 1065 bpf_lxc-DDROP_ALL.o 29584 28052 bpf_lxc-DUNKNOWN.o 36916 35487 bpf_netdev.o 11188 10864 bpf_overlay.o 6679 6643 bpf_lcx_jit.o 39555 38437 Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Edward Cree <ecree@solarflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-12-04bpf: check pending signals while verifying programsAlexei Starovoitov
Malicious user space may try to force the verifier to use as much cpu time and memory as possible. Hence check for pending signals while verifying the program. Note that suspend of sys_bpf(PROG_LOAD) syscall will lead to EAGAIN, since the kernel has to release the resources used for program verification. Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Edward Cree <ecree@solarflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-12-02bpf: Fix memleak in aux->func_info and aux->btfMartin KaFai Lau
The aux->func_info and aux->btf are leaked in the error out cases during bpf_prog_load(). This patch fixes it. Fixes: ba64e7d85252 ("bpf: btf: support proper non-jit func info") Cc: Yonghong Song <yhs@fb.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-30bpf: Add BPF_F_ANY_ALIGNMENT.David Miller
Often we want to write tests cases that check things like bad context offset accesses. And one way to do this is to use an odd offset on, for example, a 32-bit load. This unfortunately triggers the alignment checks first on platforms that do not set CONFIG_EFFICIENT_UNALIGNED_ACCESS. So the test case see the alignment failure rather than what it was testing for. It is often not completely possible to respect the original intention of the test, or even test the same exact thing, while solving the alignment issue. Another option could have been to check the alignment after the context and other validations are performed by the verifier, but that is a non-trivial change to the verifier. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Daniel Borkmann says: ==================== bpf-next 2018-11-30 The following pull-request contains BPF updates for your *net-next* tree. (Getting out bit earlier this time to pull in a dependency from bpf.) The main changes are: 1) Add libbpf ABI versioning and document API naming conventions as well as ABI versioning process, from Andrey. 2) Add a new sk_msg_pop_data() helper for sk_msg based BPF programs that is used in conjunction with sk_msg_push_data() for adding / removing meta data to the msg data, from John. 3) Optimize convert_bpf_ld_abs() for 0 offset and fix various lib and testsuite build failures on 32 bit, from David. 4) Make BPF prog dump for !JIT identical to how we dump subprogs when JIT is in use, from Yonghong. 5) Rename btf_get_from_id() to make it more conform with libbpf API naming conventions, from Martin. 6) Add a missing BPF kselftest config item, from Naresh. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Trivial conflict in net/core/filter.c, a locally computed 'sdif' is now an argument to the function. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-28bpf: btf: check name validity for various typesYonghong Song
This patch added name checking for the following types: . BTF_KIND_PTR, BTF_KIND_ARRAY, BTF_KIND_VOLATILE, BTF_KIND_CONST, BTF_KIND_RESTRICT: the name must be null . BTF_KIND_STRUCT, BTF_KIND_UNION: the struct/member name is either null or a valid identifier . BTF_KIND_ENUM: the enum type name is either null or a valid identifier; the enumerator name must be a valid identifier. . BTF_KIND_FWD: the name must be a valid identifier . BTF_KIND_TYPEDEF: the name must be a valid identifier For those places a valid name is required, the name must be a valid C identifier. This can be relaxed later if we found use cases for a different (non-C) frontend. Fixes: 69b693f0aefa ("bpf: btf: Introduce BPF Type Format (BTF)") Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-28bpf: btf: implement btf_name_valid_identifier()Yonghong Song
Function btf_name_valid_identifier() have been implemented in bpf-next commit 2667a2626f4d ("bpf: btf: Add BTF_KIND_FUNC and BTF_KIND_FUNC_PROTO"). Backport this function so later patch can use it. Fixes: 69b693f0aefa ("bpf: btf: Introduce BPF Type Format (BTF)") Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-26bpf: btf: support proper non-jit func infoYonghong Song
Commit 838e96904ff3 ("bpf: Introduce bpf_func_info") added bpf func info support. The userspace is able to get better ksym's for bpf programs with jit, and is able to print out func prototypes. For a program containing func-to-func calls, the existing implementation returns user specified number of function calls and BTF types if jit is enabled. If the jit is not enabled, it only returns the type for the main function. This is undesirable. Interpreter may still be used and we should keep feature identical regardless of whether jit is enabled or not. This patch fixed this discrepancy. Fixes: 838e96904ff3 ("bpf: Introduce bpf_func_info") Signed-off-by: Yonghong Song <yhs@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-26bpf, ppc64: generalize fetching subprog into bpf_jit_get_func_addrDaniel Borkmann
Make fetching of the BPF call address from ppc64 JIT generic. ppc64 was using a slightly different variant rather than through the insns' imm field encoding as the target address would not fit into that space. Therefore, the target subprog number was encoded into the insns' offset and fetched through fp->aux->func[off]->bpf_func instead. Given there are other JITs with this issue and the mechanism of fetching the address is JIT-generic, move it into the core as a helper instead. On the JIT side, we get information on whether the retrieved address is a fixed one, that is, not changing through JIT passes, or a dynamic one. For the former, JITs can optimize their imm emission because this doesn't change jump offsets throughout JIT process. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Sandipan Das <sandipan@linux.ibm.com> Tested-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-26bpf: btf: fix spelling mistake "Memmber" -> "Member"Colin Ian King
There is a spelling mistake in a btf_verifier_log_member message, fix it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-26bpf, tags: Fix DEFINE_PER_CPU expansionRustam Kovhaev
Building tags produces warning: ctags: Warning: kernel/bpf/local_storage.c:10: null expansion of name pattern "\1" Let's use the same fix as in commit 25528213fe9f ("tags: Fix DEFINE_PER_CPU expansions"), even though it violates the usual code style. Signed-off-by: Rustam Kovhaev <rkovhaev@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-22bpf: fix integer overflow in queue_stack_mapAlexei Starovoitov
Fix the following issues: - allow queue_stack_map for root only - fix u32 max_entries overflow - disallow value_size == 0 Fixes: f1a2e44a3aec ("bpf: add queue and stack maps") Reported-by: Wei Wu <ww9210@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Mauricio Vasquez B <mauricio.vasquez@polito.it> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-22bpf, lpm: make longest_prefix_match() fasterEric Dumazet
At LPC 2018 in Vancouver, Vlad Dumitrescu mentioned that longest_prefix_match() has a high cost [1]. One reason for that cost is a loop handling one byte at a time. We can handle more bytes at a time, if enough attention is paid to endianness. I was able to remove ~55 % of longest_prefix_match() cpu costs. [1] https://linuxplumbersconf.org/event/2/contributions/88/attachments/76/87/lpc-bpf-2018-shaping.pdf Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Vlad Dumitrescu <vladum@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-20bpf: Introduce bpf_func_infoYonghong Song
This patch added interface to load a program with the following additional information: . prog_btf_fd . func_info, func_info_rec_size and func_info_cnt where func_info will provide function range and type_id corresponding to each function. The func_info_rec_size is introduced in the UAPI to specify struct bpf_func_info size passed from user space. This intends to make bpf_func_info structure growable in the future. If the kernel gets a different bpf_func_info size from userspace, it will try to handle user request with part of bpf_func_info it can understand. In this patch, kernel can understand struct bpf_func_info { __u32 insn_offset; __u32 type_id; }; If user passed a bpf func_info record size of 16 bytes, the kernel can still handle part of records with the above definition. If verifier agrees with function range provided by the user, the bpf_prog ksym for each function will use the func name provided in the type_id, which is supposed to provide better encoding as it is not limited by 16 bytes program name limitation and this is better for bpf program which contains multiple subprograms. The bpf_prog_info interface is also extended to return btf_id, func_info, func_info_rec_size and func_info_cnt to userspace, so userspace can print out the function prototype for each xlated function. The insn_offset in the returned func_info corresponds to the insn offset for xlated functions. With other jit related fields in bpf_prog_info, userspace can also print out function prototypes for each jited function. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-20bpf: btf: Add BTF_KIND_FUNC and BTF_KIND_FUNC_PROTOMartin KaFai Lau
This patch adds BTF_KIND_FUNC and BTF_KIND_FUNC_PROTO to support the function debug info. BTF_KIND_FUNC_PROTO must not have a name (i.e. !t->name_off) and it is followed by >= 0 'struct bpf_param' objects to describe the function arguments. The BTF_KIND_FUNC must have a valid name and it must refer back to a BTF_KIND_FUNC_PROTO. The above is the conclusion after the discussion between Edward Cree, Alexei, Daniel, Yonghong and Martin. By combining BTF_KIND_FUNC and BTF_LIND_FUNC_PROTO, a complete function signature can be obtained. It will be used in the later patches to learn the function signature of a running bpf program. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-20bpf: btf: Break up btf_type_is_void()Martin KaFai Lau
This patch breaks up btf_type_is_void() into btf_type_is_void() and btf_type_is_fwd(). It also adds btf_type_nosize() to better describe it is testing a type has nosize info. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-20bpf: allow zero-initializing hash map seedLorenz Bauer
Add a new flag BPF_F_ZERO_SEED, which forces a hash map to initialize the seed to zero. This is useful when doing performance analysis both on individual BPF programs, as well as the kernel's hash table implementation. Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-16bpf: allocate local storage buffers using GFP_ATOMICRoman Gushchin
Naresh reported an issue with the non-atomic memory allocation of cgroup local storage buffers: [ 73.047526] BUG: sleeping function called from invalid context at /srv/oe/build/tmp-rpb-glibc/work-shared/intel-corei7-64/kernel-source/mm/slab.h:421 [ 73.060915] in_atomic(): 1, irqs_disabled(): 0, pid: 3157, name: test_cgroup_sto [ 73.068342] INFO: lockdep is turned off. [ 73.072293] CPU: 2 PID: 3157 Comm: test_cgroup_sto Not tainted 4.20.0-rc2-next-20181113 #1 [ 73.080548] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS 2.0b 07/27/2017 [ 73.088018] Call Trace: [ 73.090463] dump_stack+0x70/0xa5 [ 73.093783] ___might_sleep+0x152/0x240 [ 73.097619] __might_sleep+0x4a/0x80 [ 73.101191] __kmalloc_node+0x1cf/0x2f0 [ 73.105031] ? cgroup_storage_update_elem+0x46/0x90 [ 73.109909] cgroup_storage_update_elem+0x46/0x90 cgroup_storage_update_elem() (as well as other update map update callbacks) is called with disabled preemption, so GFP_ATOMIC allocation should be used: e.g. alloc_htab_elem() in hashtab.c. Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: Roman Gushchin <guro@fb.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-16bpf: fix off-by-one error in adjust_subprog_startsEdward Cree
When patching in a new sequence for the first insn of a subprog, the start of that subprog does not change (it's the first insn of the sequence), so adjust_subprog_starts should check start <= off (rather than < off). Also added a test to test_verifier.c (it's essentially the syz reproducer). Fixes: cc8b0b92a169 ("bpf: introduce function calls (function boundaries)") Reported-by: syzbot+4fc427c7af994b0948be@syzkaller.appspotmail.com Signed-off-by: Edward Cree <ecree@solarflare.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-16bpf: fix null pointer dereference on pointer offloadColin Ian King
Pointer offload is being null checked however the following statement dereferences the potentially null pointer offload when assigning offload->dev_state. Fix this by only assigning it if offload is not null. Detected by CoverityScan, CID#1475437 ("Dereference after null check") Fixes: 00db12c3d141 ("bpf: call verifier_prep from its callback in struct bpf_offload_dev") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-10bpf: Allow narrow loads with offset > 0Andrey Ignatov
Currently BPF verifier allows narrow loads for a context field only with offset zero. E.g. if there is a __u32 field then only the following loads are permitted: * off=0, size=1 (narrow); * off=0, size=2 (narrow); * off=0, size=4 (full). On the other hand LLVM can generate a load with offset different than zero that make sense from program logic point of view, but verifier doesn't accept it. E.g. tools/testing/selftests/bpf/sendmsg4_prog.c has code: #define DST_IP4 0xC0A801FEU /* 192.168.1.254 */ ... if ((ctx->user_ip4 >> 24) == (bpf_htonl(DST_IP4) >> 24) && where ctx is struct bpf_sock_addr. Some versions of LLVM can produce the following byte code for it: 8: 71 12 07 00 00 00 00 00 r2 = *(u8 *)(r1 + 7) 9: 67 02 00 00 18 00 00 00 r2 <<= 24 10: 18 03 00 00 00 00 00 fe 00 00 00 00 00 00 00 00 r3 = 4261412864 ll 12: 5d 32 07 00 00 00 00 00 if r2 != r3 goto +7 <LBB0_6> where `*(u8 *)(r1 + 7)` means narrow load for ctx->user_ip4 with size=1 and offset=3 (7 - sizeof(ctx->user_family) = 3). This load is currently rejected by verifier. Verifier code that rejects such loads is in bpf_ctx_narrow_access_ok() what means any is_valid_access implementation, that uses the function, works this way, e.g. bpf_skb_is_valid_access() for __sk_buff or sock_addr_is_valid_access() for bpf_sock_addr. The patch makes such loads supported. Offset can be in [0; size_default) but has to be multiple of load size. E.g. for __u32 field the following loads are supported now: * off=0, size=1 (narrow); * off=1, size=1 (narrow); * off=2, size=1 (narrow); * off=3, size=1 (narrow); * off=0, size=2 (narrow); * off=2, size=2 (narrow); * off=0, size=4 (full). Reported-by: Yonghong Song <yhs@fb.com> Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-10bpf: do not pass netdev to translate() and prepare() offload callbacksQuentin Monnet
The kernel functions to prepare verifier and translate for offloaded program retrieve "offload" from "prog", and "netdev" from "offload". Then both "prog" and "netdev" are passed to the callbacks. Simplify this by letting the drivers retrieve the net device themselves from the offload object attached to prog - if they need it at all. There is currently no need to pass the netdev as an argument to those functions. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-10bpf: pass prog instead of env to bpf_prog_offload_verifier_prep()Quentin Monnet
Function bpf_prog_offload_verifier_prep(), called from the kernel BPF verifier to run a driver-specific callback for preparing for the verification step for offloaded programs, takes a pointer to a struct bpf_verifier_env object. However, no driver callback needs the whole structure at this time: the two drivers supporting this, nfp and netdevsim, only need a pointer to the struct bpf_prog instance held by env. Update the callback accordingly, on kernel side and in these two drivers. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-10bpf: pass destroy() as a callback and remove its ndo_bpf subcommandQuentin Monnet
As part of the transition from ndo_bpf() to callbacks attached to struct bpf_offload_dev for some of the eBPF offload operations, move the functions related to program destruction to the struct and remove the subcommand that was used to call them through the NDO. Remove function __bpf_offload_ndo(), which is no longer used. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-10bpf: pass translate() as a callback and remove its ndo_bpf subcommandQuentin Monnet
As part of the transition from ndo_bpf() to callbacks attached to struct bpf_offload_dev for some of the eBPF offload operations, move the functions related to code translation to the struct and remove the subcommand that was used to call them through the NDO. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-10bpf: call verifier_prep from its callback in struct bpf_offload_devQuentin Monnet
In a way similar to the change previously brought to the verify_insn hook and to the finalize callback, switch to the newly added ops in struct bpf_prog_offload for calling the functions used to prepare driver verifiers. Since the dev_ops pointer in struct bpf_prog_offload is no longer used by any callback, we can now remove it from struct bpf_prog_offload. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-10bpf: call finalize() from its callback in struct bpf_offload_devQuentin Monnet
In a way similar to the change previously brought to the verify_insn hook, switch to the newly added ops in struct bpf_prog_offload for calling the functions used to perform final verification steps for offloaded programs. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-10bpf: call verify_insn from its callback in struct bpf_offload_devQuentin Monnet
We intend to remove the dev_ops in struct bpf_prog_offload, and to only keep the ops in struct bpf_offload_dev instead, which is accessible from more locations for passing function pointers. But dev_ops is used for calling the verify_insn hook. Switch to the newly added ops in struct bpf_prog_offload instead. To avoid table lookups for each eBPF instruction to verify, we remember the offdev attached to a netdev and modify bpf_offload_find_netdev() to avoid performing more than once a lookup for a given offload object. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-10bpf: pass a struct with offload callbacks to bpf_offload_dev_create()Quentin Monnet
For passing device functions for offloaded eBPF programs, there used to be no place where to store the pointer without making the non-offloaded programs pay a memory price. As a consequence, three functions were called with ndo_bpf() through specific commands. Now that we have struct bpf_offload_dev, and since none of those operations rely on RTNL, we can turn these three commands into hooks inside the struct bpf_prog_offload_ops, and pass them as part of bpf_offload_dev_create(). This commit effectively passes a pointer to the struct to bpf_offload_dev_create(). We temporarily have two struct bpf_prog_offload_ops instances, one under offdev->ops and one under offload->dev_ops. The next patches will make the transition towards the former, so that offload->dev_ops can be removed, and callbacks relying on ndo_bpf() added to offdev->ops as well. While at it, rename "nfp_bpf_analyzer_ops" as "nfp_bpf_dev_ops" (and similarly for netdevsim). Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-09bpf: let verifier to calculate and record max_pkt_offsetJiong Wang
In check_packet_access, update max_pkt_offset after the offset has passed __check_packet_access. It should be safe to use u32 for max_pkt_offset as explained in code comment. Also, when there is tail call, the max_pkt_offset of the called program is unknown, so conservatively set max_pkt_offset to MAX_PACKET_OFF for such case. Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-02bpf: fix bpf_prog_get_info_by_fd to return 0 func_lens for unprivDaniel Borkmann
While dbecd7388476 ("bpf: get kernel symbol addresses via syscall") zeroed info.nr_jited_ksyms in bpf_prog_get_info_by_fd() for queries from unprivileged users, commit 815581c11cc2 ("bpf: get JITed image lengths of functions via syscall") forgot about doing so and therefore returns the #elems of the user set up buffer which is incorrect. It also needs to indicate a info.nr_jited_func_lens of zero. Fixes: 815581c11cc2 ("bpf: get JITed image lengths of functions via syscall") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Sandipan Das <sandipan@linux.vnet.ibm.com> Cc: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-02bpf: show main program address and length in bpf_prog_infoSong Liu
Currently, when there is no subprog (prog->aux->func_cnt == 0), bpf_prog_info does not return any jited_ksyms or jited_func_lens. This patch adds main program address (prog->bpf_func) and main program length (prog->jited_len) to bpf_prog_info. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-02bpf: show real jited address in bpf_prog_info->jited_ksymsSong Liu
Currently, jited_ksyms in bpf_prog_info shows page addresses of jited bpf program. The main reason here is to not expose randomized start address. However, this is not ideal for detailed profiling (find hot instructions from stack traces). This patch replaces the page address with real prog start address. This change is OK because bpf_prog_get_info_by_fd() is only available to root. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-02bpf: show real jited prog address in /proc/kallsymsSong Liu
Currently, /proc/kallsyms shows page address of jited bpf program. The main reason here is to not expose randomized start address. However, This is not ideal for detailed profiling (find hot instructions from stack traces). This patch replaces the page address with real prog start address. This change is OK because these addresses are still protected by sysctl kptr_restrict (see kallsyms_show_value()), and only programs loaded by root are added to kallsyms (see bpf_prog_kallsyms_add()). Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-31bpf: don't set id on after map lookup with ptr_to_map_val returnDaniel Borkmann
In the verifier there is no such semantics where registers with PTR_TO_MAP_VALUE type have an id assigned to them. This is only used in PTR_TO_MAP_VALUE_OR_NULL and later on nullified once the test against NULL has been pattern matched and type transformed into PTR_TO_MAP_VALUE. Fixes: 3e6a4b3e0289 ("bpf/verifier: introduce BPF_PTR_TO_MAP_VALUE") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Roman Gushchin <guro@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-31bpf: fix partial copy of map_ptr when dst is scalarDaniel Borkmann
ALU operations on pointers such as scalar_reg += map_value_ptr are handled in adjust_ptr_min_max_vals(). Problem is however that map_ptr and range in the register state share a union, so transferring state through dst_reg->range = ptr_reg->range is just buggy as any new map_ptr in the dst_reg is then truncated (or null) for subsequent checks. Fix this by adding a raw member and use it for copying state over to dst_reg. Fixes: f1174f77b50c ("bpf/verifier: rework value tracking") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Edward Cree <ecree@solarflare.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-25bpf: add bpf_jit_limit knob to restrict unpriv allocationsDaniel Borkmann
Rick reported that the BPF JIT could potentially fill the entire module space with BPF programs from unprivileged users which would prevent later attempts to load normal kernel modules or privileged BPF programs, for example. If JIT was enabled but unsuccessful to generate the image, then before commit 290af86629b2 ("bpf: introduce BPF_JIT_ALWAYS_ON config") we would always fall back to the BPF interpreter. Nowadays in the case where the CONFIG_BPF_JIT_ALWAYS_ON could be set, then the load will abort with a failure since the BPF interpreter was compiled out. Add a global limit and enforce it for unprivileged users such that in case of BPF interpreter compiled out we fail once the limit has been reached or we fall back to BPF interpreter earlier w/o using module mem if latter was compiled in. In a next step, fair share among unprivileged users can be resolved in particular for the case where we would fail hard once limit is reached. Fixes: 290af86629b2 ("bpf: introduce BPF_JIT_ALWAYS_ON config") Fixes: 0a14842f5a3c ("net: filter: Just In Time compiler for x86-64") Co-Developed-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Jann Horn <jannh@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: LKML <linux-kernel@vger.kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-25bpf: make direct packet write unclone more robustDaniel Borkmann
Given this seems to be quite fragile and can easily slip through the cracks, lets make direct packet write more robust by requiring that future program types which allow for such write must provide a prologue callback. In case of XDP and sk_msg it's noop, thus add a generic noop handler there. The latter starts out with NULL data/data_end unconditionally when sg pages are shared. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-25bpf: fix leaking uninitialized memory on pop/peek helpersDaniel Borkmann
Commit f1a2e44a3aec ("bpf: add queue and stack maps") added helpers with ARG_PTR_TO_UNINIT_MAP_VALUE. Meaning, the helper is supposed to fill the map value buffer with data instead of reading from it like in other helpers such as map update. However, given the buffer is allowed to be uninitialized (since we fill it in the helper anyway), it also means that the helper is obliged to wipe the memory in case of an error in order to not allow for leaking uninitialized memory. Given pop/peek is both handled inside __{stack,queue}_map_get(), lets wipe it there on error case, that is, empty stack/queue. Fixes: f1a2e44a3aec ("bpf: add queue and stack maps") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Cc: Mauricio Vasquez B <mauricio.vasquez@polito.it> Acked-by: Mauricio Vasquez B<mauricio.vasquez@polito.it> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-25bpf: fix direct packet write into pop/peek helpersDaniel Borkmann
Commit f1a2e44a3aec ("bpf: add queue and stack maps") probably just copy-pasted .pkt_access for bpf_map_{pop,peek}_elem() helpers, but this is buggy in this context since it would allow writes into cloned skbs which is invalid. Therefore, disable .pkt_access for the two. Fixes: f1a2e44a3aec ("bpf: add queue and stack maps") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Cc: Mauricio Vasquez B <mauricio.vasquez@polito.it> Acked-by: Mauricio Vasquez B<mauricio.vasquez@polito.it> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-25bpf: fix cg_skb types to hint access type in may_access_direct_pkt_dataDaniel Borkmann
Commit b39b5f411dcf ("bpf: add cg_skb_is_valid_access for BPF_PROG_TYPE_CGROUP_SKB") added direct packet access for skbs in cg_skb program types, however allowed access type was not added to the may_access_direct_pkt_data() helper. Therefore the latter always returns false. This is not directly an issue, it just means writes are unconditionally disabled (which is correct) but also reads. Latter is relevant in this function when BPF helpers may read direct packet data which is unconditionally disabled then. Fix it by properly adding BPF_PROG_TYPE_CGROUP_SKB to may_access_direct_pkt_data(). Fixes: b39b5f411dcf ("bpf: add cg_skb_is_valid_access for BPF_PROG_TYPE_CGROUP_SKB") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Cc: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-25bpf: fix direct packet access for flow dissector progsDaniel Borkmann
Commit d58e468b1112 ("flow_dissector: implements flow dissector BPF hook") added direct packet access for skbs in may_access_direct_pkt_data() function where this enables read and write access to the skb->data. This is buggy because without a prologue generator such as bpf_unclone_prologue() we would allow for writing into cloned skbs. Original intention might have been to only allow read access where this is not needed (similar as the flow_dissector_func_proto() indicates which enables only bpf_skb_load_bytes() as well), therefore this patch fixes it to restrict to read-only. Fixes: d58e468b1112 ("flow_dissector: implements flow dissector BPF hook") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Cc: Petar Penkov <ppenkov@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-26bpf, btf: fix a missing check bug in btf_parseMartin Lau
Wenwen Wang reported: In btf_parse(), the header of the user-space btf data 'btf_data' is firstly parsed and verified through btf_parse_hdr(). In btf_parse_hdr(), the header is copied from user-space 'btf_data' to kernel-space 'btf->hdr' and then verified. If no error happens during the verification process, the whole data of 'btf_data', including the header, is then copied to 'data' in btf_parse(). It is obvious that the header is copied twice here. More importantly, no check is enforced after the second copy to make sure the headers obtained in these two copies are same. Given that 'btf_data' resides in the user space, a malicious user can race to modify the header between these two copies. By doing so, the user can inject inconsistent data, which can cause undefined behavior of the kernel and introduce potential security risk. This issue is similar to the one fixed in commit 8af03d1ae2e1 ("bpf: btf: Fix a missing check bug"). To fix it, this patch copies the user 'btf_data' *before* parsing / verifying the BTF header. Fixes: 69b693f0aefa ("bpf: btf: Introduce BPF Type Format (BTF)") Signed-off-by: Martin KaFai Lau <kafai@fb.com> Co-developed-by: Wenwen Wang <wang6495@umn.edu> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-26bpf: devmap: fix wrong interface selection in notifier_callTaehee Yoo
The dev_map_notification() removes interface in devmap if unregistering interface's ifindex is same. But only checking ifindex is not enough because other netns can have same ifindex. so that wrong interface selection could occurred. Hence netdev pointer comparison code is added. v2: compare netdev pointer instead of using net_eq() (Daniel Borkmann) v1: Initial patch Fixes: 2ddf71e23cc2 ("net: add notifier hooks for devmap bpf map") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf-next 2018-10-21 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Implement two new kind of BPF maps, that is, queue and stack map along with new peek, push and pop operations, from Mauricio. 2) Add support for MSG_PEEK flag when redirecting into an ingress psock sk_msg queue, and add a new helper bpf_msg_push_data() for insert data into the message, from John. 3) Allow for BPF programs of type BPF_PROG_TYPE_CGROUP_SKB to use direct packet access for __skb_buff, from Song. 4) Use more lightweight barriers for walking perf ring buffer for libbpf and perf tool as well. Also, various fixes and improvements from verifier side, from Daniel. 5) Add per-symbol visibility for DSO in libbpf and hide by default global symbols such as netlink related functions, from Andrey. 6) Two improvements to nfp's BPF offload to check vNIC capabilities in case prog is shared with multiple vNICs and to protect against mis-initializing atomic counters, from Jakub. 7) Fix for bpftool to use 4 context mode for the nfp disassembler, also from Jakub. 8) Fix a return value comparison in test_libbpf.sh and add several bpftool improvements in bash completion, documentation of bpf fs restrictions and batch mode summary print, from Quentin. 9) Fix a file resource leak in BPF selftest's load_kallsyms() helper, from Peng. 10) Fix an unused variable warning in map_lookup_and_delete_elem(), from Alexei. 11) Fix bpf_skb_adjust_room() signature in BPF UAPI helper doc, from Nicolas. 12) Add missing executables to .gitignore in BPF selftests, from Anders. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-20bpf, verifier: avoid retpoline for map push/pop/peek operationDaniel Borkmann
Extend prior work from 09772d92cd5a ("bpf: avoid retpoline for lookup/update/delete calls on maps") to also apply to the recently added map helpers that perform push/pop/peek operations so that the indirect call can be avoided. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>