path: root/kernel/bpf/verifier.c
Age | Commit message | Author
2017-04-01 | bpf, verifier: fix rejection of unaligned access checks for map_value_adj | Daniel Borkmann
Currently, the verifier doesn't reject unaligned access for map_value_adj register types. Commit 484611357c19 ("bpf: allow access into map value arrays") added logic to check_ptr_alignment() extending it from PTR_TO_PACKET to also PTR_TO_MAP_VALUE_ADJ, but for PTR_TO_MAP_VALUE_ADJ no enforcement is in place, because reg->id for PTR_TO_MAP_VALUE_ADJ reg types is never non-zero, meaning, we can cause BPF_H/_W/_DW-based unaligned access for architectures not supporting efficient unaligned access, and thus worst case could raise exceptions on some archs that are unable to correct the unaligned access or perform a different memory access to the actual requested one and such. i) Unaligned load with !CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS on r0 (map_value_adj): 0: (bf) r2 = r10 1: (07) r2 += -8 2: (7a) *(u64 *)(r2 +0) = 0 3: (18) r1 = 0x42533a00 5: (85) call bpf_map_lookup_elem#1 6: (15) if r0 == 0x0 goto pc+11 R0=map_value(ks=8,vs=48,id=0),min_value=0,max_value=0 R10=fp 7: (61) r1 = *(u32 *)(r0 +0) 8: (35) if r1 >= 0xb goto pc+9 R0=map_value(ks=8,vs=48,id=0),min_value=0,max_value=0 R1=inv,min_value=0,max_value=10 R10=fp 9: (07) r0 += 3 10: (79) r7 = *(u64 *)(r0 +0) R0=map_value_adj(ks=8,vs=48,id=0),min_value=3,max_value=3 R1=inv,min_value=0,max_value=10 R10=fp 11: (79) r7 = *(u64 *)(r0 +2) R0=map_value_adj(ks=8,vs=48,id=0),min_value=3,max_value=3 R1=inv,min_value=0,max_value=10 R7=inv R10=fp [...] 
ii) Unaligned store with !CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS on r0 (map_value_adj): 0: (bf) r2 = r10 1: (07) r2 += -8 2: (7a) *(u64 *)(r2 +0) = 0 3: (18) r1 = 0x4df16a00 5: (85) call bpf_map_lookup_elem#1 6: (15) if r0 == 0x0 goto pc+19 R0=map_value(ks=8,vs=48,id=0),min_value=0,max_value=0 R10=fp 7: (07) r0 += 3 8: (7a) *(u64 *)(r0 +0) = 42 R0=map_value_adj(ks=8,vs=48,id=0),min_value=3,max_value=3 R10=fp 9: (7a) *(u64 *)(r0 +2) = 43 R0=map_value_adj(ks=8,vs=48,id=0),min_value=3,max_value=3 R10=fp 10: (7a) *(u64 *)(r0 -2) = 44 R0=map_value_adj(ks=8,vs=48,id=0),min_value=3,max_value=3 R10=fp [...] For the PTR_TO_PACKET type, reg->id is initially zero when skb->data was fetched, it later receives a reg->id from env->id_gen generator once another register with UNKNOWN_VALUE type was added to it via check_packet_ptr_add(). The purpose of this reg->id is twofold: i) it is used in find_good_pkt_pointers() for setting the allowed access range for regs with PTR_TO_PACKET of same id once verifier matched on data/data_end tests, and ii) for check_ptr_alignment() to determine that when not having efficient unaligned access and register with UNKNOWN_VALUE was added to PTR_TO_PACKET, that we're only allowed to access the content bytewise due to unknown unalignment. reg->id was never intended for PTR_TO_MAP_VALUE{,_ADJ} types and thus is always zero, the only marking is in PTR_TO_MAP_VALUE_OR_NULL that was added after 484611357c19 via 57a09bf0a416 ("bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL registers"). Above tests will fail for non-root environment due to prohibited pointer arithmetic. The fix splits register-type specific checks into their own helper instead of keeping them combined, so we don't run into a similar issue in future once we extend check_ptr_alignment() further and forget to add reg->type checks for some of the checks. 
Fixes: 484611357c19 ("bpf: allow access into map value arrays") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Josef Bacik <jbacik@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
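The split into register-type specific alignment helpers can be sketched as a small userspace model. This is only an illustration under stated assumptions: `struct reg_model`, `MODEL_NET_IP_ALIGN` and all function names are hypothetical, and the sketch assumes that an adjusted map value pointer is simply limited to byte-sized access when strict alignment is required.

```c
#include <stdbool.h>

/* Hypothetical, simplified register model; not the kernel's bpf_reg_state. */
enum reg_kind { REG_PTR_TO_PACKET, REG_PTR_TO_MAP_VALUE_ADJ, REG_OTHER };

struct reg_model {
	enum reg_kind kind;
	int id;  /* non-zero once an unknown value was added to a packet pointer */
	int off; /* constant offset accumulated on a packet pointer */
};

#define MODEL_NET_IP_ALIGN 2 /* assumed value for this sketch */

/* Packet pointers: unknown alignment allows byte access only. */
static bool pkt_ptr_aligned(const struct reg_model *reg, int off, int size)
{
	if (reg->id && size != 1)
		return false;
	return (MODEL_NET_IP_ALIGN + reg->off + off) % size == 0;
}

/* Adjusted map value pointers: alignment is not tracked at all here,
 * so this sketch permits byte-sized access only (an assumption). */
static bool val_ptr_aligned(int size)
{
	return size == 1;
}

/* One dispatch point, one helper per register type. 'strict' models
 * a build without efficient unaligned access. */
static bool ptr_aligned(const struct reg_model *reg, int off, int size,
			bool strict)
{
	if (!strict)
		return true;
	switch (reg->kind) {
	case REG_PTR_TO_PACKET:
		return pkt_ptr_aligned(reg, off, size);
	case REG_PTR_TO_MAP_VALUE_ADJ:
		return val_ptr_aligned(size);
	default:
		return true;
	}
}
```

Because each type has its own helper, a check keyed on reg->id can no longer be applied by accident to a type for which reg->id is always zero.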
2017-04-01 | bpf, verifier: fix alu ops against map_value{, _adj} register types | Daniel Borkmann
While looking into map_value_adj, I noticed that alu operations directly on the map_value() resp. map_value_adj() register (any alu operation on a map_value() register will turn it into a map_value_adj() typed register) are not sufficiently protected against some of the operations. Two non-exhaustive examples are provided that the verifier needs to reject: i) BPF_AND on r0 (map_value_adj): 0: (bf) r2 = r10 1: (07) r2 += -8 2: (7a) *(u64 *)(r2 +0) = 0 3: (18) r1 = 0xbf842a00 5: (85) call bpf_map_lookup_elem#1 6: (15) if r0 == 0x0 goto pc+2 R0=map_value(ks=8,vs=48,id=0),min_value=0,max_value=0 R10=fp 7: (57) r0 &= 8 8: (7a) *(u64 *)(r0 +0) = 22 R0=map_value_adj(ks=8,vs=48,id=0),min_value=0,max_value=8 R10=fp 9: (95) exit from 6 to 9: R0=inv,min_value=0,max_value=0 R10=fp 9: (95) exit processed 10 insns ii) BPF_ADD in 32 bit mode on r0 (map_value_adj): 0: (bf) r2 = r10 1: (07) r2 += -8 2: (7a) *(u64 *)(r2 +0) = 0 3: (18) r1 = 0xc24eee00 5: (85) call bpf_map_lookup_elem#1 6: (15) if r0 == 0x0 goto pc+2 R0=map_value(ks=8,vs=48,id=0),min_value=0,max_value=0 R10=fp 7: (04) (u32) r0 += (u32) 0 8: (7a) *(u64 *)(r0 +0) = 22 R0=map_value_adj(ks=8,vs=48,id=0),min_value=0,max_value=0 R10=fp 9: (95) exit from 6 to 9: R0=inv,min_value=0,max_value=0 R10=fp 9: (95) exit processed 10 insns Issue is, while min_value / max_value boundaries for the access are adjusted appropriately, we change the pointer value in a way that cannot be sufficiently tracked anymore from its origin. Operations like BPF_{AND,OR,DIV,MUL,etc} on a destination register that is PTR_TO_MAP_VALUE{,_ADJ} was probably unintended, in fact, all the test cases coming with 484611357c19 ("bpf: allow access into map value arrays") perform BPF_ADD only on the destination register that is PTR_TO_MAP_VALUE_ADJ. Only for UNKNOWN_VALUE register types such operations make sense, f.e. with unknown memory content fetched initially from a constant offset from the map value memory into a register. 
That register is then later tested against lower / upper bounds, so that the verifier can then do the tracking of min_value / max_value, and properly check once that UNKNOWN_VALUE register is added to the destination register with type PTR_TO_MAP_VALUE{,_ADJ}. This is also what the original use-case is solving. Note, tracking on what is being added is done through adjust_reg_min_max_vals() and later access to the map value enforced with these boundaries and the given offset from the insn through check_map_access_adj(). Tests will fail for non-root environment due to prohibited pointer arithmetic, in particular in check_alu_op(), we bail out on the is_pointer_value() check on the dst_reg (which is false in root case as we allow for pointer arithmetic via env->allow_ptr_leaks). Similarly to PTR_TO_PACKET, one way to fix it is to restrict the allowed operations on PTR_TO_MAP_VALUE{,_ADJ} registers to 64 bit mode BPF_ADD. The test_verifier suite runs fine after the patch and it also rejects mentioned test cases. Fixes: 484611357c19 ("bpf: allow access into map value arrays") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Josef Bacik <jbacik@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
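The restriction described above (only 64 bit BPF_ADD on a map value pointer) can be sketched as a predicate over the instruction opcode. The macro values follow the BPF instruction encoding and match the opcodes visible in the logs (0x07, 0x57, 0x04); the function name is hypothetical and this is not the verifier's actual code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Opcode field layout of a BPF instruction. */
#define MODEL_BPF_CLASS(code) ((code) & 0x07)
#define MODEL_BPF_OP(code)    ((code) & 0xf0)
#define MODEL_BPF_ALU64       0x07
#define MODEL_BPF_ADD         0x00

/* Hypothetical helper: may this ALU opcode target a register holding
 * PTR_TO_MAP_VALUE or PTR_TO_MAP_VALUE_ADJ? Only 64 bit add qualifies;
 * the source (immediate or register) is not restricted here. */
static bool map_value_alu_allowed(uint8_t code)
{
	return MODEL_BPF_CLASS(code) == MODEL_BPF_ALU64 &&
	       MODEL_BPF_OP(code) == MODEL_BPF_ADD;
}
```

With this predicate, `r0 += 3` (0x07) passes, while `r0 &= 8` (0x57) and the 32 bit `r0 += 0` (0x04) from the two examples are refused.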
2017-03-24 | bpf: improve verifier packet range checks | Alexei Starovoitov
llvm can emit the 'if (ptr > data_end)' checks in a slightly different order than the original C code, which confuses the verifier. For example:

    if (ptr + 16 > data_end)
        return TC_ACT_SHOT;
    // may be followed by
    if (ptr + 14 > data_end)
        return TC_ACT_SHOT;

While llvm can see that 'ptr' is valid for all 16 bytes, the verifier could not. Fix the verifier logic to account for such a case and add a test.

Reported-by: Huapeng Zhou <hzhou@fb.com>
Fixes: 969bf05eb3ce ("bpf: direct packet access")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
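One way to handle the reordered checks, assumed here for illustration, is to never shrink a range that an earlier data_end test already proved. The struct and function names below are hypothetical and only model the idea.

```c
/* Hypothetical model of a packet pointer register: 'range' is the
 * number of bytes already proven to lie before data_end. */
struct pkt_reg {
	int id;
	int range;
};

static int max_int(int a, int b)
{
	return a > b ? a : b;
}

/* Called after a successful 'ptr + off <= data_end' style test:
 * widen the range of every register with the same id, never narrow it. */
static void mark_pkt_range(struct pkt_reg *regs, int nregs, int id, int off)
{
	for (int i = 0; i < nregs; i++)
		if (regs[i].id == id)
			regs[i].range = max_int(regs[i].range, off);
}
```

With this rule, a 16 byte check followed by a 14 byte check leaves the range at 16, matching what llvm assumed when it reordered the tests.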
2017-03-22 | bpf: Add hash of maps support | Martin KaFai Lau
This patch adds hash of maps support (hashmap->bpf_map). BPF_MAP_TYPE_HASH_OF_MAPS is added. A map-in-map contains a pointer to another map and lets call this pointer 'inner_map_ptr'. Notes on deleting inner_map_ptr from a hash map: 1. For BPF_F_NO_PREALLOC map-in-map, when deleting an inner_map_ptr, the htab_elem itself will go through a rcu grace period and the inner_map_ptr resides in the htab_elem. 2. For pre-allocated htab_elem (!BPF_F_NO_PREALLOC), when deleting an inner_map_ptr, the htab_elem may get reused immediately. This situation is similar to the existing prealloc-ated use cases. However, the bpf_map_fd_put_ptr() calls bpf_map_put() which calls inner_map->ops->map_free(inner_map) which will go through a rcu grace period (i.e. all bpf_map's map_free currently goes through a rcu grace period). Hence, the inner_map_ptr is still safe for the rcu reader side. This patch also includes BPF_MAP_TYPE_HASH_OF_MAPS to the check_map_prealloc() in the verifier. preallocation is a must for BPF_PROG_TYPE_PERF_EVENT. Hence, even we don't expect heavy updates to map-in-map, enforcing BPF_F_NO_PREALLOC for map-in-map is impossible without disallowing BPF_PROG_TYPE_PERF_EVENT from using map-in-map first. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 | bpf: Add array of maps support | Martin KaFai Lau
This patch adds a few helper funcs to enable map-in-map support (i.e. outer_map->inner_map). The first outer_map type BPF_MAP_TYPE_ARRAY_OF_MAPS is also added in this patch. The next patch will introduce a hash of maps type. Any bpf map type can be acted as an inner_map. The exception is BPF_MAP_TYPE_PROG_ARRAY because the extra level of indirection makes it harder to verify the owner_prog_type and owner_jited. Multi-level map-in-map is not supported (i.e. map->map is ok but not map->map->map). When adding an inner_map to an outer_map, it currently checks the map_type, key_size, value_size, map_flags, max_entries and ops. The verifier also uses those map's properties to do static analysis. map_flags is needed because we need to ensure BPF_PROG_TYPE_PERF_EVENT is using a preallocated hashtab for the inner_hash also. ops and max_entries are needed to generate inlined map-lookup instructions. For simplicity reason, a simple '==' test is used for both map_flags and max_entries. The equality of ops is implied by the equality of map_type. During outer_map creation time, an inner_map_fd is needed to create an outer_map. However, the inner_map_fd's life time does not depend on the outer_map. The inner_map_fd is merely used to initialize the inner_map_meta of the outer_map. Also, for the outer_map: * It allows element update and delete from syscall * It allows element lookup from bpf_prog The above is similar to the current fd_array pattern. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
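The compatibility test applied when an inner_map is added to an outer_map can be sketched as a plain field-by-field comparison. The struct below is a hypothetical stand-in for the inner_map_meta; ops is left out because, as stated above, its equality is implied by the equality of map_type.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the properties recorded for an inner map. */
struct map_meta_model {
	uint32_t map_type;
	uint32_t key_size;
	uint32_t value_size;
	uint32_t map_flags;
	uint32_t max_entries;
};

/* Simple '==' test on every property the verifier relies on. */
static bool inner_map_matches(const struct map_meta_model *meta,
			      const struct map_meta_model *inner)
{
	return meta->map_type == inner->map_type &&
	       meta->key_size == inner->key_size &&
	       meta->value_size == inner->value_size &&
	       meta->map_flags == inner->map_flags &&
	       meta->max_entries == inner->max_entries;
}
```

A strict equality test is more conservative than strictly necessary for map_flags and max_entries, but it keeps the static analysis simple.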
2017-03-22 | bpf: Fix and simplifications on inline map lookup | Martin KaFai Lau
Fix in verifier: For the same bpf_map_lookup_elem() instruction (i.e. "call 1"), a broken case is "a different type of map could be used for the same lookup instruction". For example, an array in one case and a hashmap in another. We have to resort to the old dynamic call behavior in this case. The fix is to check for collision on insn_aux->map_ptr. If there is collision, don't inline the map lookup. Please see the "do_reg_lookup()" in test_map_in_map_kern.c in the later patch for how-to trigger the above case. Simplifications on array_map_gen_lookup(): 1. Calculate elem_size from map->value_size. It removes the need for 'struct bpf_array' which makes the later map-in-map implementation easier. 2. Remove the 'elem_size == 1' test Fixes: 81ed18ab3098 ("bpf: add helper inlining infra and optimize map_array lookup") Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
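The collision check on insn_aux->map_ptr can be modeled as follows. This is a sketch with hypothetical types; in particular, the separate `poisoned` flag is a simplification chosen for readability and is not claimed to be how the kernel records the collision.

```c
#include <stdbool.h>
#include <stddef.h>

struct map_model {
	int type;
};

/* Hypothetical per-instruction auxiliary data. */
struct insn_aux_model {
	const struct map_model *map_ptr;
	bool poisoned; /* set once two different maps reached this insn */
};

/* Called each time the verifier walks the same lookup call insn. */
static void record_lookup_map(struct insn_aux_model *aux,
			      const struct map_model *map)
{
	if (!aux->map_ptr)
		aux->map_ptr = map;
	else if (aux->map_ptr != map)
		aux->poisoned = true;
}

/* Inline only when exactly one map was ever seen for this insn. */
static bool can_inline_lookup(const struct insn_aux_model *aux)
{
	return aux->map_ptr != NULL && !aux->poisoned;
}
```

When the check fails, the call is left as a regular dynamic call to bpf_map_lookup_elem().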
2017-03-16 | bpf: add helper inlining infra and optimize map_array lookup | Alexei Starovoitov
Optimize bpf_call -> bpf_map_lookup_elem() -> array_map_lookup_elem() into a sequence of bpf instructions. When JIT is on the sequence of bpf instructions is the sequence of native cpu instructions with significantly faster performance than indirect call and two function's prologue/epilogue. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
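As a rough C-level picture of what the inlined instruction sequence has to compute for an array map, consider the sketch below. The struct and function names are hypothetical, and the element size is taken as given; how it is derived from the map's value size is not modeled.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flat array map: max_entries elements of elem_size bytes. */
struct array_model {
	uint32_t max_entries;
	uint32_t elem_size;
	uint8_t *value;
};

/* Bounds check plus address computation, with no indirect call. */
static void *array_lookup_inline(const struct array_model *arr,
				 uint32_t index)
{
	if (index >= arr->max_entries)
		return NULL;
	return arr->value + (size_t)arr->elem_size * index;
}
```

The point of the optimization is that this compare, multiply and add can be emitted directly into the program instead of going through two function calls.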
2017-03-16 | bpf: adjust insn_aux_data when patching insns | Alexei Starovoitov
convert_ctx_accesses() replaces single bpf instruction with a set of instructions. Adjust corresponding insn_aux_data while patching. It's needed to make sure subsequent 'for(all insn)' loops have matching insn and insn_aux_data. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
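Keeping a per-instruction side array in step with patching can be sketched like this. Plain ints stand in for the aux entries, the function name is hypothetical, and placing the original entry at the last instruction of the patched range is an assumption of this sketch.

```c
#include <stdlib.h>
#include <string.h>

/* One instruction at index 'off' is replaced by 'cnt' instructions
 * (cnt >= 1, off < old_len). Returns a new array of old_len + cnt - 1
 * entries, or NULL on allocation failure. The caller frees it. */
static int *expand_aux(const int *old, size_t old_len, size_t off,
		       size_t cnt)
{
	size_t new_len = old_len + cnt - 1;
	int *new_data = calloc(new_len, sizeof(*new_data));

	if (!new_data)
		return NULL;
	/* Entries before the patched instruction keep their index. */
	memcpy(new_data, old, off * sizeof(*old));
	/* The patched entry and everything after it move up by cnt - 1;
	 * the newly created slots in between stay zeroed. */
	memcpy(new_data + off + cnt - 1, old + off,
	       (old_len - off) * sizeof(*old));
	return new_data;
}
```

After the call, index i in the instruction array and index i in the aux array describe the same instruction again, which is what later loops over all instructions depend on.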
2017-03-16 | bpf: refactor fixup_bpf_calls() | Alexei Starovoitov
Reduce indent and make it iterate over instructions similar to convert_ctx_accesses(). Also convert the hard BUG_ON into a soft verifier error.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-16 | bpf: move fixup_bpf_calls() function | Alexei Starovoitov
No functional change. Move fixup_bpf_calls() to verifier.c; it's being refactored in the next patch.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-01 | bpf: update the comment about the length of analysis | Gary Lin
Commit 07016151a446 ("bpf, verifier: further improve search pruning") increased the limit of processed instructions from 32k to 64k, but the comment still mentioned the 32k limit. This commit updates the comment to reflect the change. Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Gary Lin <glin@suse.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-23 | bpf: fix spelling mistake: "proccessed" -> "processed" | Colin Ian King
trivial fix to spelling mistake in verbose log message Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-14 | bpf: reduce compiler warnings by adding fallthrough comments | Alexander Alemayhu
Fixes the following warnings: kernel/bpf/verifier.c: In function ‘may_access_direct_pkt_data’: kernel/bpf/verifier.c:702:6: warning: this statement may fall through [-Wimplicit-fallthrough=] if (t == BPF_WRITE) ^ kernel/bpf/verifier.c:704:2: note: here case BPF_PROG_TYPE_SCHED_CLS: ^~~~ kernel/bpf/verifier.c: In function ‘reg_set_min_max_inv’: kernel/bpf/verifier.c:2057:23: warning: this statement may fall through [-Wimplicit-fallthrough=] true_reg->min_value = 0; ~~~~~~~~~~~~~~~~~~~~^~~ kernel/bpf/verifier.c:2058:2: note: here case BPF_JSGT: ^~~~ kernel/bpf/verifier.c:2068:23: warning: this statement may fall through [-Wimplicit-fallthrough=] true_reg->min_value = 0; ~~~~~~~~~~~~~~~~~~~~^~~ kernel/bpf/verifier.c:2069:2: note: here case BPF_JSGE: ^~~~ kernel/bpf/verifier.c: In function ‘reg_set_min_max’: kernel/bpf/verifier.c:2009:24: warning: this statement may fall through [-Wimplicit-fallthrough=] false_reg->min_value = 0; ~~~~~~~~~~~~~~~~~~~~~^~~ kernel/bpf/verifier.c:2010:2: note: here case BPF_JSGT: ^~~~ kernel/bpf/verifier.c:2019:24: warning: this statement may fall through [-Wimplicit-fallthrough=] false_reg->min_value = 0; ~~~~~~~~~~~~~~~~~~~~~^~~ kernel/bpf/verifier.c:2020:2: note: here case BPF_JSGE: ^~~~ Reported-by: David Binderman <dcb314@hotmail.com> Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
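The kind of change that silences -Wimplicit-fallthrough can be illustrated with a small switch. The enum and function are hypothetical and only loosely modeled on the first warning above; the relevant part is the comment placed where one case deliberately runs into the next.

```c
#include <stdbool.h>

/* Hypothetical program kinds for this sketch. */
enum prog_kind { KIND_LWT_IN, KIND_SCHED_CLS, KIND_OTHER };

static bool may_access_pkt(enum prog_kind kind, bool is_write)
{
	switch (kind) {
	case KIND_LWT_IN:
		/* This kind may read but not write packet data. */
		if (is_write)
			return false;
		/* fallthrough */
	case KIND_SCHED_CLS:
		return true;
	default:
		return false;
	}
}
```

The comment changes no behavior; it documents that the missing break is intentional, which is what the compiler looks for.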
2017-02-06 | bpf: enable verifier to add 0 to packet ptr | William Tu
The patch fixes the case of adding a zero value to the packet pointer. The zero value could come either from a BPF_K immediate or from a src_reg of type CONST_IMM. The patch fixes both; otherwise the verifier reports the following error:

    [...]
    R0=imm0,min_value=0,max_value=0 R1=pkt(id=0,off=0,r=4) R2=pkt_end R3=fp-12 R4=imm4,min_value=4,max_value=4 R5=pkt(id=0,off=4,r=4)
    269: (bf) r2 = r0     // r2 becomes imm0
    270: (77) r2 >>= 3
    271: (bf) r4 = r1     // r4 becomes pkt ptr
    272: (0f) r4 += r2    // r4 += 0
    addition of negative constant to packet pointer is not allowed

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Mihai Budiu <mbudiu@vmware.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-24 | bpf: enable verifier to better track const alu ops | Daniel Borkmann
William reported couple of issues in relation to direct packet access. Typical scheme is to check for data + [off] <= data_end, where [off] can be either immediate or coming from a tracked register that contains an immediate, depending on the branch, we can then access the data. However, in case of calculating [off] for either the mentioned test itself or for access after the test in a more "complex" way, then the verifier will stop tracking the CONST_IMM marked register and will mark it as UNKNOWN_VALUE one. Adding that UNKNOWN_VALUE typed register to a pkt() marked register, the verifier then bails out in check_packet_ptr_add() as it finds the registers imm value below 48. In the first below example, that is due to evaluate_reg_imm_alu() not handling right shifts and thus marking the register as UNKNOWN_VALUE via helper __mark_reg_unknown_value() that resets imm to 0. In the second case the same happens at the time when r4 is set to r4 &= r5, where it transitions to UNKNOWN_VALUE from evaluate_reg_imm_alu(). Later on r4 we shift right by 3 inside evaluate_reg_alu(), where the register's imm turns into 3. That is, for registers with type UNKNOWN_VALUE, imm of 0 means that we don't know what value the register has, and for imm > 0 it means that the value has [imm] upper zero bits. F.e. when shifting an UNKNOWN_VALUE register by 3 to the right, no matter what value it had, we know that the 3 upper most bits must be zero now. This is to make sure that ALU operations with unknown registers don't overflow. Meaning, once we know that we have more than 48 upper zero bits, or, in other words cannot go beyond 0xffff offset with ALU ops, such an addition will track the target register as a new pkt() register with a new id, but 0 offset and 0 range, so for that a new data/data_end test will be required. 
If the source register to be added to the pkt() register is a CONST_IMM one, or the source instruction is an add instruction with an immediate value, then it will get added if it stays within the max 0xffff bounds. From there, the pkt() type can be accessed should reg->off + imm be within the access range of pkt().

    [...]
    from 28 to 30: R0=imm1,min_value=1,max_value=1 R1=pkt(id=0,off=0,r=22) R2=pkt_end R3=imm144,min_value=144,max_value=144 R4=imm0,min_value=0,max_value=0 R5=inv48,min_value=2054,max_value=2054 R10=fp
    30: (bf) r5 = r3
    31: (07) r5 += 23
    32: (77) r5 >>= 3
    33: (bf) r6 = r1
    34: (0f) r6 += r5
    cannot add integer value with 0 upper zero bits to ptr_to_packet

    [...]
    from 52 to 80: R0=imm1,min_value=1,max_value=1 R1=pkt(id=0,off=0,r=34) R2=pkt_end R3=inv R4=imm272 R5=inv56,min_value=17,max_value=17 R6=pkt(id=0,off=26,r=34) R10=fp
    80: (07) r4 += 71
    81: (18) r5 = 0xfffffff8
    83: (5f) r4 &= r5
    84: (77) r4 >>= 3
    85: (0f) r1 += r4
    cannot add integer value with 3 upper zero bits to ptr_to_packet

Thus, to get the above use-cases working, evaluate_reg_imm_alu() has been extended for further ALU ops. This is fine, because we only operate strictly within the realm of CONST_IMM types, so here we don't care about overflows as they will happen in the simulated but also in the real execution, and the interaction with pkt() in check_packet_ptr_add() will check the actual imm value once added to pkt(), but it's irrelevant before. With regards to 06c1c049721a ("bpf: allow helpers access to variable memory") that works on UNKNOWN_VALUE registers, the verifier now becomes a bit smarter as it can better resolve ALU ops, so we need to adapt two test cases there, as min/max bound tracking only becomes necessary when registers were spilled to stack. So while the mask was set before to track the upper bound for the UNKNOWN_VALUE case, it's now resolved directly as CONST_IMM, and such constructs are only necessary when f.e. registers are spilled.
For commit 6b17387307ba ("bpf: recognize 64bit immediate loads as consts") that initially enabled dw load tracking only for nfp jit/ analyzer, I did couple of tests on large, complex programs and we don't increase complexity badly (my tests were in ~3% range on avg). I've added a couple of tests similar to affected code above, and it works fine with verifier now. Reported-by: William Tu <u9012063@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Gianluca Borello <g.borello@gmail.com> Cc: William Tu <u9012063@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
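The two tracking modes discussed in this commit message can be sketched side by side: counting known upper zero bits for an UNKNOWN_VALUE register, and folding ALU ops on CONST_IMM registers. All names are hypothetical and the set of folded ops is limited to the three that appear in the logs above; this is a model, not the verifier's implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* UNKNOWN_VALUE: 'imm' counts known upper zero bits (0 = nothing known).
 * A right shift by 'shift' adds that many zero bits, capped at 64. */
static int rsh_upper_zero_bits(int imm, int shift)
{
	int bits = imm + shift;

	return bits > 64 ? 64 : bits;
}

/* An unknown value may be added to a packet pointer only when at least
 * 48 upper bits are zero, i.e. the value cannot exceed 0xffff. */
static bool unknown_addable_to_pkt(int upper_zero_bits)
{
	return upper_zero_bits >= 48;
}

/* CONST_IMM: both operands are known, so the result can be computed. */
enum alu_op_model { OP_ADD, OP_AND, OP_RSH };

static uint64_t fold_const_alu(enum alu_op_model op, uint64_t dst,
			       uint64_t src)
{
	switch (op) {
	case OP_ADD:
		return dst + src;
	case OP_AND:
		return dst & src;
	case OP_RSH:
		return dst >> src; /* caller guarantees src < 64 */
	}
	return dst;
}
```

Applied to the first log, r5 = 144, r5 += 23, r5 >>= 3 folds to the constant 20; the second, r4 = 272, r4 += 71, r4 &= 0xfffffff8, r4 >>= 3, folds to 42. Both are far below 0xffff, so the later addition to the pkt() register can be treated as a constant offset.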
2017-01-17 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | David S. Miller
2017-01-16 | bpf: rework prog_digest into prog_tag | Daniel Borkmann
Commit 7bd509e311f4 ("bpf: add prog_digest and expose it via fdinfo/netlink") was recently discussed, partially due to admittedly suboptimal name of "prog_digest" in combination with sha1 hash usage, thus inevitably and rightfully concerns about its security in terms of collision resistance were raised with regards to use-cases. The intended use cases are for debugging resp. introspection only for providing a stable "tag" over the instruction sequence that both kernel and user space can calculate independently. It's not usable at all for making a security relevant decision. So collisions where two different instruction sequences generate the same tag can happen, but ideally at a rather low rate. The "tag" will be dumped in hex and is short enough to introspect in tracepoints or kallsyms output along with other data such as stack trace, etc. Thus, this patch performs a rename into prog_tag and truncates the tag to a short output (64 bits) to make it obvious it's not collision-free. Should in future a hash or facility be needed with a security relevant focus, then we can think about requirements, constraints, etc that would fit to that situation. For now, rework the exposed parts for the current use cases as long as nothing has been released yet. Tested on x86_64 and s390x. Fixes: 7bd509e311f4 ("bpf: add prog_digest and expose it via fdinfo/netlink") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
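Truncating a digest into a short tag and dumping it in hex can be sketched as below. The function names are hypothetical, and keeping the leading 8 bytes of the digest is an assumption of this sketch; the message above only states that the tag is 64 bits.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define TAG_SIZE 8 /* 64 bits, as stated in the commit message */

/* Keep only the first TAG_SIZE bytes of a longer digest (assumed). */
static void tag_from_digest(const uint8_t *digest, uint8_t tag[TAG_SIZE])
{
	memcpy(tag, digest, TAG_SIZE);
}

/* Hex dump into a caller-provided buffer of 2 * TAG_SIZE + 1 bytes. */
static void tag_to_hex(const uint8_t tag[TAG_SIZE],
		       char out[2 * TAG_SIZE + 1])
{
	for (int i = 0; i < TAG_SIZE; i++)
		snprintf(out + 2 * i, 3, "%02x", (unsigned int)tag[i]);
}
```

A 16 character hex string is short enough to sit next to other introspection data, and its length makes it plain that collisions are possible.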
2017-01-12 | bpf: allow b/h/w/dw access for bpf's cb in ctx | Daniel Borkmann
When structs are used to store temporary state in cb[] buffer that is used with programs and among tail calls, then the generated code will not always access the buffer in bpf_w chunks. We can ease programming of it and let this act more natural by allowing for aligned b/h/w/dw sized access for cb[] ctx member. Various test cases are attached as well for the selftest suite. Potentially, this can also be reused for other program types to pass data around. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
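The rule for aligned b/h/w/dw access into cb[] can be sketched as a bounds-and-alignment predicate. The function name is hypothetical; the offset is taken relative to the start of cb[], which is assumed to be 8-byte aligned, and cb[] is taken to be five 32 bit words (20 bytes).

```c
#include <stdbool.h>

#define CB_BYTES 20 /* five 32 bit words; offsets below are cb-relative */

/* Hypothetical check: is a load/store of 'size' bytes at byte offset
 * 'off' inside cb[] acceptable? */
static bool cb_access_ok(int off, int size)
{
	if (size != 1 && size != 2 && size != 4 && size != 8)
		return false;
	if (off < 0 || off + size > CB_BYTES)
		return false;
	return off % size == 0;
}
```

Under these assumptions a dw access is possible at offsets 0 and 8 only, since a dw at offset 16 would run past the end of the buffer.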
2017-01-12 | bpf: pass original insn directly to convert_ctx_access | Daniel Borkmann
Currently, when calling convert_ctx_access() callback for the various program types, we pass in insn->dst_reg, insn->src_reg, insn->off from the original instruction. This information is needed to rewrite the instruction that is based on the user ctx structure into a kernel representation for the ctx. As we'd like to allow access size beyond just BPF_W, we'd need also insn->code for that in order to decode the original access size. Given that, lets just pass insn directly to the convert_ctx_access() callback and work on that to not clutter the callback with even more arguments we need to pass when everything is already contained in insn. So lets go through that once, no functional change. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
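Why insn->code is needed becomes clear when decoding the access size from it. The sketch below uses the size field of the BPF opcode encoding; the MODEL_ prefixed macros and the function name are local to this example.

```c
#include <stdint.h>

/* Size field of a BPF load/store opcode. */
#define MODEL_BPF_SIZE(code) ((code) & 0x18)
#define MODEL_BPF_W  0x00
#define MODEL_BPF_H  0x08
#define MODEL_BPF_B  0x10
#define MODEL_BPF_DW 0x18

/* Returns the access width in bytes encoded in the opcode. */
static int bpf_size_to_bytes(uint8_t code)
{
	switch (MODEL_BPF_SIZE(code)) {
	case MODEL_BPF_B:
		return 1;
	case MODEL_BPF_H:
		return 2;
	case MODEL_BPF_W:
		return 4;
	default: /* MODEL_BPF_DW */
		return 8;
	}
}
```

For instance, opcode 0x61 (the u32 load seen in verifier logs) decodes to 4 bytes and 0x79 (the u64 load) to 8 bytes; dst_reg, src_reg and off alone cannot convey this.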
2017-01-09 | bpf: rename ARG_PTR_TO_STACK | Alexei Starovoitov
Since ARG_PTR_TO_STACK is no longer just a pointer to the stack, rename it to ARG_PTR_TO_MEM and adjust the comment.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 | bpf: allow helpers access to variable memory | Gianluca Borello
Currently, helpers that read and write from/to the stack can do so using a pair of arguments of type ARG_PTR_TO_STACK and ARG_CONST_STACK_SIZE. ARG_CONST_STACK_SIZE accepts a constant register of type CONST_IMM, so that the verifier can safely check the memory access. However, requiring the argument to be a constant can be limiting in some circumstances.

Since the current logic keeps track of the minimum and maximum value of a register throughout the simulated execution, ARG_CONST_STACK_SIZE can be changed to also accept an UNKNOWN_VALUE register in case its boundaries have been set and the range doesn't cause invalid memory accesses.

One common situation when this is useful:

    int len;
    char buf[BUFSIZE]; /* BUFSIZE is 128 */

    if (some_condition)
        len = 42;
    else
        len = 84;

    some_helper(..., buf, len & (BUFSIZE - 1));

The compiler can often decide to assign the constant values 42 or 84 into a variable on the stack, instead of keeping it in a register. When the variable is then read back from stack into the register in order to be passed to the helper, the verifier will not be able to recognize the register as constant (the verifier is not currently tracking all constant writes into memory), and the program won't be valid.

However, by allowing the helper to accept an UNKNOWN_VALUE register, this program will work because the bitwise AND operation will set the range of possible values for the UNKNOWN_VALUE register to [0, BUFSIZE), so the verifier can guarantee the helper call will be safe (assuming the argument is of type ARG_CONST_STACK_SIZE_OR_ZERO, otherwise one more check against 0 would be needed). Custom ranges can be set not only with ALU operations, but also by explicitly comparing the UNKNOWN_VALUE register with constants.

Another very common example happens when intercepting system call arguments and accessing user-provided data of variable size using bpf_probe_read().
One can load at runtime the user-provided length in an UNKNOWN_VALUE register, and then read that exact amount of data up to a compile-time determined limit in order to fit into the proper local storage allocated on the stack, without having to guess a suboptimal access size at compile time. Also, in case the helpers accepting the UNKNOWN_VALUE register operate in raw mode, disable the raw mode so that the program is required to initialize all memory, since there is no guarantee the helper will fill it completely, leaving possibilities for data leak (just relevant when the memory used by the helper is the stack, not when using a pointer to map element value or packet). In other words, ARG_PTR_TO_RAW_STACK will be treated as ARG_PTR_TO_STACK. Signed-off-by: Gianluca Borello <g.borello@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
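The range reasoning for a variable-size helper argument can be sketched as follows. Both function names are hypothetical; the sketch only models the bound introduced by a bitwise AND and the final check of the tracked range against the buffer size.

```c
#include <stdbool.h>
#include <stdint.h>

/* For unsigned values, x & mask can never exceed mask, so the mask is
 * a valid upper bound for the result whatever x was. */
static uint64_t and_upper_bound(uint64_t mask)
{
	return mask;
}

/* Hypothetical check of a size argument with tracked [min_len, max_len]
 * against a buffer of buf_size bytes. zero_ok models an argument type
 * that also accepts a length of zero. */
static bool helper_mem_range_ok(int64_t min_len, int64_t max_len,
				int64_t buf_size, bool zero_ok)
{
	if (min_len < 0)
		return false;
	if (min_len == 0 && !zero_ok)
		return false;
	return max_len <= buf_size;
}
```

With BUFSIZE 128, `len & (BUFSIZE - 1)` is bounded by 127, so the range [0, 127] passes for a size argument that tolerates zero and fails for one that does not, which matches the note above about needing one more check against 0.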
2017-01-09 | bpf: allow adjusted map element values to spill | Gianluca Borello
commit 484611357c19 ("bpf: allow access into map value arrays") introduces the ability to do pointer math inside a map element value via the PTR_TO_MAP_VALUE_ADJ register type. The current support doesn't handle the case where a PTR_TO_MAP_VALUE_ADJ is spilled into the stack, limiting several use cases, especially when generating bpf code from a compiler. Handle this case by explicitly enabling the register type PTR_TO_MAP_VALUE_ADJ to be spilled. Also, make sure that min_value and max_value are reset just for BPF_LDX operations that don't result in a restore of a spilled register from stack. Signed-off-by: Gianluca Borello <g.borello@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 | bpf: allow helpers access to map element values | Gianluca Borello
Enable helpers to directly access a map element value by passing a register type PTR_TO_MAP_VALUE (or PTR_TO_MAP_VALUE_ADJ) to helper arguments ARG_PTR_TO_STACK or ARG_PTR_TO_RAW_STACK.

This enables several use cases. For example, a typical tracing program might want to capture pathnames passed to sys_open() with:

    struct trace_data {
        char pathname[PATHLEN];
    };

    SEC("kprobe/sys_open")
    int bpf_sys_open(struct pt_regs *ctx)
    {
        struct trace_data data;

        bpf_probe_read(data.pathname, sizeof(data.pathname), (void *)ctx->di);

        /* consume data.pathname, for example via
         * bpf_trace_printk() or bpf_perf_event_output()
         */
        return 0;
    }

Such a program could easily hit the stack limit in case PATHLEN needs to be large or more local variables need to exist, both of which are quite common scenarios. Allowing direct helper access to map element values, one could do:

    struct bpf_map_def SEC("maps") scratch_map = {
        .type = BPF_MAP_TYPE_PERCPU_ARRAY,
        .key_size = sizeof(u32),
        .value_size = sizeof(struct trace_data),
        .max_entries = 1,
    };

    SEC("kprobe/sys_open")
    int bpf_sys_open(struct pt_regs *ctx)
    {
        int id = 0;
        struct trace_data *p = bpf_map_lookup_elem(&scratch_map, &id);

        if (!p)
            return 0;

        bpf_probe_read(p->pathname, sizeof(p->pathname), (void *)ctx->di);

        /* consume p->pathname, for example via
         * bpf_trace_printk() or bpf_perf_event_output()
         */
        return 0;
    }

And wouldn't risk exhausting the stack.

Code changes are loosely modeled after commit 6841de8b0d03 ("bpf: allow helpers access the packet directly"). Unlike with PTR_TO_PACKET, these changes just work with ARG_PTR_TO_STACK and ARG_PTR_TO_RAW_STACK (not ARG_PTR_TO_MAP_KEY, ARG_PTR_TO_MAP_VALUE, ...): adding those would be trivial, but since there is not currently a use case for that, it's reasonable to limit the set of changes.

Also, add new tests to make sure accesses to map element values from helpers never go out of boundary, even when adjusted.
Signed-off-by: Gianluca Borello <g.borello@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 | bpf: split check_mem_access logic for map values | Gianluca Borello
Move the logic to check memory accesses to a PTR_TO_MAP_VALUE_ADJ from check_mem_access() to a separate helper check_map_access_adj(). This enables to use those checks in other parts of the verifier as well, where boundaries on PTR_TO_MAP_VALUE_ADJ might need to be checked, for example when checking helper function arguments. The same thing is already happening for other types such as PTR_TO_PACKET and its check_packet_access() helper. The code has been copied verbatim, with the only difference of removing the "off += reg->max_value" statement and moving the sum into the call statement to check_map_access(), as that was only needed due to the earlier common check_map_access() call. Signed-off-by: Gianluca Borello <g.borello@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
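The shape of the split-out helper can be sketched as two bounds checks, one with the minimum and one with the maximum tracked adjustment summed into the offset at the call site. The names and the simplified handling of unbounded values are assumptions of this sketch, not the kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Plain bounds check of [off, off + size) against the map value size. */
static bool map_access_ok(int64_t off, int64_t size, int64_t value_size)
{
	return off >= 0 && size > 0 && off + size <= value_size;
}

/* Adjusted pointer: the variable part lies in [min_value, max_value].
 * Both extremes, plus the instruction offset, must stay in bounds. */
static bool map_access_adj_ok(int64_t min_value, int64_t max_value,
			      int64_t off, int64_t size,
			      int64_t value_size)
{
	if (min_value < 0)
		return false; /* lower bound unknown or negative */
	return map_access_ok(min_value + off, size, value_size) &&
	       map_access_ok(max_value + off, size, value_size);
}
```

Keeping this in its own function means other callers, such as helper argument checks, can reuse it without going through check_mem_access().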
2016-12-17bpf: fix mark_reg_unknown_value for spilled regs on map value markingDaniel Borkmann
Martin reported a verifier issue that hit the BUG_ON() for his test case in the mark_reg_unknown_value() function: [ 202.861380] kernel BUG at kernel/bpf/verifier.c:467! [...] [ 203.291109] Call Trace: [ 203.296501] [<ffffffff811364d5>] mark_map_reg+0x45/0x50 [ 203.308225] [<ffffffff81136558>] mark_map_regs+0x78/0x90 [ 203.320140] [<ffffffff8113938d>] do_check+0x226d/0x2c90 [ 203.331865] [<ffffffff8113a6ab>] bpf_check+0x48b/0x780 [ 203.343403] [<ffffffff81134c8e>] bpf_prog_load+0x27e/0x440 [ 203.355705] [<ffffffff8118a38f>] ? handle_mm_fault+0x11af/0x1230 [ 203.369158] [<ffffffff812d8188>] ? security_capable+0x48/0x60 [ 203.382035] [<ffffffff811351a4>] SyS_bpf+0x124/0x960 [ 203.393185] [<ffffffff810515f6>] ? __do_page_fault+0x276/0x490 [ 203.406258] [<ffffffff816db320>] entry_SYSCALL_64_fastpath+0x13/0x94 This issue got uncovered after the fix in a08dd0da5307 ("bpf: fix regression on verifier pruning wrt map lookups"). The reason why it wasn't noticed before was, because as mentioned in a08dd0da5307, mark_map_regs() was doing the id matching incorrectly based on the uncached regs[regno].id. So, in the first loop, we walked all regs and as soon as we found regno == i, then this reg's id was cleared when calling mark_reg_unknown_value() thus that every subsequent register was probed against id of 0 (which, in combination with the PTR_TO_MAP_VALUE_OR_NULL type is an invalid condition that no other register state can hold), and therefore wasn't type transitioned such as in the spilled register case for the second loop. Now since that got fixed, it turned out that 57a09bf0a416 ("bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL registers") used mark_reg_unknown_value() incorrectly for the spilled regs, and thus hitting the BUG_ON() in some cases due to regno >= MAX_BPF_REG. Although spilled regs have the same type as the non-spilled regs for the verifier state, that is, struct bpf_reg_state, they are semantically different from the non-spilled regs. 
In other words, there can be up to 64 (MAX_BPF_STACK / BPF_REG_SIZE) spilled regs in the stack, for example, register R<x> could have been spilled by the program to stack location X, Y, Z, and in mark_map_regs() we need to scan these stack slots of type STACK_SPILL for potential registers that we have to transition from PTR_TO_MAP_VALUE_OR_NULL. Therefore, depending on the location, the spilled_regs regno can be a lot higher than just MAX_BPF_REG's value since we operate on stack instead. The reset in mark_reg_unknown_value() itself is just fine, only that the BUG_ON() was inappropriate for this. Fix it by making a __mark_reg_unknown_value() version that can be called from mark_map_reg() generically; we know for the non-spilled case that the regno is always < MAX_BPF_REG anyway. Fixes: 57a09bf0a416 ("bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL registers") Reported-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-17bpf: dynamically allocate digest scratch bufferDaniel Borkmann
Geert rightfully complained that 7bd509e311f4 ("bpf: add prog_digest and expose it via fdinfo/netlink") added too large an allocation of variable 'raw' from the bss section, and that it should instead be done dynamically: # ./scripts/bloat-o-meter kernel/bpf/core.o.1 kernel/bpf/core.o.2 add/remove: 3/0 grow/shrink: 0/0 up/down: 33291/0 (33291) function old new delta raw - 32832 +32832 [...] Since this is only relevant during the program creation path, which can be considered slow-path anyway, let's allocate that dynamically and not be implicitly dependent on the verifier mutex. Move bpf_prog_calc_digest() to the beginning of replace_map_fd_with_map_ptr(); error handling also stays straightforward. Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-17bpf: fix regression on verifier pruning wrt map lookupsDaniel Borkmann
Commit 57a09bf0a416 ("bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL registers") introduced a regression where existing programs stopped loading due to reaching the verifier's maximum complexity limit, whereas prior to this commit they were loading just fine; the affected program has roughly 2k instructions. What was found is that state pruning couldn't be performed effectively anymore due to mismatches of the verifier's register state, in particular in the id tracking. It doesn't mean that 57a09bf0a416 is incorrect per se, but rather that verifier needs to perform a lot more work for the same program with regards to involved map lookups. Since commit 57a09bf0a416 is only about tracking registers with type PTR_TO_MAP_VALUE_OR_NULL, the id is only needed to follow registers until they are promoted through pattern matching with a NULL check to either PTR_TO_MAP_VALUE or UNKNOWN_VALUE type. After that point, the id becomes irrelevant for the transitioned types. For UNKNOWN_VALUE, id is already reset to 0 via mark_reg_unknown_value(), but not so for PTR_TO_MAP_VALUE where id is becoming stale. It's even transferred further into other types that don't make use of it. Among others, one example is where UNKNOWN_VALUE is set on function call return with RET_INTEGER return type. states_equal() will then fall through the memcmp() on register state; note that the second memcmp() uses offsetofend(), so the id is part of that since d2a4dd37f6b4 ("bpf: fix state equivalence"). But the bisect pointed already to 57a09bf0a416, where we really reach beyond complexity limit. What I found was that states_equal() often failed in this case due to id mismatches in spilled regs with registers in type PTR_TO_MAP_VALUE. Unlike non-spilled regs, spilled regs just perform a memcmp() on their reg state and don't have any other optimizations in place, therefore also id was relevant in this case for making a pruning decision. 
We can safely reset id to 0 as well when converting to PTR_TO_MAP_VALUE. For the affected program, it resulted in a ~17 fold reduction of complexity and let the program load fine again. Selftest suite also runs fine. The only other place where env->id_gen is used currently is through direct packet access, but for these cases id is long living, thus a different scenario. Also, the current logic in mark_map_regs() is not fully correct when marking the NULL branch with UNKNOWN_VALUE. We need to cache the destination reg's id in any case. Otherwise, once we marked that reg as UNKNOWN_VALUE, its id is reset and any subsequent registers that hold the original id and are of type PTR_TO_MAP_VALUE_OR_NULL won't be marked UNKNOWN_VALUE anymore, since mark_map_reg() reuses the uncached regs[regno].id that was just overridden. Note, we don't need to cache it outside of mark_map_regs(), since it's called once on this_branch and the other time on other_branch, which are two independent verifier states. A test case for this is added here, too. Fixes: 57a09bf0a416 ("bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL registers") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Thomas Graf <tgraf@suug.ch> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
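The pruning problem can be illustrated with a toy model of the bytewise comparison used for spilled registers. Everything here is an assumption made for illustration: `struct reg_model`, the `_M` enum values and both functions are hypothetical and much smaller than the verifier's real state.

```c
#include <assert.h>
#include <string.h>

enum reg_type_model {
	UNKNOWN_VALUE_M,
	PTR_TO_MAP_VALUE_OR_NULL_M,
	PTR_TO_MAP_VALUE_M,
};

/* Hypothetical register state: two ints, so there is no padding
 * that could disturb the memcmp() below. */
struct reg_model {
	int type;
	int id;
};

/* Spilled registers are compared bytewise, so every field counts. */
int spilled_regs_equal(const struct reg_model *a, const struct reg_model *b)
{
	return memcmp(a, b, sizeof(*a)) == 0;
}

/* NULL check passed: promote the register, optionally dropping the id. */
void promote_to_map_value(struct reg_model *reg, int reset_id)
{
	reg->type = PTR_TO_MAP_VALUE_M;
	if (reset_id)
		reg->id = 0;
}

/* Usage: the same pointer reached via two paths with different ids. */
void stale_id_demo(void)
{
	struct reg_model a = { .type = PTR_TO_MAP_VALUE_OR_NULL_M, .id = 1 };
	struct reg_model b = { .type = PTR_TO_MAP_VALUE_OR_NULL_M, .id = 2 };

	promote_to_map_value(&a, 0);
	promote_to_map_value(&b, 0);
	assert(!spilled_regs_equal(&a, &b));	/* stale ids differ, no pruning */

	promote_to_map_value(&a, 1);
	promote_to_map_value(&b, 1);
	assert(spilled_regs_equal(&a, &b));	/* ids cleared, states match */
}
```

Once the id no longer carries meaning for the promoted type, clearing it removes a difference between states that is irrelevant for safety but still blocks pruning.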
2016-12-08bpf: xdp: Allow head adjustment in XDP progMartin KaFai Lau
This patch allows XDP prog to extend/remove the packet data at the head (like adding or removing header). It is done by adding a new XDP helper bpf_xdp_adjust_head(). It also renames bpf_helper_changes_skb_data() to bpf_helper_changes_pkt_data() to better reflect that XDP prog does not work on skb. This patch adds one "xdp_adjust_head" bit to bpf_prog for the XDP-capable driver to check if the XDP prog requires bpf_xdp_adjust_head() support. The driver can then decide to error out during XDP_SETUP_PROG. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-08bpf: fix state equivalenceAlexei Starovoitov
Commits 57a09bf0a416 ("bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL registers") and 484611357c19 ("bpf: allow access into map value arrays") by themselves are correct, but in combination they make state equivalence ignore the 'id' field of the register state, which can lead to accepting an invalid program. Fixes: 57a09bf0a416 ("bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL registers") Fixes: 484611357c19 ("bpf: allow access into map value arrays") Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-07bpf: fix loading of BPF_MAXINSNS sized programsDaniel Borkmann
The general assumption is that a single program can hold up to BPF_MAXINSNS, that is, 4096 instructions. This is the case with cBPF, and that limit was carried over to eBPF. When recently testing digest, I noticed that it's actually not possible to feed 4096 instructions via bpf(2). The check for > BPF_MAXINSNS was added back then to bpf_check() in cbd357008604 ("bpf: verifier (add ability to receive verification log)"). However, 09756af46893 ("bpf: expand BPF syscall with program load/unload") added yet another check that comes before that into bpf_prog_load(), but this time bails out already in case of >= BPF_MAXINSNS. Fix it up and perform the check early in bpf_prog_load(), so we can drop the second one in bpf_check(). It makes sense, because a 0 insn program is also useless and we don't want to waste any resources doing work up to the bpf_check() point. The existing bpf(2) man page documents E2BIG as the official error for such cases, so just stick with it as well. Fixes: 09756af46893 ("bpf: expand BPF syscall with program load/unload") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
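The off-by-one is easiest to see as a boundary test. The sketch below is a user-space model, not the kernel code: `check_insn_cnt()` is a hypothetical name, and the only values taken from the message above are the 4096 limit and the E2BIG error.

```c
#include <assert.h>
#include <errno.h>

#define BPF_MAXINSNS 4096

/* Accept 1..BPF_MAXINSNS instructions; reject empty and oversized programs. */
int check_insn_cnt(unsigned int insn_cnt)
{
	if (insn_cnt == 0 || insn_cnt > BPF_MAXINSNS)
		return -E2BIG;
	return 0;
}

/* Usage: the interesting cases sit at both ends of the valid range. */
void check_insn_cnt_demo(void)
{
	assert(check_insn_cnt(0) == -E2BIG);			/* empty program */
	assert(check_insn_cnt(1) == 0);
	assert(check_insn_cnt(BPF_MAXINSNS) == 0);		/* exactly at the limit */
	assert(check_insn_cnt(BPF_MAXINSNS + 1) == -E2BIG);	/* one too many */
}
```

With a `>=` comparison instead of `>`, the third assertion would fail, which is the behaviour the fix removes.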
2016-12-05bpf: add prog_digest and expose it via fdinfo/netlinkDaniel Borkmann
When loading a BPF program via bpf(2), calculate the digest over the program's instruction stream and store it in struct bpf_prog's digest member. This is done at a point in time before any instructions are rewritten by the verifier. Any unstable map file descriptor number part of the imm field will be zeroed for the hash. fdinfo example output for progs: # cat /proc/1590/fdinfo/5 pos: 0 flags: 02000002 mnt_id: 11 prog_type: 1 prog_jited: 1 prog_digest: b27e8b06da22707513aa97363dfb11c7c3675d28 memlock: 4096 When programs are pinned and retrieved by an ELF loader, the loader can check the program's digest through fdinfo and compare it against one that was generated over the ELF file's program section to see if the program needs to be reloaded. Furthermore, this can also be exposed through other means such as netlink in case of a tc cls/act dump (or xdp in future), but also through tracepoints or other facilities to identify the program. Other than that, the digest can also serve as a base name for the work in progress kallsyms support of programs. The digest doesn't depend/select the crypto layer, since we need to keep dependencies to a minimum. iproute2 will get support for this facility. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-05bpf: Preserve const register type on const OR alu opsGianluca Borello
Occasionally, clang (e.g. version 3.8.1) translates a sum between two constant operands using a BPF_OR instead of a BPF_ADD. The verifier is currently not handling this scenario, and the destination register type becomes UNKNOWN_VALUE even if it's still storing a constant. As a result, the destination register cannot be used as an argument to a helper function expecting an ARG_CONST_STACK_*, limiting some use cases. Modify the verifier to handle this case, and add a few tests to make sure all combinations are supported, and stack boundaries are still verified even with BPF_OR. Signed-off-by: Gianluca Borello <g.borello@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
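A minimal sketch of the idea, assuming a drastically reduced register model: `struct reg_model`, the `_M` types, the `OP_*` values and `alu_const_model()` are all hypothetical names chosen for illustration. It shows why the constant can be preserved: when both operands are known constants, the result of the OR is known too.

```c
#include <assert.h>

enum reg_type_model { UNKNOWN_VALUE_M, CONST_IMM_M };
enum alu_op_model { OP_ADD, OP_OR, OP_OTHER };

/* Hypothetical register state: a type tag and, for constants, the value. */
struct reg_model {
	enum reg_type_model type;
	long long imm;
};

/* Keep the constant for ADD and OR of two constants; otherwise forget it. */
void alu_const_model(struct reg_model *dst, const struct reg_model *src,
		     enum alu_op_model op)
{
	if (dst->type == CONST_IMM_M && src->type == CONST_IMM_M) {
		if (op == OP_ADD) {
			dst->imm += src->imm;
			return;
		}
		if (op == OP_OR) {
			dst->imm |= src->imm;
			return;
		}
	}
	dst->type = UNKNOWN_VALUE_M;
	dst->imm = 0;
}

/* Usage: 8 and 4 share no bits, so 8 | 4 equals 8 + 4. */
void alu_const_demo(void)
{
	struct reg_model dst = { .type = CONST_IMM_M, .imm = 8 };
	struct reg_model src = { .type = CONST_IMM_M, .imm = 4 };

	alu_const_model(&dst, &src, OP_OR);
	assert(dst.type == CONST_IMM_M && dst.imm == 12);

	alu_const_model(&dst, &src, OP_OTHER);
	assert(dst.type == UNKNOWN_VALUE_M);
}
```

The usage case also hints at why a compiler may pick OR for a sum: when the operands have no bits in common, both operations give the same result.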
2016-12-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
A couple of conflicts resolved here: 1) In the MACB driver, a bug fix to properly initialize the RX tail pointer overlapped with some changes to support variable sized rings. 2) In XGBE we had a "CONFIG_PM" --> "CONFIG_PM_SLEEP" fix overlapping with a reorganization of the driver to support ACPI, OF, as well as PCI variants of the chip. 3) In 'net' we had several probe error path bug fixes to the stmmac driver, meanwhile a lot of this code was cleaned up and reorganized in 'net-next'. 4) The cls_flower classifier obtained a helper function in 'net-next' called __fl_delete() and this overlapped with Daniel Borkmann's bug fix to use RCU for object destruction in 'net'. It also overlapped with Jiri's change to guard the rhashtable_remove_fast() call with a check against tc_skip_sw(). 5) In mlx4, a revert bug fix in 'net' overlapped with some unrelated changes in 'net-next'. 6) In geneve, a stale header pointer after pskb_expand_head() bug fix in 'net' overlapped with a large reorganization of the same code in 'net-next'. Since the 'net-next' code no longer had the bug in question, there was nothing to do other than to simply take the 'net-next' hunks. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02bpf: BPF for lightweight tunnel infrastructureThomas Graf
Registers new BPF program types which correspond to the LWT hooks:

  - BPF_PROG_TYPE_LWT_IN => dst_input()
  - BPF_PROG_TYPE_LWT_OUT => dst_output()
  - BPF_PROG_TYPE_LWT_XMIT => lwtunnel_xmit()

The separate program types are required to differentiate between the capabilities each LWT hook allows:

  * Programs attached to dst_input() or dst_output() are restricted and may only read the data of an skb. This prevents modification and possible invalidation of already validated packet headers on receive and the construction of illegal headers while the IP headers are still being assembled.

  * Programs attached to lwtunnel_xmit() are allowed to modify packet content as well as prepending an L2 header via a newly introduced helper bpf_skb_change_head(). This is safe as lwtunnel_xmit() is invoked after the IP header has been assembled completely.

All BPF programs receive an skb with L3 headers attached and may return one of the following error codes:

  BPF_OK - Continue routing as per nexthop
  BPF_DROP - Drop skb and return EPERM
  BPF_REDIRECT - Redirect skb to device as per redirect() helper. (Only valid in lwtunnel_xmit() context)

The return codes are binary compatible with their TC_ACT_ relatives to ease compatibility.

Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-30bpf: fix states equal logic for varlen accessJosef Bacik
If we have a branch that looks something like this:

  int foo = map->value;
  if (condition) {
          foo += blah;
  } else {
          foo = bar;
  }
  map->array[foo] = baz;

we will incorrectly assume that the !condition branch is equal to the condition branch as the register for foo will be UNKNOWN_VALUE in both cases. We need to adjust this logic to only do this if we didn't do a varlen access after we processed the !condition branch, otherwise we have different ranges and need to check the other branch as well.

Fixes: 484611357c19 ("bpf: allow access into map value arrays") Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Josef Bacik <jbacik@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
All conflicts were simple overlapping changes except perhaps for the Thunder driver. That driver has a change_mtu method explicitly for sending a message to the hardware. If that fails it returns an error. Normally a driver doesn't need an ndo_change_mtu method because those are usually just range changes, which are now handled generically. But since this extra operation is needed in the Thunder driver, it has to stay. However, if the message send fails we have to restore the original MTU before the change because the entire call chain expects that if an error is thrown by ndo_change_mtu then the MTU did not change. Therefore code is added to nicvf_change_mtu to remember the original MTU, and to restore it upon nicvf_update_hw_max_frs() failure. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-16bpf: fix range arithmetic for bpf map accessJosef Bacik
I made some invalid assumptions with BPF_AND and BPF_MOD that could result in invalid accesses to bpf map entries. Fix this up by doing a few things:

  1) Kill BPF_MOD support. This doesn't actually get used by the compiler in real life and just adds extra complexity.

  2) Fix the logic for BPF_AND: don't allow AND of negative numbers and set the minimum value to 0 for positive ANDs.

  3) Don't do operations on the ranges if they are set to the limits, as they are by definition undefined, and allowing arithmetic operations on those values could make them appear valid when they really aren't.

This fixes the testcase provided by Jann as well as a few other theoretical problems.

Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Josef Bacik <jbacik@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
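Points 2) and 3) can be sketched as a small range-tracking model. The names `struct range_model`, `RANGE_MIN`, `RANGE_MAX` and both functions are hypothetical, and the sentinel values are assumptions picked for the example rather than the kernel's constants.

```c
#include <assert.h>

/* Assumed sentinels meaning "no known lower/upper bound". */
#define RANGE_MIN (-(1LL << 30))
#define RANGE_MAX (1LL << 30)

struct range_model {
	long long min_value;
	long long max_value;
};

/* AND with an operand in [min_val, max_val]: a possibly negative
 * operand leaves the lower bound unknown; otherwise the result is at
 * least 0 and at most the largest value we could AND against. */
void and_range_model(struct range_model *dst, long long min_val, long long max_val)
{
	if (min_val < 0)
		dst->min_value = RANGE_MIN;
	else
		dst->min_value = 0;
	dst->max_value = max_val;
}

/* ADD: never do arithmetic on a bound that is already a sentinel,
 * otherwise an unknown bound would start to look like a real one. */
void add_range_model(struct range_model *dst, long long min_val, long long max_val)
{
	if (dst->min_value != RANGE_MIN)
		dst->min_value += min_val;
	if (dst->max_value != RANGE_MAX)
		dst->max_value += max_val;
}

/* Usage: masking with 15 bounds the value; sentinels survive an add. */
void range_demo(void)
{
	struct range_model r = { .min_value = 5, .max_value = 100 };

	and_range_model(&r, 15, 15);
	assert(r.min_value == 0 && r.max_value == 15);

	and_range_model(&r, -1, -1);
	assert(r.min_value == RANGE_MIN);

	r.max_value = 10;
	add_range_model(&r, 1, 1);
	assert(r.min_value == RANGE_MIN && r.max_value == 11);
}
```

The last assertion is the point of item 3): adding 1 to an unknown lower bound must leave it unknown instead of producing `RANGE_MIN + 1`, which would look like a tracked value.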
2016-11-09bpf: Remove unused but set variablesTobias Klauser
Remove the unused but set variables min_set and max_set in adjust_reg_min_max_vals to fix the following warning when building with 'W=1': kernel/bpf/verifier.c:1483:7: warning: variable ‘min_set’ set but not used [-Wunused-but-set-variable] There is no warning about max_set being unused, but since it is only used in the assignment of min_set it can be removed as well. They were introduced in commit 484611357c19 ("bpf: allow access into map value arrays") but seem to have never been used. Cc: Josef Bacik <jbacik@fb.com> Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-29bpf: Print function name in addition to function idThomas Graf
The verifier currently prints raw function ids when printing CALL instructions or when complaining:

  5: (85) call 23
  unknown func 23

Print a meaningful function name instead:

  5: (85) call bpf_redirect#23
  unknown func bpf_redirect#23

Moves the function documentation to a single comment and renames all helper names in the list to conform to the bpf_ prefix notation so they can be grepped in the kernel source.

Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-19bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL registersThomas Graf
A BPF program is required to check the return register of a map_elem_lookup() call before accessing memory. The verifier keeps track of this by converting the type of the result register from PTR_TO_MAP_VALUE_OR_NULL to PTR_TO_MAP_VALUE after a conditional jump ensures safety. This check is currently exclusively performed for the result register 0. In the event the compiler reorders instructions, BPF_MOV64_REG instructions may be moved before the conditional jump which causes them to keep their type PTR_TO_MAP_VALUE_OR_NULL to which the verifier objects when the register is accessed: 0: (b7) r1 = 10 1: (7b) *(u64 *)(r10 -8) = r1 2: (bf) r2 = r10 3: (07) r2 += -8 4: (18) r1 = 0x59c00000 6: (85) call 1 7: (bf) r4 = r0 8: (15) if r0 == 0x0 goto pc+1 R0=map_value(ks=8,vs=8) R4=map_value_or_null(ks=8,vs=8) R10=fp 9: (7a) *(u64 *)(r4 +0) = 0 R4 invalid mem access 'map_value_or_null' This commit extends the verifier to keep track of all identical PTR_TO_MAP_VALUE_OR_NULL registers after a map_elem_lookup() by assigning them an ID and then marking them all when the conditional jump is observed. Signed-off-by: Thomas Graf <tgraf@suug.ch> Reviewed-by: Josef Bacik <jbacik@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
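The id-based marking can be modeled in a few lines. This is a sketch under assumptions, not the verifier's code: `struct reg_model`, `NUM_REGS`, the `_M` types and `mark_map_regs_model()` are hypothetical, and spilled stack slots are left out.

```c
#include <assert.h>

#define NUM_REGS 11	/* assumed register count for the model */

enum reg_type_model {
	UNKNOWN_VALUE_M,
	PTR_TO_MAP_VALUE_OR_NULL_M,
	PTR_TO_MAP_VALUE_M,
};

struct reg_model {
	int type;
	int id;		/* shared by all copies of one lookup result */
};

/* After a NULL check on regs[regno], convert every register that
 * carries the same id. The id is read once up front because the loop
 * clears it on regs[regno] itself. */
void mark_map_regs_model(struct reg_model *regs, int regno, int is_null)
{
	int id = regs[regno].id;
	int i;

	for (i = 0; i < NUM_REGS; i++) {
		if (regs[i].type != PTR_TO_MAP_VALUE_OR_NULL_M || regs[i].id != id)
			continue;
		regs[i].type = is_null ? UNKNOWN_VALUE_M : PTR_TO_MAP_VALUE_M;
		regs[i].id = 0;
	}
}

/* Usage: r4 is a copy of r0 made before the check; r5 is unrelated. */
void mark_map_regs_demo(void)
{
	struct reg_model regs[NUM_REGS] = { { 0, 0 } };

	regs[0] = (struct reg_model){ PTR_TO_MAP_VALUE_OR_NULL_M, 1 };
	regs[4] = regs[0];
	regs[5] = (struct reg_model){ PTR_TO_MAP_VALUE_OR_NULL_M, 2 };

	mark_map_regs_model(regs, 0, 0);	/* non-NULL branch of the check on r0 */

	assert(regs[0].type == PTR_TO_MAP_VALUE_M);
	assert(regs[4].type == PTR_TO_MAP_VALUE_M);
	assert(regs[5].type == PTR_TO_MAP_VALUE_OR_NULL_M);
}
```

In the verifier log above, this is what would let the store through r4 succeed: r4 shares the id of r0, so the check on r0 also promotes r4.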
2016-09-29bpf: allow access into map value arraysJosef Bacik
Suppose you have a map array value that is something like this:

  struct foo {
          unsigned iter;
          int array[SOME_CONSTANT];
  };

You can easily insert this into an array, but you cannot modify the contents of foo->array[] after the fact. This is because we have no way to verify we won't go off the end of the array at verification time. This patch provides a start for this work. We accomplish this by keeping track of a minimum and maximum value a register could be while we're checking the code. Then at the time we try to do an access into a MAP_VALUE we verify that the maximum offset into that region is a valid access into that memory region. So in practice, code such as this:

  unsigned index = 0;

  if (foo->iter >= SOME_CONSTANT)
          foo->iter = index;
  else
          index = foo->iter++;
  foo->array[index] = bar;

would be allowed, as we can verify that index will always be between 0 and SOME_CONSTANT-1. If you wish to use signed values you'll have to have an extra check to make sure the index isn't less than 0, or do something like index %= SOME_CONSTANT.

Signed-off-by: Josef Bacik <jbacik@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
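A rough model of the reasoning: a comparison narrows the tracked range of the index, and the access is accepted only if the whole range stays inside the value. `struct range_model`, the sentinels, the value 16 for SOME_CONSTANT and both functions are assumptions made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define SOME_CONSTANT 16		/* assumed value for the example */
#define RANGE_MIN (-(1LL << 30))	/* assumed "unbounded" sentinels */
#define RANGE_MAX (1LL << 30)

struct foo {
	unsigned iter;
	int array[SOME_CONSTANT];
};

struct range_model {
	long long min_value;
	long long max_value;
};

/* Fall-through (false) branch of an unsigned `if (v >= limit)`:
 * the value is known to lie in [0, limit - 1]. */
void refine_not_jge(struct range_model *r, long long limit)
{
	r->min_value = 0;
	r->max_value = limit - 1;
}

/* A store to foo->array[index] is safe only if the lowest index is not
 * negative and the highest index still ends inside struct foo. */
int array_store_is_safe(const struct range_model *index)
{
	long long end;

	if (index->min_value < 0)
		return 0;
	end = (long long)offsetof(struct foo, array) +
	      index->max_value * (long long)sizeof(int) + (long long)sizeof(int);
	return end <= (long long)sizeof(struct foo);
}

/* Usage: unknown index, then the same index after the bounds check. */
void array_range_demo(void)
{
	struct range_model index = { RANGE_MIN, RANGE_MAX };

	assert(!array_store_is_safe(&index));	/* nothing known yet */

	refine_not_jge(&index, SOME_CONSTANT);	/* index in [0, 15] */
	assert(array_store_is_safe(&index));

	refine_not_jge(&index, SOME_CONSTANT + 1);	/* index in [0, 16] */
	assert(!array_store_is_safe(&index));
}
```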
2016-09-27bpf: Set register type according to is_valid_access()Mickaël Salaün
This prevents potential future pointer leaks when an unprivileged eBPF program reads a pointer value from its context. Even if is_valid_access() returns a pointer type, the eBPF verifier replaces it with UNKNOWN_VALUE. The register value that contains a kernel address is then allowed to leak. Moreover, this fix allows unprivileged eBPF programs to use functions with (legitimate) pointer arguments. Not an issue currently since reg_type is only set for PTR_TO_PACKET or PTR_TO_PACKET_END in XDP and TC programs that can only be loaded as privileged. For now, the only unprivileged eBPF program allowed is for socket filtering and all the types from its context are UNKNOWN_VALUE. However, this fix is important for future unprivileged eBPF programs which could use pointers in their context. Signed-off-by: Mickaël Salaün <mic@digikod.net> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-21bpf: recognize 64bit immediate loads as constsJakub Kicinski
When running as a parser, interpret BPF_LD | BPF_IMM | BPF_DW instructions as loading CONST_IMM with the value stored in imm. The verifier will continue not recognizing those due to concerns about search space/program complexity increase. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-21bpf: enable non-core use of the verfierJakub Kicinski
Advanced JIT compilers and translators may want to use the eBPF verifier as a base for parsers or to perform custom checks and validations. Add the ability for external users to invoke the verifier and provide callbacks to be invoked for every instruction checked. For now, only the most basic callback for per-instruction pre-interpretation checks is added. More advanced users may also like to have a per-instruction post callback and a state comparison callback. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-21bpf: expose internal verfier structuresJakub Kicinski
Move verifier's internal structures to a header file and prefix their names with bpf_ to avoid potential namespace conflicts. Those structures will soon be used by external analyzers. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-21bpf: don't (ab)use instructions to store stateJakub Kicinski
Storing state in reserved fields of instructions makes it impossible to run verifier on programs already marked as read-only. Allocate and use an array of per-instruction state instead. While touching the error path rename and move existing jump target. Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
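The approach can be sketched as a side array that runs parallel to the instruction stream. The structure and function names below (`insn_model`, `insn_aux_model`, `env_model`, `env_init()`, `env_free()`) are hypothetical and use user-space allocation; they only illustrate the idea of keeping verifier notes out of the instructions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical instruction; treated as read-only by the checker. */
struct insn_model {
	unsigned char code;
	int imm;
};

/* Per-instruction scratch state kept outside the instruction itself. */
struct insn_aux_model {
	int ptr_type;
};

struct env_model {
	const struct insn_model *insns;
	struct insn_aux_model *aux;	/* aux[i] belongs to insns[i] */
	size_t len;
};

/* Allocate one zeroed aux entry per instruction. */
int env_init(struct env_model *env, const struct insn_model *insns, size_t len)
{
	env->aux = calloc(len, sizeof(*env->aux));
	if (!env->aux)
		return -ENOMEM;
	env->insns = insns;
	env->len = len;
	return 0;
}

void env_free(struct env_model *env)
{
	free(env->aux);
	env->aux = NULL;
}

/* Usage: annotate instruction 1 without touching the const program. */
void insn_aux_demo(void)
{
	static const struct insn_model prog[] = { { 0x61, 0 }, { 0x95, 0 } };
	struct env_model env;

	assert(env_init(&env, prog, 2) == 0);
	env.aux[1].ptr_type = 3;

	assert(env.aux[0].ptr_type == 0);	/* calloc() zero-fills */
	assert(env.insns[1].imm == 0);		/* program left untouched */
	env_free(&env);
}
```

Because the program is only ever read through a `const` pointer, the same checker could run on instructions that live in read-only memory.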
2016-09-20bpf: direct packet write and access for helpers for clsact progsDaniel Borkmann
This work implements direct packet access for helpers and direct packet write in a similar fashion as already available for XDP types via commits 4acf6c0b84c9 ("bpf: enable direct packet data write for xdp progs") and 6841de8b0d03 ("bpf: allow helpers access the packet directly"), and as a complementary feature to the already available direct packet read for tc (cls/act) programs. For enabling this, we need to introduce two helpers, bpf_skb_pull_data() and bpf_csum_update(). The first is generally needed for both, read and write, because they would otherwise only be limited to the current linear skb head. Usually, when the data_end test fails, programs just bail out, or, in the direct read case, use bpf_skb_load_bytes() as an alternative to overcome this limitation. If such data sits in non-linear parts, we can just pull them in once with the new helper, retest and eventually access them. At the same time, this also makes sure the skb is uncloned, which is, of course, a necessary condition for direct write. As this needs to be an invariant for the write part only, the verifier detects writes and adds a prologue that is calling bpf_skb_pull_data() to effectively unclone the skb from the very beginning in case it is indeed cloned. The heuristic makes use of a similar trick that was done in 233577a22089 ("net: filter: constify detection of pkt_type_offset"). This comes at zero cost for other programs that do not use the direct write feature. Should a program use this feature only sparsely and has read access for the most parts with, for example, drop return codes, then such write action can be delegated to a tail called program for mitigating this cost of potential uncloning to a late point in time where it would have been paid similarly with the bpf_skb_store_bytes() as well. 
Advantage of direct write is that the writes are inlined whereas the helper cannot make any length assumptions and thus needs to generate a call to memcpy() also for small sizes, as well as cost of helper call itself with sanity checks are avoided. Plus, when direct read is already used, we don't need to cache or perform rechecks on the data boundaries (due to verifier invalidating previous checks for helpers that change skb->data), so more complex programs using rewrites can benefit from switching to direct read plus write. For direct packet access to helpers, we save the otherwise needed copy into a temp struct sitting on stack memory when use-case allows. Both facilities are enabled via may_access_direct_pkt_data() in verifier. For now, we limit this to map helpers and csum_diff, and can successively enable other helpers where we find it makes sense. Helpers that definitely cannot be allowed for this are those part of bpf_helper_changes_skb_data() since they can change underlying data, and those that write into memory as this could happen for packet typed args when still cloned. bpf_csum_update() helper accommodates for the fact that we need to fixup checksum_complete when using direct write instead of bpf_skb_store_bytes(), meaning the programs can use available helpers like bpf_csum_diff(), and implement csum_add(), csum_sub(), csum_block_add(), csum_block_sub() equivalents in eBPF together with the new helper. A usage example will be provided for iproute2's examples/bpf/ directory. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-20bpf, verifier: enforce larger zero range for pkt on overloading stack buffsDaniel Borkmann
Current contract for the following two helper argument types is: * ARG_CONST_STACK_SIZE: passed argument pair must be (ptr, >0). * ARG_CONST_STACK_SIZE_OR_ZERO: passed argument pair can be either (NULL, 0) or (ptr, >0). With 6841de8b0d03 ("bpf: allow helpers access the packet directly"), we can pass also raw packet data to helpers, so depending on the argument type being PTR_TO_PACKET, we now either assert memory via check_packet_access() or check_stack_boundary(). As a result, the tests in check_packet_access() currently allow more than intended with regards to reg->imm. Back in 969bf05eb3ce ("bpf: direct packet access"), check_packet_access() was fine to ignore size argument since in check_mem_access() size was bpf_size_to_bytes() derived and prior to the call to check_packet_access() guaranteed to be larger than zero. However, for the above two argument types, it currently means, we can have a <= 0 size and thus breaking current guarantees for helpers. Enforce a check for size <= 0 and bail out if so. check_stack_boundary() doesn't have such an issue since it already tests for access_size <= 0 and bails out, resp. access_size == 0 in case of NULL pointer passed when allowed. Fixes: 6841de8b0d03 ("bpf: allow helpers access the packet directly") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
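The added size test fits into the existing bounds check as one more condition. The function below is a user-space sketch with a hypothetical name; `range` stands for the number of packet bytes already proven accessible, an assumption about how the caller tracks it.

```c
#include <assert.h>
#include <errno.h>

/* An access of `size` bytes at offset `off` must be non-empty and must
 * lie entirely inside the `range` bytes verified so far. */
int check_packet_access_model(int off, int size, int range)
{
	if (off < 0 || size <= 0 || off + size > range)
		return -EACCES;
	return 0;
}

/* Usage: 54 verified bytes, as in a data/data_end test for 54 bytes. */
void check_packet_access_demo(void)
{
	assert(check_packet_access_model(20, 1, 54) == 0);
	assert(check_packet_access_model(50, 8, 54) == -EACCES);	/* runs past the range */
	assert(check_packet_access_model(0, 0, 54) == -EACCES);		/* zero size */
	assert(check_packet_access_model(0, -1, 54) == -EACCES);	/* negative size */
}
```

Without the `size <= 0` term, the last two calls would pass, since `off + size` stays within the range. That is the gap a helper size argument could slip through.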
2016-09-08bpf: fix range propagation on direct packet accessDaniel Borkmann
LLVM can generate code that tests for direct packet access via skb->data/data_end in a way that currently gets rejected by the verifier, example: [...] 7: (61) r3 = *(u32 *)(r6 +80) 8: (61) r9 = *(u32 *)(r6 +76) 9: (bf) r2 = r9 10: (07) r2 += 54 11: (3d) if r3 >= r2 goto pc+12 R1=inv R2=pkt(id=0,off=54,r=0) R3=pkt_end R4=inv R6=ctx R9=pkt(id=0,off=0,r=0) R10=fp 12: (18) r4 = 0xffffff7a 14: (05) goto pc+430 [...] from 11 to 24: R1=inv R2=pkt(id=0,off=54,r=0) R3=pkt_end R4=inv R6=ctx R9=pkt(id=0,off=0,r=0) R10=fp 24: (7b) *(u64 *)(r10 -40) = r1 25: (b7) r1 = 0 26: (63) *(u32 *)(r6 +56) = r1 27: (b7) r2 = 40 28: (71) r8 = *(u8 *)(r9 +20) invalid access to packet, off=20 size=1, R9(id=0,off=0,r=0) The reason why this gets rejected despite a proper test is that we currently call find_good_pkt_pointers() only in case where we detect tests like rX > pkt_end, where rX is of type pkt(id=Y,off=Z,r=0) and derived, for example, from a register of type pkt(id=Y,off=0,r=0) pointing to skb->data. find_good_pkt_pointers() then fills the range in the current branch to pkt(id=Y,off=0,r=Z) on success. For above case, we need to extend that to recognize pkt_end >= rX pattern and mark the other branch that is taken on success with the appropriate pkt(id=Y,off=0,r=Z) type via find_good_pkt_pointers(). Since eBPF operates on BPF_JGT (>) and BPF_JGE (>=), these are the only two practical options to test for from what LLVM could have generated, since there's no such thing as BPF_JLT (<) or BPF_JLE (<=) that we would need to take into account as well. After the fix: [...] 7: (61) r3 = *(u32 *)(r6 +80) 8: (61) r9 = *(u32 *)(r6 +76) 9: (bf) r2 = r9 10: (07) r2 += 54 11: (3d) if r3 >= r2 goto pc+12 R1=inv R2=pkt(id=0,off=54,r=0) R3=pkt_end R4=inv R6=ctx R9=pkt(id=0,off=0,r=0) R10=fp 12: (18) r4 = 0xffffff7a 14: (05) goto pc+430 [...] 
from 11 to 24: R1=inv R2=pkt(id=0,off=54,r=54) R3=pkt_end R4=inv R6=ctx R9=pkt(id=0,off=0,r=54) R10=fp 24: (7b) *(u64 *)(r10 -40) = r1 25: (b7) r1 = 0 26: (63) *(u32 *)(r6 +56) = r1 27: (b7) r2 = 40 28: (71) r8 = *(u8 *)(r9 +20) 29: (bf) r1 = r8 30: (25) if r8 > 0x3c goto pc+47 R1=inv56 R2=imm40 R3=pkt_end R4=inv R6=ctx R8=inv56 R9=pkt(id=0,off=0,r=54) R10=fp 31: (b7) r1 = 1 [...] Verifier test cases are also added in this work, one that demonstrates the mentioned example here and one that tries a bad packet access for the current/fall-through branch (the one with types pkt(id=X,off=Y,r=0), pkt(id=X,off=0,r=0)), then a case with good and bad accesses, and two with both test variants (>, >=). Fixes: 969bf05eb3ce ("bpf: direct packet access") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-02bpf: perf_event progs should only use preallocated mapsAlexei Starovoitov
Make sure that BPF_PROG_TYPE_PERF_EVENT programs only use preallocated hash maps, since doing memory allocation in overflow_handler can crash depending on where nmi got triggered. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>