summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2019-05-25Merge branch 'fix-some-bugs-in-stmmac'David S. Miller
Biao Huang says: ==================== fix some bugs in stmmac changes in v4: since MTL_OPERATION_MODE write back issue has be fixed in the latest driver, remove original patch#3 changes in v3: add a Fixes:tag for each patch changes in v2: 1. update rx_tail_addr as Jose's comment 2. changes clk_csr condition as Alex's proposition 3. remove init lines in dwmac-mediatek, get clk_csr from dts instead. v1: This series fix some bugs in stmmac driver 3 patches are for common stmmac or dwmac4: 1. update rx tail pointer to fix rx dma hang issue. 2. change condition for mdc clock to fix csr_clk can't be zero issue. 3. write the modified value back to MTL_OPERATION_MODE. 1 patch is for dwmac-mediatek: modify csr_clk value to fix mdio read/write fail issue for dwmac-mediatek ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-25net: stmmac: dwmac-mediatek: modify csr_clk value to fix mdio read/write failBiao Huang
1. the frequency of csr clock is 66.5MHz, so the csr_clk value should be 0 other than 5. 2. the csr_clk can be got from device tree, so remove initialization here. Fixes: 9992f37e346b ("stmmac: dwmac-mediatek: add support for mt2712") Signed-off-by: Biao Huang <biao.huang@mediatek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-25net: stmmac: fix csr_clk can't be zero issueBiao Huang
The specific clk_csr value can be zero, and stmmac_clk is necessary for MDC clock which can be set dynamically. So, change the condition from plat->clk_csr to plat->stmmac_clk to fix clk_csr can't be zero issue. Fixes: cd7201f477b9 ("stmmac: MDC clock dynamically based on the csr clock input") Signed-off-by: Biao Huang <biao.huang@mediatek.com> Acked-by: Alexandre TORGUE <alexandre.torgue@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-25net: stmmac: update rx tail pointer register to fix rx dma hang issue.Biao Huang
Currently we will not update the receive descriptor tail pointer in stmmac_rx_refill. Rx dma will think no available descriptors and stop once received packets exceed DMA_RX_SIZE, so that the rx only test will fail. Update the receive tail pointer in stmmac_rx_refill to add more descriptors to the rx channel, so packets can be received continually Fixes: 54139cf3bb33 ("net: stmmac: adding multiple buffers for rx") Signed-off-by: Biao Huang <biao.huang@mediatek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-25ip_sockglue: Fix missing-check bug in ip_ra_control()Gen Zhang
In function ip_ra_control(), the pointer new_ra is allocated a memory space via kmalloc(). And it is used in the following codes. However, when there is a memory allocation error, kmalloc() fails. Thus null pointer dereference may happen. And it will cause the kernel to crash. Therefore, we should check the return value and handle the error. Signed-off-by: Gen Zhang <blackgod016574@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-25ipv6_sockglue: Fix a missing-check bug in ip6_ra_control()Gen Zhang
In function ip6_ra_control(), the pointer new_ra is allocated a memory space via kmalloc(). And it is used in the following codes. However, when there is a memory allocation error, kmalloc() fails. Thus null pointer dereference may happen. And it will cause the kernel to crash. Therefore, we should check the return value and handle the error. Signed-off-by: Gen Zhang <blackgod016574@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-25flow_offload: use struct_size() in kzalloc()Gustavo A. R. Silva
One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory for some number of elements for that array. For example: struct foo { int stuff; struct boo entry[]; }; instance = kzalloc(sizeof(struct foo) + count * sizeof(struct boo), GFP_KERNEL); Instead of leaving these open-coded and prone to type mistakes, we can now use the new struct_size() helper: instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL); This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-25Merge tag 'libnvdimm-fixes-5.2-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm fixes from Dan Williams: - Fix a regression that disabled device-mapper dax support - Remove unnecessary hardened-user-copy overhead (>30%) for dax read(2)/write(2). - Fix some compilation warnings. * tag 'libnvdimm-fixes-5.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: libnvdimm/pmem: Bypass CONFIG_HARDENED_USERCOPY overhead dax: Arrange for dax_supported check to span multiple devices libnvdimm: Fix compilation warnings with W=1
2019-05-25Merge tag 'trace-v5.2-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "Tom Zanussi sent me some small fixes and cleanups to the histogram code and I forgot to incorporate them. I also added a small clean up patch that was sent to me a while ago and I just noticed it" * tag 'trace-v5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: kernel/trace/trace.h: Remove duplicate header of trace_seq.h tracing: Add a check_val() check before updating cond_snapshot() track_val tracing: Check keys for variable references in expressions too tracing: Prevent hist_field_var_ref() from accessing NULL tracing_map_elts
2019-05-24ext4: fix dcache lookup of !casefolded directoriesGabriel Krisman Bertazi
Found by visual inspection, this wasn't caught by my xfstest, since it's effect is ignoring positive dentries in the cache the fallback just goes to the disk. it was introduced in the last iteration of the case-insensitive patch. d_compare should return 0 when the entries match, so make sure we are correctly comparing the entire string if the encoding feature is set and we are on a case-INsensitive directory. Fixes: b886ee3e778e ("ext4: Support case-insensitive file name lookups") Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-05-24samples: bpf: add ibumad sample to .gitignoreMatteo Croce
This commit adds ibumad to .gitignore which is currently ommited from the ignore file. Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-05-24Merge branch 'optimize-zext'Alexei Starovoitov
Jiong Wang says: ==================== v9: - Split patch 5 in v8. make bpf uapi header file sync a separate patch. (Alexei) v8: - For stack slot read, mark them as REG_LIVE_READ64. (Alexei) - Change DEF_NOT_SUBREG from -1 to 0. (Alexei) - Rebased on top of latest bpf-next. v7: - Drop the first patch in v6, the one adding 32-bit return value and argument type. (Alexei) - Rename bpf_jit_hardware_zext to bpf_jit_needs_zext. (Alexei) - Use mov32 with imm == 1 to indicate it is zext. (Alexei) - JIT back-ends peephole next insn to optimize out unnecessary zext inserted by verifier. (Alexei) - Testing: + patch set tested (bpf selftest) on x64 host with llvm 9.0 no regression observed no both JIT and interpreter modes. + patch set tested (bpf selftest) on x32 host. By Yanqing Wang, thanks! no regression observed on both JIT and interpreter modes. + patch set tested (bpf selftest) on RV64 host with llvm 9.0, By Björn Töpel, thanks! no regression observed before and after this set with JIT_ALWAYS_ON. test_progs_32 also enabled as LLVM 9.0 is used by Björn. + cross compiled the other affected targets, arm, PowerPC, SPARC, S390. v6: - Fixed s390 kbuild test robot error. (kbuild) - Make comment style in backends patches more consistent. v5: - Adjusted several test_verifier helpers to make them works on hosts w and w/o hardware zext. (Naveen) - Make sure zext flag not set when verifier by-passed, for example, libtest_bpf.ko. (Naveen) - Conservatively mark bpf main return value as 64-bit. (Alexei) - Make sure read flag is either READ64 or READ32, not the mix of both. (Alexei) - Merged patch 1 and 2 in v4. (Alexei) - Fixed kbuild test robot warning on NFP. (kbuild) - Proposed new BPF_ZEXT insn to have optimal code-gen for various JIT back-ends. - Conservately set zext flags for patched-insn. - Fixed return value zext for helper function calls. - Also adjusted test_verifier scalability unit test to avoid triggerring too many insn patch which will hang computer. - re-tested on x86 host with llvm 9.0, no regression on test_verifier, test_progs, test_progs_32. - re-tested offload target (nfp), no regression on local testsuite. v4: - added the two missing fixes which addresses two Jakub's reviewes in v3. - rebase on top of bpf-next. v3: - remove redundant check in "propagate_liveness_reg". (Jakub) - add extra check in "mark_reg_read" to prune more search. (Jakub) - re-implemented "prog_flags" passing mechanism, removed use of global switch inside libbpf. - enabled high 32-bit randomization beyond "test_verifier" and "test_progs". Now it should have been enabled for all possible tests. Re-run all tests, haven't noticed regression. - remove RFC tag. v2: - rebased on top of bpf-next master. - added comments for what is sub-register def index. (Edward, Alexei) - removed patch 1 which turns bit mask from enum to macro. (Alexei) - removed sysctl/bpf_jit_32bit_opt. (Alexei) - merged sub-register def insn index into reg state. (Alexei) - change test methodology (Alexei): + instead of simple unit tests on x86_64 for which this optimization doesn't enabled due to there is hardware support, poison high 32-bit for whose def identified as safe to do so. this could let the correctness of this patch set checked when daily bpf selftest ran which delivers very stressful test on host machine like x86_64. + hi32 poisoning is gated by a new BPF_F_TEST_RND_HI32 prog flags. + BPF_F_TEST_RND_HI32 is enabled for all tests of "test_progs" and "test_verifier", the latter needs minor tweak on two unit tests, please see the patch for the change. + introduced a new global variable "libbpf_test_mode" into libbpf. once it is set to true, it will set BPF_F_TEST_RND_HI32 for all the later PROG_LOAD syscall, the goal is to easy the enable of hi32 poison on exsiting testsuite. we could also introduce new APIs, for example "bpf_prog_test_load", then use -Dbpf_prog_load=bpf_prog_test_load to migrate tests under test_progs, but there are several load APIs, and such new API need some change on struture like "struct bpf_prog_load_attr". + removed old unit tests. it is based on insn scan and requires quite a few test_verifier generic code change. given hi32 randomization could offer good test coverage, the unit tests doesn't add much extra test value. - enhanced register width check ("is_reg64") when record sub-register write, now, it returns more accurate width. - Re-run all tests under "test_progs" and "test_verifier" on x86_64, no regression. Fixed a couple of bugs exposed: 1. ctx field size transformation was not taken into account. 2. insn patch could cause lost of original aux data which is important for ctx field conversion. 3. return value for propagate_liveness was wrong and caused regression on processed insn number. 4. helper call arg wasn't handled properly that path prune may cause 64-bit read info in pruned path lost. - Re-run Cilium bpf prog for processed-insn-number benchmarking, no regression. v1: - Fixed the missing handling on callee-saved for bpf-to-bpf call, sub-register defs therefore moved to frame state. (Jakub Kicinski) - Removed redundant "cross_reg". (Jakub Kicinski) - Various coding styles & grammar fixes. (Jakub Kicinski, Quentin Monnet) eBPF ISA specification requires high 32-bit cleared when low 32-bit sub-register is written. This applies to destination register of ALU32 etc. JIT back-ends must guarantee this semantic when doing code-gen. x86_64 and AArch64 ISA has the same semantics, so the corresponding JIT back-end doesn't need to do extra work. However, 32-bit arches (arm, x86, nfp etc.) and some other 64-bit arches (PowerPC, SPARC etc) need to do explicit zero extension to meet this requirement, otherwise code like the following will fail. u64_value = (u64) u32_value ... other uses of u64_value This is because compiler could exploit the semantic described above and save those zero extensions for extending u32_value to u64_value, these JIT back-ends are expected to guarantee this through inserting extra zero extensions which however could be a significant increase on the code size. Some benchmarks show there could be ~40% sub-register writes out of total insns, meaning at least ~40% extra code-gen. One observation is these extra zero extensions are not always necessary. Take above code snippet for example, it is possible u32_value will never be casted into a u64, the value of high 32-bit of u32_value then could be ignored and extra zero extension could be eliminated. This patch implements this idea, insns defining sub-registers will be marked when the high 32-bit of the defined sub-register matters. For those unmarked insns, it is safe to eliminate high 32-bit clearnace for them. Algo ==== We could use insn scan based static analysis to tell whether one sub-register def doesn't need zero extension. However, using such static analysis, we must do conservative assumption at branching point where multiple uses could be introduced. So, for any sub-register def that is active at branching point, we need to mark it as needing zero extension. This could introducing quite a few false alarms, for example ~25% on Cilium bpf_lxc. It will be far better to use dynamic data-flow tracing which verifier fortunately already has and could be easily extend to serve the purpose of this patch set. - Split read flags into READ32 and READ64. - Record index of insn that does sub-register write. Keep the index inside reg state and update it during verifier insn walking. - A full register read on a sub-register marks its definition insn as needing zero extension on dst register. A new sub-register write overrides the old one. - When propagating read64 during path pruning, also mark any insn defining a sub-register that is read in the pruned path as full-register. Benchmark ========= - I estimate the JITed image could be 10% ~ 30% smaller on these affected arches (nfp, arm, x32, risv, ppc, sparc, s390), depending on the prog. - For Cilium bpf_lxc, there is ~11500 insns in the compiled binary (use latest LLVM snapshot, and with -mcpu=v3 -mattr=+alu32 enabled), 4460 of them has sub-register writes (~40%). Calculated by: cat dump | grep -P "\tw" | wc -l (ALU32) cat dump | grep -P "r.*=.*u32" | wc -l (READ_W) cat dump | grep -P "r.*=.*u16" | wc -l (READ_H) cat dump | grep -P "r.*=.*u8" | wc -l (READ_B) After this patch set enabled, > 25% of those 4460 could be identified as doesn't needing zero extension on the destination, and the percentage could go further up to more than 50% with some follow up optimizations based on the infrastructure offered by this set. This leads to significant save on JITed image. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-05-24nfp: bpf: eliminate zero extension code-genJiong Wang
This patch eliminate zero extension code-gen for instructions including both alu and load/store. The only exception is for ctx load, because offload target doesn't go through host ctx convert logic so we do customized load and ignores zext flag set by verifier. Cc: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-05-24riscv: bpf: eliminate zero extension code-genJiong Wang
Cc: Björn Töpel <bjorn.topel@gmail.com> Acked-by: Björn Töpel <bjorn.topel@gmail.com> Tested-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-05-24x32: bpf: eliminate zero extension code-genJiong Wang
Cc: Wang YanQing <udknight@gmail.com> Tested-by: Wang YanQing <udknight@gmail.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-05-24sparc: bpf: eliminate zero extension code-genJiong Wang
Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-05-24s390: bpf: eliminate zero extension code-genJiong Wang
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-05-24powerpc: bpf: eliminate zero extension code-genJiong Wang
Cc: Naveen N. Rao <naveen.n.rao@linux.ibm.com> Cc: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-05-24arm: bpf: eliminate zero extension code-genJiong Wang
Cc: Shubham Bansal <illusionist.neo@gmail.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-05-24selftests: bpf: enable hi32 randomization for all testsJiong Wang
The previous libbpf patch allows user to specify "prog_flags" to bpf program load APIs. To enable high 32-bit randomization for a test, we need to set BPF_F_TEST_RND_HI32 in "prog_flags". To enable such randomization for all tests, we need to make sure all places are passing BPF_F_TEST_RND_HI32. Changing them one by one is not convenient, also, it would be better if a test could be switched to "normal" running mode without code change. Given the program load APIs used across bpf selftests are mostly: bpf_prog_load: load from file bpf_load_program: load from raw insns A test_stub.c is implemented for bpf seltests, it offers two functions for testing purpose: bpf_prog_test_load bpf_test_load_program The are the same as "bpf_prog_load" and "bpf_load_program", except they also set BPF_F_TEST_RND_HI32. Given *_xattr functions are the APIs to customize any "prog_flags", it makes little sense to put these two functions into libbpf. Then, the following CFLAGS are passed to compilations for host programs: -Dbpf_prog_load=bpf_prog_test_load -Dbpf_load_program=bpf_test_load_program They migrate the used load APIs to the test version, hence enable high 32-bit randomization for these tests without changing source code. Besides all these, there are several testcases are using "bpf_prog_load_attr" directly, their call sites are updated to pass BPF_F_TEST_RND_HI32. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-05-24selftests: bpf: adjust several test_verifier helpers for insn insertionJiong Wang
- bpf_fill_ld_abs_vlan_push_pop: Prevent zext happens inside PUSH_CNT loop. This could happen because of BPF_LD_ABS (32-bit def) + BPF_JMP (64-bit use), or BPF_LD_ABS + EXIT (64-bit use of R0). So, change BPF_JMP to BPF_JMP32 and redefine R0 at exit path to cut off the data-flow from inside the loop. - bpf_fill_jump_around_ld_abs: Jump range is limited to 16 bit. every ld_abs is replaced by 6 insns, but on arches like arm, ppc etc, there will be one BPF_ZEXT inserted to extend the error value of the inlined ld_abs sequence which then contains 7 insns. so, set the dividend to 7 so the testcase could work on all arches. - bpf_fill_scale1/bpf_fill_scale2: Both contains ~1M BPF_ALU32_IMM which will trigger ~1M insn patcher call because of hi32 randomization later when BPF_F_TEST_RND_HI32 is set for bpf selftests. Insn patcher is not efficient that 1M call to it will hang computer. So , change to BPF_ALU64_IMM to avoid hi32 randomization. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-05-24libbpf: add "prog_flags" to bpf_program/bpf_prog_load_attr/bpf_load_program_attrJiong Wang
libbpf doesn't allow passing "prog_flags" during bpf program load in a couple of load related APIs, "bpf_load_program_xattr", "load_program" and "bpf_prog_load_xattr". It makes sense to allow passing "prog_flags" which is useful for customizing program loading. Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-05-24bpf: verifier: randomize high 32-bit when BPF_F_TEST_RND_HI32 is setJiong Wang
This patch randomizes high 32-bit of a definition when BPF_F_TEST_RND_HI32 is set. Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-05-24tools: bpf: sync uapi header bpf.hJiong Wang
Sync new bpf prog load flag "BPF_F_TEST_RND_HI32" to tools/. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-05-24bpf: introduce new bpf prog load flags "BPF_F_TEST_RND_HI32"Jiong Wang
x86_64 and AArch64 perhaps are two arches that running bpf testsuite frequently, however the zero extension insertion pass is not enabled for them because of their hardware support. It is critical to guarantee the pass correction as it is supposed to be enabled at default for a couple of other arches, for example PowerPC, SPARC, arm, NFP etc. Therefore, it would be very useful if there is a way to test this pass on for example x86_64. The test methodology employed by this set is "poisoning" useless bits. High 32-bit of a definition is randomized if it is identified as not used by any later insn. Such randomization is only enabled under testing mode which is gated by the new bpf prog load flags "BPF_F_TEST_RND_HI32". Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-05-24bpf: verifier: insert zero extension according to analysis resultJiong Wang
After previous patches, verifier will mark a insn if it really needs zero extension on dst_reg. It is then for back-ends to decide how to use such information to eliminate unnecessary zero extension code-gen during JIT compilation. One approach is verifier insert explicit zero extension for those insns that need zero extension in a generic way, JIT back-ends then do not generate zero extension for sub-register write at default. However, only those back-ends which do not have hardware zero extension want this optimization. Back-ends like x86_64 and AArch64 have hardware zero extension support that the insertion should be disabled. This patch introduces new target hook "bpf_jit_needs_zext" which returns false at default, meaning verifier zero extension insertion is disabled at default. A back-end could override this hook to return true if it doesn't have hardware support and want verifier insert zero extension explicitly. Offload targets do not use this native target hook, instead, they could get the optimization results using bpf_prog_offload_ops.finalize. NOTE: arches could have diversified features, it is possible for one arch to have hardware zero extension support for some sub-register write insns but not for all. For example, PowerPC, SPARC have zero extended loads, but not for alu32. So when verifier zero extension insertion enabled, these JIT back-ends need to peephole insns to remove those zero extension inserted for insn that actually has hardware zero extension support. The peephole could be as simple as looking the next insn, if it is a special zero extension insn then it is safe to eliminate it if the current insn has hardware zero extension support. Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-05-24bpf: introduce new mov32 variant for doing explicit zero extensionJiong Wang
The encoding for this new variant is based on BPF_X format. "imm" field was 0 only, now it could be 1 which means doing zero extension unconditionally .code = BPF_ALU | BPF_MOV | BPF_X .dst_reg = DST .src_reg = SRC .imm = 1 We use this new form for doing zero extension for which verifier will guarantee SRC == DST. Implications on JIT back-ends when doing code-gen for BPF_ALU | BPF_MOV | BPF_X: 1. No change if hardware already does zero extension unconditionally for sub-register write. 2. Otherwise, when seeing imm == 1, just generate insns to clear high 32-bit. No need to generate insns for the move because when imm == 1, dst_reg is the same as src_reg at the moment. Interpreter doesn't need change as well. It is doing unconditionally zero extension for mov32 already. One helper macro BPF_ZEXT_REG is added to help creating zero extension insn using this new mov32 variant. One helper function insn_is_zext is added for checking one insn is an zero extension on dst. This will be widely used by a few JIT back-ends in later patches in this set. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-05-24bpf: verifier: mark patched-insn with sub-register zext flagJiong Wang
Patched insns do not go through generic verification, therefore doesn't has zero extension information collected during insn walking. We don't bother analyze them at the moment, for any sub-register def comes from them, just conservatively mark it as needing zero extension. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-05-24bpf: verifier: mark verified-insn with sub-register zext flagJiong Wang
eBPF ISA specification requires high 32-bit cleared when low 32-bit sub-register is written. This applies to destination register of ALU32 etc. JIT back-ends must guarantee this semantic when doing code-gen. x86_64 and AArch64 ISA has the same semantics, so the corresponding JIT back-end doesn't need to do extra work. However, 32-bit arches (arm, x86, nfp etc.) and some other 64-bit arches (PowerPC, SPARC etc) need to do explicit zero extension to meet this requirement, otherwise code like the following will fail. u64_value = (u64) u32_value ... other uses of u64_value This is because compiler could exploit the semantic described above and save those zero extensions for extending u32_value to u64_value, these JIT back-ends are expected to guarantee this through inserting extra zero extensions which however could be a significant increase on the code size. Some benchmarks show there could be ~40% sub-register writes out of total insns, meaning at least ~40% extra code-gen. One observation is these extra zero extensions are not always necessary. Take above code snippet for example, it is possible u32_value will never be casted into a u64, the value of high 32-bit of u32_value then could be ignored and extra zero extension could be eliminated. This patch implements this idea, insns defining sub-registers will be marked when the high 32-bit of the defined sub-register matters. For those unmarked insns, it is safe to eliminate high 32-bit clearnace for them. Algo: - Split read flags into READ32 and READ64. - Record index of insn that does sub-register write. Keep the index inside reg state and update it during verifier insn walking. - A full register read on a sub-register marks its definition insn as needing zero extension on dst register. A new sub-register write overrides the old one. - When propagating read64 during path pruning, also mark any insn defining a sub-register that is read in the pruned path as full-register. Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-05-24Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "This is the same set of patches sent in the merge window as the final pull except that Martin's read only rework is replaced with a simple revert of the original change that caused the regression. Everything else is an obvious fix or small cleanup" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: Revert "scsi: sd: Keep disk read-only when re-reading partition" scsi: bnx2fc: fix incorrect cast to u64 on shift operation scsi: smartpqi: Reporting unhandled SCSI errors scsi: myrs: Fix uninitialized variable scsi: lpfc: Update lpfc version to 12.2.0.2 scsi: lpfc: add check for loss of ndlp when sending RRQ scsi: lpfc: correct rcu unlock issue in lpfc_nvme_info_show scsi: lpfc: resolve lockdep warnings scsi: qedi: remove set but not used variables 'cdev' and 'udev' scsi: qedi: remove memset/memcpy to nfunc and use func instead scsi: qla2xxx: Add cleanup for PCI EEH recovery
2019-05-24Merge tag 'for-linus-20190524' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: - NVMe pull request from Keith, with fixes from a few folks. - bio and sbitmap before atomic barrier fixes (Andrea) - Hang fix for blk-mq freeze and unfreeze (Bob) - Single segment count regression fix (Christoph) - AoE now has a new maintainer - tools/io_uring/ Makefile fix, and sync with liburing (me) * tag 'for-linus-20190524' of git://git.kernel.dk/linux-block: (23 commits) tools/io_uring: sync with liburing tools/io_uring: fix Makefile for pthread library link blk-mq: fix hang caused by freeze/unfreeze sequence block: remove the bi_seg_{front,back}_size fields in struct bio block: remove the segment size check in bio_will_gap block: force an unlimited segment size on queues with a virt boundary block: don't decrement nr_phys_segments for physically contigous segments sbitmap: fix improper use of smp_mb__before_atomic() bio: fix improper use of smp_mb__before_atomic() aoe: list new maintainer for aoe driver nvme-pci: use blk-mq mapping for unmanaged irqs nvme: update MAINTAINERS nvme: copy MTFA field from identify controller nvme: fix memory leak for power latency tolerance nvme: release namespace SRCU protection before performing controller ioctls nvme: merge nvme_ns_ioctl into nvme_ioctl nvme: remove the ifdef around nvme_nvm_ioctl nvme: fix srcu locking on error return in nvme_get_ns_from_disk nvme: Fix known effects nvme-pci: Sync queues on reset ...
2019-05-24Merge tag 'linux-kselftest-5.2-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull Kselftest fixes from Shuah Khan: - Two fixes to regressions introduced in kselftest Makefile test run output refactoring work (Kees Cook) - Adding Atom support to syscall_arg_fault test (Tong Bo) * tag 'linux-kselftest-5.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: selftests/timers: Add missing fflush(stdout) calls selftests: Remove forced unbuffering for test running selftests/x86: Support Atom for syscall_arg_fault test
2019-05-24Merge tag 'devicetree-fixes-for-5.2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Pull Devicetree fixes from Rob Herring: - Update checkpatch.pl to use DT vendor-prefixes.yaml - Fix DT binding references to files converted to DT schema - Clean-up Arm CPU binding examples to match schema - Add Sifive block versioning scheme documentation - Pass binding directory base to validation tools for reference lookups * tag 'devicetree-fixes-for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: checkpatch.pl: Update DT vendor prefix check dt: bindings: mtd: replace references to nand.txt with nand-controller.yaml dt-bindings: interrupt-controller: arm,gic: Fix schema errors in example dt-bindings: arm: Clean up CPU binding examples dt: fix refs that were renamed to json with the same file name dt-bindings: Pass binding directory to validation tools dt-bindings: sifive: describe sifive-blocks versioning
2019-05-24Merge tag 'spdx-5.2-rc2-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pule more SPDX updates from Greg KH: "Here is another set of reviewed patches that adds SPDX tags to different kernel files, based on a set of rules that are being used to parse the comments to try to determine that the license of the file is "GPL-2.0-or-later". Only the "obvious" versions of these matches are included here, a number of "non-obvious" variants of text have been found but those have been postponed for later review and analysis. These patches have been out for review on the linux-spdx@vger mailing list, and while they were created by automatic tools, they were hand-verified by a bunch of different people, all whom names are on the patches are reviewers" * tag 'spdx-5.2-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (85 commits) treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 125 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 123 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 122 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 121 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 120 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 119 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 118 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 116 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 114 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 113 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 112 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 111 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 110 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 106 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 105 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 104 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 103 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 102 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 101 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 98 ...
2019-05-24Merge branch 'bpf-send-sig'Daniel Borkmann
Yonghong Song says: ==================== This patch tries to solve the following specific use case. Currently, bpf program can already collect stack traces through kernel function get_perf_callchain() when certain events happens (e.g., cache miss counter or cpu clock counter overflows). But such stack traces are not enough for jitted programs, e.g., hhvm (jited php). To get real stack trace, jit engine internal data structures need to be traversed in order to get the real user functions. bpf program itself may not be the best place to traverse the jit engine as the traversing logic could be complex and it is not a stable interface either. Instead, hhvm implements a signal handler, e.g. for SIGALARM, and a set of program locations which it can dump stack traces. When it receives a signal, it will dump the stack in next such program location. This patch implements bpf_send_signal() helper to send a signal to hhvm in real time, resulting in intended stack traces. Patch #1 implemented the bpf_send_helper() in the kernel. Patch #2 synced uapi header bpf.h to tools directory. Patch #3 added a self test which covers tracepoint and perf_event bpf programs. Changelogs: v4 => v5: . pass the "current" task struct to irq_work as well since the current task struct may change between nmi and subsequent irq_work_interrupt. Discovered by Daniel. v3 => v4: . fix one typo and declare "const char *id_path = ..." to avoid directly use the long string in the func body in Patch #3. v2 => v3: . change the standalone test to be part of prog_tests. RFC v1 => v2: . previous version allows to send signal to an arbitrary pid. This version just sends the signal to current task to avoid unstable pid and potential races between sending signals and task state changes for the pid. ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-05-24tools/bpf: add selftest in test_progs for bpf_send_signal() helperYonghong Song
The test covered both nmi and tracepoint perf events. $ ./test_progs ... test_send_signal_tracepoint:PASS:tracepoint 0 nsec ... test_send_signal_common:PASS:tracepoint 0 nsec ... test_send_signal_common:PASS:perf_event 0 nsec ... test_send_signal:OK Acked-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-05-24tools/bpf: sync bpf uapi header bpf.h to tools directoryYonghong Song
The bpf uapi header include/uapi/linux/bpf.h is sync'ed to tools/include/uapi/linux/bpf.h. Acked-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-05-24bpf: implement bpf_send_signal() helperYonghong Song
This patch tries to solve the following specific use case. Currently, bpf program can already collect stack traces through kernel function get_perf_callchain() when certain events happens (e.g., cache miss counter or cpu clock counter overflows). But such stack traces are not enough for jitted programs, e.g., hhvm (jited php). To get real stack trace, jit engine internal data structures need to be traversed in order to get the real user functions. bpf program itself may not be the best place to traverse the jit engine as the traversing logic could be complex and it is not a stable interface either. Instead, hhvm implements a signal handler, e.g. for SIGALARM, and a set of program locations which it can dump stack traces. When it receives a signal, it will dump the stack in next such program location. Such a mechanism can be implemented in the following way: . a perf ring buffer is created between bpf program and tracing app. . once a particular event happens, bpf program writes to the ring buffer and the tracing app gets notified. . the tracing app sends a signal SIGALARM to the hhvm. But this method could have large delays and causing profiling results skewed. This patch implements bpf_send_signal() helper to send a signal to hhvm in real time, resulting in intended stack traces. Acked-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-05-24locking/lock_events: Use this_cpu_add() when necessaryWaiman Long
The kernel test robot has reported that the use of __this_cpu_add() causes bug messages like: BUG: using __this_cpu_add() in preemptible [00000000] code: ... Given the imprecise nature of the count and the possibility of resetting the count and doing the measurement again, this is not really a big problem to use the unprotected __this_cpu_*() functions. To make the preemption checking code happy, the this_cpu_*() functions will be used if CONFIG_DEBUG_PREEMPT is defined. The imprecise nature of the locking counts are also documented with the suggestion that we should run the measurement a few times with the counts reset in between to get a better picture of what is going on under the hood. Fixes: a8654596f0371 ("locking/rwsem: Enable lock event counting") Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-24Merge branch 'btf2c-converter'Alexei Starovoitov
Andrii Nakryiko says: ==================== This patch set adds BTF-to-C dumping APIs to libbpf, allowing to output a subset of BTF types as a compilable C type definitions. This is useful by itself, as raw BTF output is not easy to inspect and comprehend. But it's also a big part of BPF CO-RE (compile once - run everywhere) initiative aimed at allowing to write relocatable BPF programs, that won't require on-the-host kernel headers (and would be able to inspect internal kernel structures, not exposed through kernel headers). This patch set consists of three groups of patches and one pre-patch, with the BTF-to-C dumper API depending on the first two groups. Pre-patch #1 fixes issue with libbpf_internal.h. btf__parse_elf() API patches: - patch #2 adds btf__parse_elf() API to libbpf, allowing to load BTF and/or BTF.ext from ELF file; - patch #3 utilizies btf__parse_elf() from bpftool for `btf dump file` command; - patch #4 switches test_btf.c to use btf__parse_elf() to check for presence of BTF data in object file. libbpf's internal hashmap patches: - patch #5 adds resizeable non-thread safe generic hashmap to libbpf; - patch #6 adds tests for that hashmap; - patch #7 migrates btf_dedup()'s dedup_table to use hashmap w/ APPEND. BTF-to-C dumper API patches: - patch #8 adds btf_dump APIs with all the logic for laying out type definitions in correct order and emitting C syntax for them; - patch #9 adds lots of tests for common and quirky parts of C type system; - patch #10 adds support for C-syntax btf dumping to bpftool; - patch #11 updates bpftool documentation to mention C-syntax dump option; - patch #12 update bash-completion for btf dump sub-command. v2->v3: - fix bpftool-btf.rst formatting (Quentin); - simplify bash autocompletion script (Quentin); - better error message in btf dump (Quentin); v1->v2: - removed unuseful file header (Jakub); - removed inlines in .c (Jakub); - added 'format {c|raw}' keyword/option (Jakub); - re-use i var for iteration in btf_dump_c() (Jakub); - bumped libbpf version to 0.0.4; v0->v1: - fix bug in hashmap__for_each_bucket_entry() not handling empty hashmap; - removed `btf dump`-specific libbpf logging hook up (Quentin has more generic patchset); - change btf__parse_elf() to always load .BTF and return it as a result, with .BTF.ext being optional and returned through struct btf_ext** arg (Alexei); - endianness check to use __BYTE_ORDER__ (Alexei); - bool:1 to __u8:1 in type_aux_state (Alexei); - added HASHMAP_APPEND strategy to hashmap, changed hashmap__for_each_key_entry() to also check for key equality during iteration (multimap iteration for key); - added new tests for empty hashmap and hashmap as a multimap; - tried to clarify weak/strong dependency ordering comments (Alexei) - btf dump test's expected output - support better commenting aproach (Alexei); - added bash-completion for a new "c" option (Alexei). ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-05-24bpftool: update bash-completion w/ new c option for btf dumpAndrii Nakryiko
Add bash completion for new C btf dump option. Cc: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-05-24bpftool/docs: add description of btf dump C optionAndrii Nakryiko
Document optional **c** option for btf dump subcommand. Cc: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-05-24bpftool: add C output format option to btf dump subcommandAndrii Nakryiko
Utilize new libbpf's btf_dump API to emit BTF as a C definitions. Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-05-24selftests/bpf: add btf_dump BTF-to-C conversion testsAndrii Nakryiko
Add new test_btf_dump set of tests, validating BTF-to-C conversion correctness. Tests rely on clang to generate BTF from provided C test cases. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-05-24libbpf: add btf_dump API for BTF-to-C conversionAndrii Nakryiko
BTF contains enough type information to allow generating valid compilable C header w/ correct layout of structs/unions and all the typedef/enum definitions. This patch adds a new "object" - btf_dump to facilitate dumping BTF as valid C. btf_dump__dump_type() is the main API which takes care of dumping out (through user-provided printf-like callback function) C definitions for given type ID and it's required dependencies. This allows for not just dumping out entirety of BTF types, but also selective filtering based on user-provided criterias w/ minimal set of dependent types. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-05-24libbpf: switch btf_dedup() to hashmap for dedup tableAndrii Nakryiko
Utilize libbpf's hashmap as a multimap fof dedup_table implementation. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-05-24selftests/bpf: add tests for libbpf's hashmapAndrii Nakryiko
Test all APIs for internal hashmap implementation. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-05-24libbpf: add resizable non-thread safe internal hashmapAndrii Nakryiko
There is a need for fast point lookups inside libbpf for multiple use cases (e.g., name resolution for BTF-to-C conversion, by-name lookups in BTF for upcoming BPF CO-RE relocation support, etc). This patch implements simple resizable non-thread safe hashmap using single linked list chains. Four different insert strategies are supported: - HASHMAP_ADD - only add key/value if key doesn't exist yet; - HASHMAP_SET - add key/value pair if key doesn't exist yet; otherwise, update value; - HASHMAP_UPDATE - update value, if key already exists; otherwise, do nothing and return -ENOENT; - HASHMAP_APPEND - always add key/value pair, even if key already exists. This turns hashmap into a multimap by allowing multiple values to be associated with the same key. Most useful read API for such hashmap is hashmap__for_each_key_entry() iteration. If hashmap__find() is still used, it will return last inserted key/value entry (first in a bucket chain). For HASHMAP_SET and HASHMAP_UPDATE, old key/value pair is returned, so that calling code can handle proper memory management, if necessary. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-05-24selftests/bpf: use btf__parse_elf to check presence of BTF/BTF.extAndrii Nakryiko
Switch test_btf.c to rely on btf__parse_elf to check presence of BTF and BTF.ext data, instead of implementing its own ELF parsing. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-05-24bpftool: use libbpf's btf__parse_elf APIAndrii Nakryiko
Use btf__parse_elf() API, provided by libbpf, instead of implementing ELF parsing by itself. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>