summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2021-02-13bpf: Fix bpf_fib_lookup helper MTU check for SKB ctxJesper Dangaard Brouer
BPF end-user on Cilium slack-channel (Carlo Carraro) wants to use bpf_fib_lookup for doing MTU-check, but *prior* to extending packet size, by adjusting fib_params 'tot_len' with the packet length plus the expected encap size. (Just like the bpf_check_mtu helper supports). He discovered that for SKB ctx the param->tot_len was not used, instead skb->len was used (via MTU check in is_skb_forwardable() that checks against netdev MTU). Fix this by using fib_params 'tot_len' for MTU check. If not provided (e.g. zero) then keep existing TC behaviour intact. Notice that 'tot_len' for MTU check is done like XDP code-path, which checks against FIB-dst MTU. V16: - Revert V13 optimization, 2nd lookup is against egress/resulting netdev V13: - Only do ifindex lookup one time, calling dev_get_by_index_rcu(). V10: - Use same method as XDP for 'tot_len' MTU check Fixes: 4c79579b44b1 ("bpf: Change bpf_fib_lookup to return lookup status") Reported-by: Carlo Carraro <colrack@gmail.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/161287789444.790810.15247494756551413508.stgit@firesoul
2021-02-13bpf: Remove MTU check in __bpf_skb_max_lenJesper Dangaard Brouer
Multiple BPF-helpers that can manipulate/increase the size of the SKB uses __bpf_skb_max_len() as the max-length. This function limit size against the current net_device MTU (skb->dev->mtu). When a BPF-prog grow the packet size, then it should not be limited to the MTU. The MTU is a transmit limitation, and software receiving this packet should be allowed to increase the size. Further more, current MTU check in __bpf_skb_max_len uses the MTU from ingress/current net_device, which in case of redirects uses the wrong net_device. This patch keeps a sanity max limit of SKB_MAX_ALLOC (16KiB). The real limit is elsewhere in the system. Jesper's testing[1] showed it was not possible to exceed 8KiB when expanding the SKB size via BPF-helper. The limiting factor is the define KMALLOC_MAX_CACHE_SIZE which is 8192 for SLUB-allocator (CONFIG_SLUB) in-case PAGE_SIZE is 4096. This define is in-effect due to this being called from softirq context see code __gfp_pfmemalloc_flags() and __do_kmalloc_node(). Jakub's testing showed that frames above 16KiB can cause NICs to reset (but not crash). Keep this sanity limit at this level as memory layer can differ based on kernel config. [1] https://github.com/xdp-project/bpf-examples/tree/master/MTU-tests Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/161287788936.790810.2937823995775097177.stgit@firesoul
2021-02-13bpf: Fix truncation handling for mod32 dst reg wrt zeroDaniel Borkmann
Recently noticed that when mod32 with a known src reg of 0 is performed, then the dst register is 32-bit truncated in verifier: 0: R1=ctx(id=0,off=0,imm=0) R10=fp0 0: (b7) r0 = 0 1: R0_w=inv0 R1=ctx(id=0,off=0,imm=0) R10=fp0 1: (b7) r1 = -1 2: R0_w=inv0 R1_w=inv-1 R10=fp0 2: (b4) w2 = -1 3: R0_w=inv0 R1_w=inv-1 R2_w=inv4294967295 R10=fp0 3: (9c) w1 %= w0 4: R0_w=inv0 R1_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R2_w=inv4294967295 R10=fp0 4: (b7) r0 = 1 5: R0_w=inv1 R1_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R2_w=inv4294967295 R10=fp0 5: (1d) if r1 == r2 goto pc+1 R0_w=inv1 R1_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R2_w=inv4294967295 R10=fp0 6: R0_w=inv1 R1_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R2_w=inv4294967295 R10=fp0 6: (b7) r0 = 2 7: R0_w=inv2 R1_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R2_w=inv4294967295 R10=fp0 7: (95) exit 7: R0=inv1 R1=inv(id=0,umin_value=4294967295,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R2=inv4294967295 R10=fp0 7: (95) exit However, as a runtime result, we get 2 instead of 1, meaning the dst register does not contain (u32)-1 in this case. The reason is fairly straight forward given the 0 test leaves the dst register as-is: # ./bpftool p d x i 23 0: (b7) r0 = 0 1: (b7) r1 = -1 2: (b4) w2 = -1 3: (16) if w0 == 0x0 goto pc+1 4: (9c) w1 %= w0 5: (b7) r0 = 1 6: (1d) if r1 == r2 goto pc+1 7: (b7) r0 = 2 8: (95) exit This was originally not an issue given the dst register was marked as completely unknown (aka 64 bit unknown). However, after 468f6eafa6c4 ("bpf: fix 32-bit ALU op verification") the verifier casts the register output to 32 bit, and hence it becomes 32 bit unknown. Note that for the case where the src register is unknown, the dst register is marked 64 bit unknown. After the fix, the register is truncated by the runtime and the test passes: # ./bpftool p d x i 23 0: (b7) r0 = 0 1: (b7) r1 = -1 2: (b4) w2 = -1 3: (16) if w0 == 0x0 goto pc+2 4: (9c) w1 %= w0 5: (05) goto pc+1 6: (bc) w1 = w1 7: (b7) r0 = 1 8: (1d) if r1 == r2 goto pc+1 9: (b7) r0 = 2 10: (95) exit Semantics also match with {R,W}x mod{64,32} 0 -> {R,W}x. Invalid div has always been {R,W}x div{64,32} 0 -> 0. Rewrites are as follows: mod32: mod64: (16) if w0 == 0x0 goto pc+2 (15) if r0 == 0x0 goto pc+1 (9c) w1 %= w0 (9f) r1 %= r0 (05) goto pc+1 (bc) w1 = w1 Fixes: 468f6eafa6c4 ("bpf: fix 32-bit ALU op verification") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org>
2021-02-13bpf, devmap: Use GFP_KERNEL for xdp bulk queue allocationJun'ichi Nomura
The devmap bulk queue is allocated with GFP_ATOMIC and the allocation may fail if there is no available space in existing percpu pool. Since commit 75ccae62cb8d42 ("xdp: Move devmap bulk queue into struct net_device") moved the bulk queue allocation to NETDEV_REGISTER callback, whose context is allowed to sleep, use GFP_KERNEL instead of GFP_ATOMIC to let percpu allocator extend the pool when needed and avoid possible failure of netdev registration. As the required alignment is natural, we can simply use alloc_percpu(). Fixes: 75ccae62cb8d42 ("xdp: Move devmap bulk queue into struct net_device") Signed-off-by: Jun'ichi Nomura <junichi.nomura@nec.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/20210209082451.GA44021@jeru.linux.bs1.fc.nec.co.jp
2021-02-12Merge tag '5.11-rc7-smb3-github' of git://github.com/smfrench/smb3-kernelLinus Torvalds
Pull cifs fixes from Steve French: "Four small smb3 fixes to the new mount API (including a particularly important one for DFS links). These were found in testing this week of additional DFS scenarios, and a user testing of an apache container problem" * tag '5.11-rc7-smb3-github' of git://github.com/smfrench/smb3-kernel: cifs: Set CIFS_MOUNT_USE_PREFIX_PATH flag on setting cifs_sb->prepath. cifs: In the new mount api we get the full devname as source= cifs: do not disable noperm if multiuser mount option is not provided cifs: fix dfs-links
2021-02-12f2fs: give a warning only for readonly partitionJaegeuk Kim
Let's allow mounting readonly partition. We're able to recovery later once we have it as read-write back. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-02-12perf probe: Fix kretprobe issue caused by GCC bugJianlin Lv
Perf failed to add a kretprobe event with debuginfo of vmlinux which is compiled by gcc with -fpatchable-function-entry option enabled. The same issue with kernel module. Issue: # perf probe -v 'kernel_clone%return $retval' ...... Writing event: r:probe/kernel_clone__return _text+599624 $retval Failed to write event: Invalid argument Error: Failed to add events. Reason: Invalid argument (Code: -22) # cat /sys/kernel/debug/tracing/error_log [156.75] trace_kprobe: error: Retprobe address must be an function entry Command: r:probe/kernel_clone__return _text+599624 $retval ^ # llvm-dwarfdump vmlinux |grep -A 10 -w 0x00df2c2b 0x00df2c2b: DW_TAG_subprogram DW_AT_external (true) DW_AT_name ("kernel_clone") DW_AT_decl_file ("/home/code/linux-next/kernel/fork.c") DW_AT_decl_line (2423) DW_AT_decl_column (0x07) DW_AT_prototyped (true) DW_AT_type (0x00dcd492 "pid_t") DW_AT_low_pc (0xffff800010092648) DW_AT_high_pc (0xffff800010092b9c) DW_AT_frame_base (DW_OP_call_frame_cfa) # cat /proc/kallsyms |grep kernel_clone ffff800010092640 T kernel_clone # readelf -s vmlinux |grep -i kernel_clone 183173: ffff800010092640 1372 FUNC GLOBAL DEFAULT 2 kernel_clone # objdump -d vmlinux |grep -A 10 -w \<kernel_clone\>: ffff800010092640 <kernel_clone>: ffff800010092640: d503201f nop ffff800010092644: d503201f nop ffff800010092648: d503233f paciasp ffff80001009264c: a9b87bfd stp x29, x30, [sp, #-128]! ffff800010092650: 910003fd mov x29, sp ffff800010092654: a90153f3 stp x19, x20, [sp, #16] The entry address of kernel_clone converted by debuginfo is _text+599624 (0x92648), which is consistent with the value of DW_AT_low_pc attribute. But the symbolic address of kernel_clone from /proc/kallsyms is ffff800010092640. This issue is found on arm64, -fpatchable-function-entry=2 is enabled when CONFIG_DYNAMIC_FTRACE_WITH_REGS=y; Just as objdump displayed the assembler contents of kernel_clone, GCC generate 2 NOPs at the beginning of each function. kprobe_on_func_entry detects that (_text+599624) is not the entry address of the function, which leads to the failure of adding kretprobe event. kprobe_on_func_entry ->_kprobe_addr ->kallsyms_lookup_size_offset ->arch_kprobe_on_func_entry // FALSE The cause of the issue is that the first instruction in the compile unit indicated by DW_AT_low_pc does not include NOPs. This issue exists in all gcc versions that support -fpatchable-function-entry option. I have reported it to the GCC community: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98776 Currently arm64 and PA-RISC may enable fpatchable-function-entry option. The kernel compiled with clang does not have this issue. FIX: This GCC issue only cause the registration failure of the kretprobe event which doesn't need debuginfo. So, stop using debuginfo for retprobe. map will be used to query the probe function address. Signed-off-by: Jianlin Lv <Jianlin.Lv@arm.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: clang-built-linux@googlegroups.com Cc: Frank Ch. Eigler <fche@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sumanth Korikkar <sumanthk@linux.ibm.com> Link: http://lore.kernel.org/lkml/20210210062646.2377995-1-Jianlin.Lv@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-12bpf: Fix an unitialized value in bpf_iterYonghong Song
Commit 15d83c4d7cef ("bpf: Allow loading of a bpf_iter program") cached btf_id in struct bpf_iter_target_info so later on if it can be checked cheaply compared to checking registered names. syzbot found a bug that uninitialized value may occur to bpf_iter_target_info->btf_id. This is because we allocated bpf_iter_target_info structure with kmalloc and never initialized field btf_id afterwards. This uninitialized btf_id is typically compared to a u32 bpf program func proto btf_id, and the chance of being equal is extremely slim. This patch fixed the issue by using kzalloc which will also prevent future likely instances due to adding new fields. Fixes: 15d83c4d7cef ("bpf: Allow loading of a bpf_iter program") Reported-by: syzbot+580f4f2a272e452d55cb@syzkaller.appspotmail.com Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210212005926.2875002-1-yhs@fb.com
2021-02-12dt-bindings: mtd: add binding for BCM4908 partitionsRafał Miłecki
BCM4908 uses fixed partitions layout but function of some partitions may vary. Some devices use multiple firmware partitions and those partitions should be marked to let system discover their purpose. Signed-off-by: Rafał Miłecki <rafal@milecki.pl> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-02-12dt-bindings: mtd: move partition binding to its own fileRafał Miłecki
Single partition binding is quite common and may be: 1. Used by multiple parsers 2. Extended for more specific cases Move it to separated file to avoid code duplication. Signed-off-by: Rafał Miłecki <rafal@milecki.pl> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-02-12perf symbols: Fix return value when loading PE DSONicholas Fraser
The first time dso__load() was called on a PE file it always returned -1 error. This caused the first call to map__find_symbol() to always fail on a PE file so the first sample from each PE file always had symbol <unknown>. Subsequent samples succeed however because the DSO is already loaded. This fixes dso__load() to return 0 when successfully loading a DSO with libbfd. Fixes: eac9a4342e5447ca ("perf symbols: Try reading the symbol table with libbfd") Signed-off-by: Nicholas Fraser <nfraser@codeweavers.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Frank Ch. Eigler <fche@redhat.com> Cc: Huw Davies <huw@codeweavers.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kim Phillips <kim.phillips@amd.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Remi Bernon <rbernon@codeweavers.com> Cc: Song Liu <songliubraving@fb.com> Cc: Tommi Rantala <tommi.t.rantala@nokia.com> Cc: Ulrich Czekalla <uczekalla@codeweavers.com> Link: http://lore.kernel.org/lkml/1671b43b-09c3-1911-dbf8-7f030242fbf7@codeweavers.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-12perf symbols: Make dso__load_bfd_symbols() load PE files from debug cache onlyNicholas Fraser
dso__load_bfd_symbols() attempts to load a DSO at its original path, then closes it and loads the file in the debug cache. This is incorrect. It should ignore the original file and work with only the debug cache. The original file may have changed or may not even exist, for example if the debug cache has been transferred to another machine via "perf archive". This fix makes it only load the file in the debug cache. Further notes from Nicholas: dso__load_bfd_symbols() is called in a loop from dso__load() for a variety of paths. These are generated by the various DSO_BINARY_TYPEs in the binary_type_symtab list at the top of util/symbol.c. In each case the debugfile passed to dso__load_bfd_symbols() is the path to try. One of those iterations (the first one I believe) passes the original path as the debugfile. If the file still exists at the original path, this is the one that ends up being used in case the debugcache was deleted or the PE file doesn't have a build-id. A later iteration (BUILD_ID_CACHE) passes debugfile as the file in the debugcache if it has a build-id. Even if the file was previously loaded at its original path, (if I understand correctly) this load will override it so the debugcache file ends up being used. Committer notes: So if it fails to find in the cache, it will eventually hope for the best and look at the path in the local filesystem, which in many cases is enough. At some point we need to switch from this "hope for the best" approach to one that warns the user that there is no guarantee, if no buildid is present, that just by looking at the pathname the symbolisation will work. Signed-off-by: Nicholas Fraser <nfraser@codeweavers.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Frank Ch. Eigler <fche@redhat.com> Cc: Huw Davies <huw@codeweavers.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kim Phillips <kim.phillips@amd.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Remi Bernon <rbernon@codeweavers.com> Cc: Song Liu <songliubraving@fb.com> Cc: Tommi Rantala <tommi.t.rantala@nokia.com> Cc: Ulrich Czekalla <uczekalla@codeweavers.com> Link: http://lore.kernel.org/lkml/e58e1237-94ab-e1c9-a7b9-473531906954@codeweavers.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-12Merge branch 'hns3-cleanups'David S. Miller
Huazhong Tan says: ==================== net: hns3: some cleanups for -next To improve code readability and maintainability, the series refactor out some bloated functions in the HNS3 ethernet driver. change log: V2: remove an unused variable in #5 previous version: V1: https://patchwork.kernel.org/project/netdevbpf/cover/1612943005-59416-1-git-send-email-tanhuazhong@huawei.com/ ==================== Acked-by: Jakub Kicinski <kuba@kernel.org>
2021-02-12net: hns3: refactor out hclge_rm_vport_all_mac_table()Hao Chen
hclge_rm_vport_all_mac_table() is bloated, so split it into separate functions for readability and maintainability. Signed-off-by: Hao Chen <chenhao288@hisilicon.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12net: hns3: refactor out hclgevf_set_rss_tuple()Huazhong Tan
To make it more readable and maintainable, split hclgevf_set_rss_tuple() into two parts. Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12net: hns3: refactor out hclge_set_rss_tuple()Huazhong Tan
To make it more readable and maintainable, split hclge_set_rss_tuple() into two parts. Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12net: hns3: split out hclgevf_cmd_send()Yufeng Mo
hclgevf_cmd_send() is bloated, so split it into separate functions for readability and maintainability. Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12net: hns3: split out hclge_cmd_send()Yufeng Mo
hclge_cmd_send() is bloated, so split it into separate functions for readability and maintainability. Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12net: hns3: split out hclge_dbg_dump_qos_buf_cfg()Jian Shen
hclge_dbg_dump_qos_buf_cfg() is bloated, so split it into separate functions for readability and maintainability. Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12net: hns3: refactor out hclgevf_get_rss_tuple()Jian Shen
To improve code readability and maintainability, separate the flow type parsing part and the converting part from bloated hclgevf_get_rss_tuple(). Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12net: hns3: refactor out hclge_get_rss_tuple()Jian Shen
To improve code readability and maintainability, separate the flow type parsing part and the converting part from bloated hclge_get_rss_tuple(). Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12net: hns3: refactor out hclge_set_vf_vlan_common()Peng Li
To improve code readability and maintainability, separate the command handling part and the status parsing part from bloated hclge_set_vf_vlan_common(). Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12net: hns3: use ipv6_addr_any() helperJiaran Zhang
Use common ipv6_addr_any() to determine if an addr is ipv6 any addr. Signed-off-by: Jiaran Zhang <zhangjiaran@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12net: hns3: clean up hns3_dbg_cmd_write()Peng Li
As more commands are added, hns3_dbg_cmd_write() is going to get more bloated, so move the part about command check into a separate function. Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12net: hns3: refactor out hclgevf_cmd_convert_err_code()Peng Li
To improve code readability and maintainability, refactor hclgevf_cmd_convert_err_code() with an array of imp_errcode and common_errno mapping, instead of a bloated switch/case. Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12net: hns3: refactor out hclge_cmd_convert_err_code()Peng Li
To improve code readability and maintainability, refactor hclge_cmd_convert_err_code() with an array of imp_errcode and common_errno mapping, instead of a bloated switch/case. Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12tools/resolve_btfids: Add /libbpf to .gitignoreStanislav Fomichev
This is what I see after compiling the kernel: # bpf-next...bpf-next/master ?? tools/bpf/resolve_btfids/libbpf/ Fixes: fc6b48f692f8 ("tools/resolve_btfids: Build libbpf and libsubcmd in separate directories") Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210212010053.668700-1-sdf@google.com
2021-02-12clk: socfpga: agilex: add clock driver for eASIC N5X platformDinh Nguyen
Add support for Intel's eASIC N5X platform. The clock manager driver for the N5X is very similar to the Agilex platform, we can re-use most of the Agilex clock driver. This patch makes the necessary changes for the driver to differentiate between the Agilex and the N5X platforms. Signed-off-by: Dinh Nguyen <dinguyen@kernel.org> Link: https://lore.kernel.org/r/20210212143059.478554-2-dinguyen@kernel.org Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2021-02-12dt-bindings: documentation: add clock bindings information for eASIC N5XDinh Nguyen
Document the Agilex clock bindings, and add the clock header file. The clock header is an enumeration of all the different clocks on the eASIC N5X platform. Signed-off-by: Dinh Nguyen <dinguyen@kernel.org> Link: https://lore.kernel.org/r/20210212143059.478554-1-dinguyen@kernel.org Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2021-02-12io-wq: clear out worker ->fs and ->filesJens Axboe
By default, kernel threads have init_fs and init_files assigned. In the past, this has triggered security problems, as commands that don't ask for (and hence don't get assigned) fs/files from the originating task can then attempt path resolution etc with access to parts of the system they should not be able to. Rather than add checks in the fs code for misuse, just set these to NULL. If we do attempt to use them, then the resulting code will oops rather than provide access to something that it should not permit. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-12Merge branch 'introduce bpf_iter for task_vma'Alexei Starovoitov
Song Liu says: ==================== This set introduces bpf_iter for task_vma, which can be used to generate information similar to /proc/pid/maps. Patch 4/4 adds an example that mimics /proc/pid/maps. Current /proc/<pid>/maps and /proc/<pid>/smaps provide information of vma's of a process. However, these information are not flexible enough to cover all use cases. For example, if a vma cover mixed 2MB pages and 4kB pages (x86_64), there is no easy way to tell which address ranges are backed by 2MB pages. task_vma solves the problem by enabling the user to generate customize information based on the vma (and vma->vm_mm, vma->vm_file, etc.). Changes v6 => v7: 1. Let BPF iter program use bpf_d_path without specifying sleepable. (Alexei) Changes v5 => v6: 1. Add more comments for task_vma_seq_get_next() to explain the logic of find_vma() calls. (Alexei) 2. Skip vma found by find_vma() when both vm_start and vm_end matches prev_vm_[start|end]. Previous versions only compares vm_start. IOW, if vma of [4k, 8k] is replaced by [4k, 12k] after relocking mmap_lock, v5 will skip the new vma, while v6 will process it. Changes v4 => v5: 1. Fix a refcount leak on task_struct. (Yonghong) 2. Fix the selftest. (Yonghong) Changes v3 => v4: 1. Avoid skipping vma by assigning invalid prev_vm_start in task_vma_seq_stop(). (Yonghong) 2. Move "again" label in task_vma_seq_get_next() save a check. (Yonghong) Changes v2 => v3: 1. Rewrite 1/4 so that we hold mmap_lock while calling BPF program. This enables the BPF program to access the real vma with BTF. (Alexei) 2. Fix the logic when the control is returned to user space. (Yonghong) 3. Revise commit log and cover letter. (Yonghong) Changes v1 => v2: 1. Small fixes in task_iter.c and the selftests. (Yonghong) ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2021-02-12selftests/bpf: Add test for bpf_iter_task_vmaSong Liu
The test dumps information similar to /proc/pid/maps. The first line of the output is compared against the /proc file to make sure they match. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210212183107.50963-4-songliubraving@fb.com
2021-02-12bpf: Allow bpf_d_path in bpf_iter programSong Liu
task_file and task_vma iter programs have access to file->f_path. Enable bpf_d_path to print paths of these file. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210212183107.50963-3-songliubraving@fb.com
2021-02-12bpf: Introduce task_vma bpf_iterSong Liu
Introduce task_vma bpf_iter to print memory information of a process. It can be used to print customized information similar to /proc/<pid>/maps. Current /proc/<pid>/maps and /proc/<pid>/smaps provide information of vma's of a process. However, these information are not flexible enough to cover all use cases. For example, if a vma cover mixed 2MB pages and 4kB pages (x86_64), there is no easy way to tell which address ranges are backed by 2MB pages. task_vma solves the problem by enabling the user to generate customize information based on the vma (and vma->vm_mm, vma->vm_file, etc.). To access the vma safely in the BPF program, task_vma iterator holds target mmap_lock while calling the BPF program. If the mmap_lock is contended, task_vma unlocks mmap_lock between iterations to unblock the writer(s). This lock contention avoidance mechanism is similar to the one used in show_smaps_rollup(). Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210212183107.50963-2-songliubraving@fb.com
2021-02-12jffs2: check the validity of dstlen in jffs2_zlib_compress()Yang Yang
KASAN reports a BUG when download file in jffs2 filesystem.It is because when dstlen == 1, cpage_out will write array out of bounds. Actually, data will not be compressed in jffs2_zlib_compress() if data's length less than 4. [ 393.799778] BUG: KASAN: slab-out-of-bounds in jffs2_rtime_compress+0x214/0x2f0 at addr ffff800062e3b281 [ 393.809166] Write of size 1 by task tftp/2918 [ 393.813526] CPU: 3 PID: 2918 Comm: tftp Tainted: G B 4.9.115-rt93-EMBSYS-CGEL-6.1.R6-dirty #1 [ 393.823173] Hardware name: LS1043A RDB Board (DT) [ 393.827870] Call trace: [ 393.830322] [<ffff20000808c700>] dump_backtrace+0x0/0x2f0 [ 393.835721] [<ffff20000808ca04>] show_stack+0x14/0x20 [ 393.840774] [<ffff2000086ef700>] dump_stack+0x90/0xb0 [ 393.845829] [<ffff20000827b19c>] kasan_object_err+0x24/0x80 [ 393.851402] [<ffff20000827b404>] kasan_report_error+0x1b4/0x4d8 [ 393.857323] [<ffff20000827bae8>] kasan_report+0x38/0x40 [ 393.862548] [<ffff200008279d44>] __asan_store1+0x4c/0x58 [ 393.867859] [<ffff2000084ce2ec>] jffs2_rtime_compress+0x214/0x2f0 [ 393.873955] [<ffff2000084bb3b0>] jffs2_selected_compress+0x178/0x2a0 [ 393.880308] [<ffff2000084bb530>] jffs2_compress+0x58/0x478 [ 393.885796] [<ffff2000084c5b34>] jffs2_write_inode_range+0x13c/0x450 [ 393.892150] [<ffff2000084be0b8>] jffs2_write_end+0x2a8/0x4a0 [ 393.897811] [<ffff2000081f3008>] generic_perform_write+0x1c0/0x280 [ 393.903990] [<ffff2000081f5074>] __generic_file_write_iter+0x1c4/0x228 [ 393.910517] [<ffff2000081f5210>] generic_file_write_iter+0x138/0x288 [ 393.916870] [<ffff20000829ec1c>] __vfs_write+0x1b4/0x238 [ 393.922181] [<ffff20000829ff00>] vfs_write+0xd0/0x238 [ 393.927232] [<ffff2000082a1ba8>] SyS_write+0xa0/0x110 [ 393.932283] [<ffff20000808429c>] __sys_trace_return+0x0/0x4 [ 393.937851] Object at ffff800062e3b280, in cache kmalloc-64 size: 64 [ 393.944197] Allocated: [ 393.946552] PID = 2918 [ 393.948913] save_stack_trace_tsk+0x0/0x220 [ 393.953096] save_stack_trace+0x18/0x20 [ 393.956932] kasan_kmalloc+0xd8/0x188 [ 393.960594] __kmalloc+0x144/0x238 [ 393.963994] jffs2_selected_compress+0x48/0x2a0 [ 393.968524] jffs2_compress+0x58/0x478 [ 393.972273] jffs2_write_inode_range+0x13c/0x450 [ 393.976889] jffs2_write_end+0x2a8/0x4a0 [ 393.980810] generic_perform_write+0x1c0/0x280 [ 393.985251] __generic_file_write_iter+0x1c4/0x228 [ 393.990040] generic_file_write_iter+0x138/0x288 [ 393.994655] __vfs_write+0x1b4/0x238 [ 393.998228] vfs_write+0xd0/0x238 [ 394.001543] SyS_write+0xa0/0x110 [ 394.004856] __sys_trace_return+0x0/0x4 [ 394.008684] Freed: [ 394.010691] PID = 2918 [ 394.013051] save_stack_trace_tsk+0x0/0x220 [ 394.017233] save_stack_trace+0x18/0x20 [ 394.021069] kasan_slab_free+0x88/0x188 [ 394.024902] kfree+0x6c/0x1d8 [ 394.027868] jffs2_sum_write_sumnode+0x2c4/0x880 [ 394.032486] jffs2_do_reserve_space+0x198/0x598 [ 394.037016] jffs2_reserve_space+0x3f8/0x4d8 [ 394.041286] jffs2_write_inode_range+0xf0/0x450 [ 394.045816] jffs2_write_end+0x2a8/0x4a0 [ 394.049737] generic_perform_write+0x1c0/0x280 [ 394.054179] __generic_file_write_iter+0x1c4/0x228 [ 394.058968] generic_file_write_iter+0x138/0x288 [ 394.063583] __vfs_write+0x1b4/0x238 [ 394.067157] vfs_write+0xd0/0x238 [ 394.070470] SyS_write+0xa0/0x110 [ 394.073783] __sys_trace_return+0x0/0x4 [ 394.077612] Memory state around the buggy address: [ 394.082404] ffff800062e3b180: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc [ 394.089623] ffff800062e3b200: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc [ 394.096842] >ffff800062e3b280: 01 fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 394.104056] ^ [ 394.107283] ffff800062e3b300: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc [ 394.114502] ffff800062e3b380: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc [ 394.121718] ================================================================== Signed-off-by: Yang Yang <yang.yang29@zte.com.cn> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-02-12ubifs: Fix off-by-one errorSascha Hauer
An inode is allowed to have ubifs_xattr_max_cnt() xattrs, so we must complain only when an inode has more xattrs, having exactly ubifs_xattr_max_cnt() xattrs is fine. With this the maximum number of xattrs can be created without hitting the "has too many xattrs" warning when removing it. Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-02-12ubifs: replay: Fix high stack usage, againArnd Bergmann
An earlier commit moved out some functions to not be inlined by gcc, but after some other rework to remove one of those, clang started inlining the other one and ran into the same problem as gcc did before: fs/ubifs/replay.c:1174:5: error: stack frame size of 1152 bytes in function 'ubifs_replay_journal' [-Werror,-Wframe-larger-than=] Mark the function as noinline_for_stack to ensure it doesn't happen again. Fixes: f80df3851246 ("ubifs: use crypto_shash_tfm_digest()") Fixes: eb66eff6636d ("ubifs: replay: Fix high stack usage") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-02-12ubifs: Fix memleak in ubifs_init_authenticationDinghao Liu
When crypto_shash_digestsize() fails, c->hmac_tfm has not been freed before returning, which leads to memleak. Fixes: 49525e5eecca5 ("ubifs: Add helper functions for authentication support") Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn> Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-02-12jffs2: fix use after free in jffs2_sum_write_data()Tom Rix
clang static analysis reports this problem fs/jffs2/summary.c:794:31: warning: Use of memory after it is freed c->summary->sum_list_head = temp->u.next; ^~~~~~~~~~~~ In jffs2_sum_write_data(), in a loop summary data is handles a node at a time. When it has written out the node it is removed the summary list, and the node is deleted. In the corner case when a JFFS2_FEATURE_RWCOMPAT_COPY is seen, a call is made to jffs2_sum_disable_collecting(). jffs2_sum_disable_collecting() deletes the whole list which conflicts with the loop's deleting the list by parts. To preserve the old behavior of stopping the write midway, bail out of the loop after disabling summary collection. Fixes: 6171586a7ae5 ("[JFFS2] Correct handling of JFFS2_FEATURE_RWCOMPAT_COPY nodes.") Signed-off-by: Tom Rix <trix@redhat.com> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-02-12ubi: eba: Delete useless kfree codeZheng Yongjun
The parameter of kfree function is NULL, so kfree code is useless, delete it. Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-02-12ubi: remove dead code in validate_vid_hdr()Jubin Zhong
data_size is already checked against zero when vol_type matches UBI_VID_STATIC. Remove the following dead code. Signed-off-by: Jubin Zhong <zhongjubin@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-02-12perf arm-spe: Store operation type in packetLeo Yan
This patch is to store operation type in packet structure. Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: James Clark <james.clark@arm.com> Tested-by: James Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Al Grant <al.grant@arm.com> Cc: Andre Przywara <andre.przywara@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wei Li <liwei391@huawei.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: James Clark <james.clark@arm.com> Link: https://lore.kernel.org/r/20210211133856.2137-3-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-12perf arm-spe: Store memory address in packetLeo Yan
This patch is to store virtual and physical memory addresses in packet, which will be used for memory samples. Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: James Clark <james.clark@arm.com> Tested-by: James Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Al Grant <al.grant@arm.com> Cc: Andre Przywara <andre.przywara@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wei Li <liwei391@huawei.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20210211133856.2137-2-james.clark@arm.com Signed-off-by: James Clark <james.clark@arm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-12um: irq.h: include <asm-generic/irq.h>Johannes Berg
This will get the (no-op) definition of irq_canonicalize() which some code might want. We could define that ourselves, but it seems like we'd likely want generic extensions in the future, if any. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-02-12um: io.h: include <linux/types.h>Johannes Berg
This may be needed for size_t if something doesn't get it included elsewhere before including <asm/io.h>, so add the include. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-02-12um: add a pseudo RTCJohannes Berg
Add a pseudo RTC that simply is able to send an alarm signal waking up the system at a given time in the future. Since apparently timerfd_create() FDs don't support SIGIO, we use the sigio-creating helper thread, which just learned to do suspend/resume properly in the previous patch. For time-travel mode, OTOH, just add an event at the specified time in the future, and that's already sufficient to wake up the system at that point in time since suspend will just be in an "endless wait". For s2idle support also call pm_system_wakeup(). Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-02-12perf arm-spe: Enable sample type PERF_SAMPLE_DATA_SRCLeo Yan
This patch is to enable sample type PERF_SAMPLE_DATA_SRC for Arm SPE in the perf data, when output the tracing data, it tells tools that it contains data source in the memory event. Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: James Clark <james.clark@arm.com> Tested-by: James Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Al Grant <al.grant@arm.com> Cc: Andre Przywara <andre.przywara@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wei Li <liwei391@huawei.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20210211133856.2137-1-james.clark@arm.com Signed-off-by: James Clark <james.clark@arm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-12um: remove process stub VMAJohannes Berg
This mostly reverts the old commit 3963333fe676 ("uml: cover stubs with a VMA") which had added a VMA to the existing PTEs. However, there's no real reason to have the PTEs in the first place and the VMA cannot be 'fixed' in place, which leads to bugs that userspace could try to unmap them and be forcefully killed, or such. Also, there's a bit of an ugly hole in userspace's address space. Simplify all this: just install the stub code/page at the top of the (inner) address space, i.e. put it just above TASK_SIZE. The pages are simply hard-coded to be mapped in the userspace process we use to implement an mm context, and they're out of reach of the inner mmap/munmap/mprotect etc. since they're above TASK_SIZE. Getting rid of the VMA also makes vma_merge() no longer hit one of the VM_WARN_ON()s there because we installed a VMA while the code assumes the stack VMA is the first one. It also removes a lockdep warning about mmap_sem usage since we no longer have uml_setup_stubs() and thus no longer need to do any manipulation that would require mmap_sem in activate_mm(). Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-02-12perf env: Remove unneeded internal/cpumap inclusionsIan Rogers
Minor cleanup. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20210211183914.4093187-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-12um: rework userspace stubs to not hard-code stub locationJohannes Berg
The userspace stacks mostly have a stack (and in the case of the syscall stub we can just set their stack pointer) that points to the location of the stub data page already. Rework the stubs to use the stack pointer to derive the start of the data page, rather than requiring it to be hard-coded. In the clone stub, also integrate the int3 into the stack remap, since we really must not use the stack while we remap it. This prepares for putting the stub at a variable location that's not part of the normal address space of the userspace processes running inside the UML machine. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>