linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2021-02-13	bpf: Add BPF-helper for MTU checking	Jesper Dangaard Brouer
	This BPF-helper bpf_check_mtu() works for both XDP and TC-BPF programs. The SKB object is complex and the skb->len value (accessible from BPF-prog) also include the length of any extra GRO/GSO segments, but without taking into account that these GRO/GSO segments get added transport (L4) and network (L3) headers before being transmitted. Thus, this BPF-helper is created such that the BPF-programmer don't need to handle these details in the BPF-prog. The API is designed to help the BPF-programmer, that want to do packet context size changes, which involves other helpers. These other helpers usually does a delta size adjustment. This helper also support a delta size (len_diff), which allow BPF-programmer to reuse arguments needed by these other helpers, and perform the MTU check prior to doing any actual size adjustment of the packet context. It is on purpose, that we allow the len adjustment to become a negative result, that will pass the MTU check. This might seem weird, but it's not this helpers responsibility to "catch" wrong len_diff adjustments. Other helpers will take care of these checks, if BPF-programmer chooses to do actual size adjustment. V14: - Improve man-page desc of len_diff. V13: - Enforce flag BPF_MTU_CHK_SEGS cannot use len_diff. V12: - Simplify segment check that calls skb_gso_validate_network_len. - Helpers should return long V9: - Use dev->hard_header_len (instead of ETH_HLEN) - Annotate with unlikely req from Daniel - Fix logic error using skb_gso_validate_network_len from Daniel V6: - Took John's advice and dropped BPF_MTU_CHK_RELAX - Returned MTU is kept at L3-level (like fib_lookup) V4: Lot of changes - ifindex 0 now use current netdev for MTU lookup - rename helper from bpf_mtu_check to bpf_check_mtu - fix bug for GSO pkt length (as skb->len is total len) - remove __bpf_len_adj_positive, simply allow negative len adj Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/161287790461.790810.3429728639563297353.stgit@firesoul
2021-02-12	Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf	David S. Miller
	Daniel Borkmann says: ==================== pull-request: bpf 2021-02-13 The following pull-request contains BPF updates for your net tree. We've added 2 non-merge commits during the last 3 day(s) which contain a total of 2 files changed, 9 insertions(+), 11 deletions(-). The main changes are: 1) Fix mod32 truncation handling in verifier, from Daniel Borkmann. 2) Fix XDP redirect tests to explicitly use bash, from Björn Töpel. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-13	bpf: bpf_fib_lookup return MTU value as output when looked up	Jesper Dangaard Brouer
	The BPF-helpers for FIB lookup (bpf_xdp_fib_lookup and bpf_skb_fib_lookup) can perform MTU check and return BPF_FIB_LKUP_RET_FRAG_NEEDED. The BPF-prog don't know the MTU value that caused this rejection. If the BPF-prog wants to implement PMTU (Path MTU Discovery) (rfc1191) it need to know this MTU value for the ICMP packet. Patch change lookup and result struct bpf_fib_lookup, to contain this MTU value as output via a union with 'tot_len' as this is the value used for the MTU lookup. V5: - Fixed uninit value spotted by Dan Carpenter. - Name struct output member mtu_result Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/161287789952.790810.13134700381067698781.stgit@firesoul
2021-02-13	bpf: Fix bpf_fib_lookup helper MTU check for SKB ctx	Jesper Dangaard Brouer
	BPF end-user on Cilium slack-channel (Carlo Carraro) wants to use bpf_fib_lookup for doing MTU-check, but prior to extending packet size, by adjusting fib_params 'tot_len' with the packet length plus the expected encap size. (Just like the bpf_check_mtu helper supports). He discovered that for SKB ctx the param->tot_len was not used, instead skb->len was used (via MTU check in is_skb_forwardable() that checks against netdev MTU). Fix this by using fib_params 'tot_len' for MTU check. If not provided (e.g. zero) then keep existing TC behaviour intact. Notice that 'tot_len' for MTU check is done like XDP code-path, which checks against FIB-dst MTU. V16: - Revert V13 optimization, 2nd lookup is against egress/resulting netdev V13: - Only do ifindex lookup one time, calling dev_get_by_index_rcu(). V10: - Use same method as XDP for 'tot_len' MTU check Fixes: 4c79579b44b1 ("bpf: Change bpf_fib_lookup to return lookup status") Reported-by: Carlo Carraro <colrack@gmail.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/161287789444.790810.15247494756551413508.stgit@firesoul
2021-02-13	bpf: Remove MTU check in __bpf_skb_max_len	Jesper Dangaard Brouer
	Multiple BPF-helpers that can manipulate/increase the size of the SKB uses __bpf_skb_max_len() as the max-length. This function limit size against the current net_device MTU (skb->dev->mtu). When a BPF-prog grow the packet size, then it should not be limited to the MTU. The MTU is a transmit limitation, and software receiving this packet should be allowed to increase the size. Further more, current MTU check in __bpf_skb_max_len uses the MTU from ingress/current net_device, which in case of redirects uses the wrong net_device. This patch keeps a sanity max limit of SKB_MAX_ALLOC (16KiB). The real limit is elsewhere in the system. Jesper's testing[1] showed it was not possible to exceed 8KiB when expanding the SKB size via BPF-helper. The limiting factor is the define KMALLOC_MAX_CACHE_SIZE which is 8192 for SLUB-allocator (CONFIG_SLUB) in-case PAGE_SIZE is 4096. This define is in-effect due to this being called from softirq context see code __gfp_pfmemalloc_flags() and __do_kmalloc_node(). Jakub's testing showed that frames above 16KiB can cause NICs to reset (but not crash). Keep this sanity limit at this level as memory layer can differ based on kernel config. [1] https://github.com/xdp-project/bpf-examples/tree/master/MTU-tests Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/161287788936.790810.2937823995775097177.stgit@firesoul
2021-02-13	bpf: Fix truncation handling for mod32 dst reg wrt zero	Daniel Borkmann
	Recently noticed that when mod32 with a known src reg of 0 is performed, then the dst register is 32-bit truncated in verifier: 0: R1=ctx(id=0,off=0,imm=0) R10=fp0 0: (b7) r0 = 0 1: R0_w=inv0 R1=ctx(id=0,off=0,imm=0) R10=fp0 1: (b7) r1 = -1 2: R0_w=inv0 R1_w=inv-1 R10=fp0 2: (b4) w2 = -1 3: R0_w=inv0 R1_w=inv-1 R2_w=inv4294967295 R10=fp0 3: (9c) w1 %= w0 4: R0_w=inv0 R1_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R2_w=inv4294967295 R10=fp0 4: (b7) r0 = 1 5: R0_w=inv1 R1_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R2_w=inv4294967295 R10=fp0 5: (1d) if r1 == r2 goto pc+1 R0_w=inv1 R1_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R2_w=inv4294967295 R10=fp0 6: R0_w=inv1 R1_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R2_w=inv4294967295 R10=fp0 6: (b7) r0 = 2 7: R0_w=inv2 R1_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R2_w=inv4294967295 R10=fp0 7: (95) exit 7: R0=inv1 R1=inv(id=0,umin_value=4294967295,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R2=inv4294967295 R10=fp0 7: (95) exit However, as a runtime result, we get 2 instead of 1, meaning the dst register does not contain (u32)-1 in this case. The reason is fairly straight forward given the 0 test leaves the dst register as-is: # ./bpftool p d x i 23 0: (b7) r0 = 0 1: (b7) r1 = -1 2: (b4) w2 = -1 3: (16) if w0 == 0x0 goto pc+1 4: (9c) w1 %= w0 5: (b7) r0 = 1 6: (1d) if r1 == r2 goto pc+1 7: (b7) r0 = 2 8: (95) exit This was originally not an issue given the dst register was marked as completely unknown (aka 64 bit unknown). However, after 468f6eafa6c4 ("bpf: fix 32-bit ALU op verification") the verifier casts the register output to 32 bit, and hence it becomes 32 bit unknown. Note that for the case where the src register is unknown, the dst register is marked 64 bit unknown. After the fix, the register is truncated by the runtime and the test passes: # ./bpftool p d x i 23 0: (b7) r0 = 0 1: (b7) r1 = -1 2: (b4) w2 = -1 3: (16) if w0 == 0x0 goto pc+2 4: (9c) w1 %= w0 5: (05) goto pc+1 6: (bc) w1 = w1 7: (b7) r0 = 1 8: (1d) if r1 == r2 goto pc+1 9: (b7) r0 = 2 10: (95) exit Semantics also match with {R,W}x mod{64,32} 0 -> {R,W}x. Invalid div has always been {R,W}x div{64,32} 0 -> 0. Rewrites are as follows: mod32: mod64: (16) if w0 == 0x0 goto pc+2 (15) if r0 == 0x0 goto pc+1 (9c) w1 %= w0 (9f) r1 %= r0 (05) goto pc+1 (bc) w1 = w1 Fixes: 468f6eafa6c4 ("bpf: fix 32-bit ALU op verification") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org>
2021-02-13	bpf, devmap: Use GFP_KERNEL for xdp bulk queue allocation	Jun'ichi Nomura
	The devmap bulk queue is allocated with GFP_ATOMIC and the allocation may fail if there is no available space in existing percpu pool. Since commit 75ccae62cb8d42 ("xdp: Move devmap bulk queue into struct net_device") moved the bulk queue allocation to NETDEV_REGISTER callback, whose context is allowed to sleep, use GFP_KERNEL instead of GFP_ATOMIC to let percpu allocator extend the pool when needed and avoid possible failure of netdev registration. As the required alignment is natural, we can simply use alloc_percpu(). Fixes: 75ccae62cb8d42 ("xdp: Move devmap bulk queue into struct net_device") Signed-off-by: Jun'ichi Nomura <junichi.nomura@nec.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/20210209082451.GA44021@jeru.linux.bs1.fc.nec.co.jp
2021-02-12	Merge tag '5.11-rc7-smb3-github' of git://github.com/smfrench/smb3-kernel	Linus Torvalds
	Pull cifs fixes from Steve French: "Four small smb3 fixes to the new mount API (including a particularly important one for DFS links). These were found in testing this week of additional DFS scenarios, and a user testing of an apache container problem" * tag '5.11-rc7-smb3-github' of git://github.com/smfrench/smb3-kernel: cifs: Set CIFS_MOUNT_USE_PREFIX_PATH flag on setting cifs_sb->prepath. cifs: In the new mount api we get the full devname as source= cifs: do not disable noperm if multiuser mount option is not provided cifs: fix dfs-links
2021-02-12	f2fs: give a warning only for readonly partition	Jaegeuk Kim
	Let's allow mounting readonly partition. We're able to recovery later once we have it as read-write back. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-02-12	bpf: Fix an unitialized value in bpf_iter	Yonghong Song
	Commit 15d83c4d7cef ("bpf: Allow loading of a bpf_iter program") cached btf_id in struct bpf_iter_target_info so later on if it can be checked cheaply compared to checking registered names. syzbot found a bug that uninitialized value may occur to bpf_iter_target_info->btf_id. This is because we allocated bpf_iter_target_info structure with kmalloc and never initialized field btf_id afterwards. This uninitialized btf_id is typically compared to a u32 bpf program func proto btf_id, and the chance of being equal is extremely slim. This patch fixed the issue by using kzalloc which will also prevent future likely instances due to adding new fields. Fixes: 15d83c4d7cef ("bpf: Allow loading of a bpf_iter program") Reported-by: syzbot+580f4f2a272e452d55cb@syzkaller.appspotmail.com Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210212005926.2875002-1-yhs@fb.com
2021-02-12	Merge branch 'hns3-cleanups'	David S. Miller
	Huazhong Tan says: ==================== net: hns3: some cleanups for -next To improve code readability and maintainability, the series refactor out some bloated functions in the HNS3 ethernet driver. change log: V2: remove an unused variable in #5 previous version: V1: https://patchwork.kernel.org/project/netdevbpf/cover/1612943005-59416-1-git-send-email-tanhuazhong@huawei.com/ ==================== Acked-by: Jakub Kicinski <kuba@kernel.org>
2021-02-12	net: hns3: refactor out hclge_rm_vport_all_mac_table()	Hao Chen
	hclge_rm_vport_all_mac_table() is bloated, so split it into separate functions for readability and maintainability. Signed-off-by: Hao Chen <chenhao288@hisilicon.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12	net: hns3: refactor out hclgevf_set_rss_tuple()	Huazhong Tan
	To make it more readable and maintainable, split hclgevf_set_rss_tuple() into two parts. Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12	net: hns3: refactor out hclge_set_rss_tuple()	Huazhong Tan
	To make it more readable and maintainable, split hclge_set_rss_tuple() into two parts. Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12	net: hns3: split out hclgevf_cmd_send()	Yufeng Mo
	hclgevf_cmd_send() is bloated, so split it into separate functions for readability and maintainability. Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12	net: hns3: split out hclge_cmd_send()	Yufeng Mo
	hclge_cmd_send() is bloated, so split it into separate functions for readability and maintainability. Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12	net: hns3: split out hclge_dbg_dump_qos_buf_cfg()	Jian Shen
	hclge_dbg_dump_qos_buf_cfg() is bloated, so split it into separate functions for readability and maintainability. Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12	net: hns3: refactor out hclgevf_get_rss_tuple()	Jian Shen
	To improve code readability and maintainability, separate the flow type parsing part and the converting part from bloated hclgevf_get_rss_tuple(). Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12	net: hns3: refactor out hclge_get_rss_tuple()	Jian Shen
	To improve code readability and maintainability, separate the flow type parsing part and the converting part from bloated hclge_get_rss_tuple(). Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12	net: hns3: refactor out hclge_set_vf_vlan_common()	Peng Li
	To improve code readability and maintainability, separate the command handling part and the status parsing part from bloated hclge_set_vf_vlan_common(). Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12	net: hns3: use ipv6_addr_any() helper	Jiaran Zhang
	Use common ipv6_addr_any() to determine if an addr is ipv6 any addr. Signed-off-by: Jiaran Zhang <zhangjiaran@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12	net: hns3: clean up hns3_dbg_cmd_write()	Peng Li
	As more commands are added, hns3_dbg_cmd_write() is going to get more bloated, so move the part about command check into a separate function. Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12	net: hns3: refactor out hclgevf_cmd_convert_err_code()	Peng Li
	To improve code readability and maintainability, refactor hclgevf_cmd_convert_err_code() with an array of imp_errcode and common_errno mapping, instead of a bloated switch/case. Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12	net: hns3: refactor out hclge_cmd_convert_err_code()	Peng Li
	To improve code readability and maintainability, refactor hclge_cmd_convert_err_code() with an array of imp_errcode and common_errno mapping, instead of a bloated switch/case. Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12	tools/resolve_btfids: Add /libbpf to .gitignore	Stanislav Fomichev
	This is what I see after compiling the kernel: # bpf-next...bpf-next/master ?? tools/bpf/resolve_btfids/libbpf/ Fixes: fc6b48f692f8 ("tools/resolve_btfids: Build libbpf and libsubcmd in separate directories") Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210212010053.668700-1-sdf@google.com
2021-02-12	io-wq: clear out worker ->fs and ->files	Jens Axboe
	By default, kernel threads have init_fs and init_files assigned. In the past, this has triggered security problems, as commands that don't ask for (and hence don't get assigned) fs/files from the originating task can then attempt path resolution etc with access to parts of the system they should not be able to. Rather than add checks in the fs code for misuse, just set these to NULL. If we do attempt to use them, then the resulting code will oops rather than provide access to something that it should not permit. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-12	Merge branch 'introduce bpf_iter for task_vma'	Alexei Starovoitov
	Song Liu says: ==================== This set introduces bpf_iter for task_vma, which can be used to generate information similar to /proc/pid/maps. Patch 4/4 adds an example that mimics /proc/pid/maps. Current /proc/<pid>/maps and /proc/<pid>/smaps provide information of vma's of a process. However, these information are not flexible enough to cover all use cases. For example, if a vma cover mixed 2MB pages and 4kB pages (x86_64), there is no easy way to tell which address ranges are backed by 2MB pages. task_vma solves the problem by enabling the user to generate customize information based on the vma (and vma->vm_mm, vma->vm_file, etc.). Changes v6 => v7: 1. Let BPF iter program use bpf_d_path without specifying sleepable. (Alexei) Changes v5 => v6: 1. Add more comments for task_vma_seq_get_next() to explain the logic of find_vma() calls. (Alexei) 2. Skip vma found by find_vma() when both vm_start and vm_end matches prev_vm_[start\|end]. Previous versions only compares vm_start. IOW, if vma of [4k, 8k] is replaced by [4k, 12k] after relocking mmap_lock, v5 will skip the new vma, while v6 will process it. Changes v4 => v5: 1. Fix a refcount leak on task_struct. (Yonghong) 2. Fix the selftest. (Yonghong) Changes v3 => v4: 1. Avoid skipping vma by assigning invalid prev_vm_start in task_vma_seq_stop(). (Yonghong) 2. Move "again" label in task_vma_seq_get_next() save a check. (Yonghong) Changes v2 => v3: 1. Rewrite 1/4 so that we hold mmap_lock while calling BPF program. This enables the BPF program to access the real vma with BTF. (Alexei) 2. Fix the logic when the control is returned to user space. (Yonghong) 3. Revise commit log and cover letter. (Yonghong) Changes v1 => v2: 1. Small fixes in task_iter.c and the selftests. (Yonghong) ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2021-02-12	selftests/bpf: Add test for bpf_iter_task_vma	Song Liu
	The test dumps information similar to /proc/pid/maps. The first line of the output is compared against the /proc file to make sure they match. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210212183107.50963-4-songliubraving@fb.com
2021-02-12	bpf: Allow bpf_d_path in bpf_iter program	Song Liu
	task_file and task_vma iter programs have access to file->f_path. Enable bpf_d_path to print paths of these file. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210212183107.50963-3-songliubraving@fb.com
2021-02-12	bpf: Introduce task_vma bpf_iter	Song Liu
	Introduce task_vma bpf_iter to print memory information of a process. It can be used to print customized information similar to /proc/<pid>/maps. Current /proc/<pid>/maps and /proc/<pid>/smaps provide information of vma's of a process. However, these information are not flexible enough to cover all use cases. For example, if a vma cover mixed 2MB pages and 4kB pages (x86_64), there is no easy way to tell which address ranges are backed by 2MB pages. task_vma solves the problem by enabling the user to generate customize information based on the vma (and vma->vm_mm, vma->vm_file, etc.). To access the vma safely in the BPF program, task_vma iterator holds target mmap_lock while calling the BPF program. If the mmap_lock is contended, task_vma unlocks mmap_lock between iterations to unblock the writer(s). This lock contention avoidance mechanism is similar to the one used in show_smaps_rollup(). Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210212183107.50963-2-songliubraving@fb.com
2021-02-12	bpf: selftests: Add non function pointer test to struct_ops	Martin KaFai Lau
	This patch adds a "void *owner" member. The existing bpf_tcp_ca test will ensure the bpf_cubic.o and bpf_dctcp.o can be loaded. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210212021037.267278-1-kafai@fb.com
2021-02-12	libbpf: Ignore non function pointer member in struct_ops	Martin KaFai Lau
	When libbpf initializes the kernel's struct_ops in "bpf_map__init_kern_struct_ops()", it enforces all pointer types must be a function pointer and rejects others. It turns out to be too strict. For example, when directly using "struct tcp_congestion_ops" from vmlinux.h, it has a "struct module *owner" member and it is set to NULL in a bpf_tcp_cc.o. Instead, it only needs to ensure the member is a function pointer if it has been set (relocated) to a bpf-prog. This patch moves the "btf_is_func_proto(kern_mtype)" check after the existing "if (!prog) { continue; }". The original debug message in "if (!prog) { continue; }" is also removed since it is no longer valid. Beside, there is a later debug message to tell which function pointer is set. The "btf_is_func_proto(mtype)" has already been guaranteed in "bpf_object__collect_st_ops_relos()" which has been run before "bpf_map__init_kern_struct_ops()". Thus, this check is removed. v2: - Remove outdated debug message (Andrii) Remove because there is a later debug message to tell which function pointer is set. - Following mtype->type is no longer needed. Remove: "skip_mods_and_typedefs(btf, mtype->type, &mtype_id)" - Do "if (!prog)" test before skip_mods_and_typedefs. Fixes: 590a00888250 ("bpf: libbpf: Add STRUCT_OPS support") Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210212021030.266932-1-kafai@fb.com
2021-02-12	Merge tag 'io_uring-5.11-2021-02-12' of git://git.kernel.dk/linux-block	Linus Torvalds
	Pull io_uring fix from Jens Axboe: "Revert of a patch from this release that caused a regression" * tag 'io_uring-5.11-2021-02-12' of git://git.kernel.dk/linux-block: Revert "io_uring: don't take fs for recvmsg/sendmsg"
2021-02-12	libbpf: Use AF_LOCAL instead of AF_INET in xsk.c	Stanislav Fomichev
	We have the environments where usage of AF_INET is prohibited (cgroup/sock_create returns EPERM for AF_INET). Let's use AF_LOCAL instead of AF_INET, it should perfectly work with SIOCETHTOOL. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Tested-by: Björn Töpel <bjorn.topel@intel.com> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/20210209221826.922940-1-sdf@google.com
2021-02-12	Merge tag 'drm-fixes-2021-02-12' of git://anongit.freedesktop.org/drm/drm	Linus Torvalds
	Pull drm fixes from Dave Airlie: "Regular fixes for final, there is a ttm regression fix, dp-mst fix, one amdgpu revert, two i915 fixes, and some misc fixes for sun4i, xlnx, and vc4. All pretty quiet and don't think we have any known outstanding regressions. ttm: - page pool regression fix. dp_mst: - don't report un-attached ports as connected amdgpu: - blank screen fix i915: - ensure Type-C FIA is powered when initializing - fix overlay frontbuffer tracking sun4i: - tcon1 sync polarity fix - always set HDMI clock rate - fix H6 HDMI PHY config - fix H6 max frequency vc4: - fix buffer overflow xlnx: - fix memory leak" * tag 'drm-fixes-2021-02-12' of git://anongit.freedesktop.org/drm/drm: drm/ttm: make sure pool pages are cleared drm/sun4i: dw-hdmi: Fix max. frequency for H6 drm/sun4i: Fix H6 HDMI PHY configuration drm/sun4i: dw-hdmi: always set clock rate drm/sun4i: tcon: set sync polarity for tcon1 channel drm/i915: Fix overlay frontbuffer tracking Revert "drm/amd/display: Update NV1x SR latency values" drm/i915/tgl+: Make sure TypeC FIA is powered up when initializing it drm/dp_mst: Don't report ports connected if nothing is attached to them drm/xlnx: fix kmemleak by sending vblank_event in atomic_disable drm/vc4: hvs: Fix buffer overflow with the dlist handling
2021-02-12	Merge tag 'trace-v5.11-rc7-2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fix from Steven Rostedt: "Fix buffer overflow in trace event filter. It was reported that if an trace event was larger than a page and was filtered, that it caused memory corruption. The reason is that filtered events first go into a buffer to test the filter before being written into the ring buffer. Unfortunately, this write did not check the size" * tag 'trace-v5.11-rc7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Check length before giving out the filter buffer
2021-02-12	Merge tag 'for-linus-5.11-rc8-tag' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fix from Juergen Gross: "A single fix for an issue introduced this development cycle: when running as a Xen guest on Arm systems the kernel will hang during boot" * tag 'for-linus-5.11-rc8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: arm/xen: Don't probe xenbus as part of an early initcall
2021-02-12	Merge tag 'riscv-for-linus-5.11-rc8' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fix from Palmer Dabbelt: "A single fix this week: the removal of the GPIO reset method for the Ethernet phy on the HiFive Unleashed. This returns to relying on the bootloader's phy reset sequence, which we'll have to continue doing until we can sort out how to get the Linux phy driver to perform the special reset dance required for this phy" * tag 'riscv-for-linus-5.11-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: Revert "dts: phy: add GPIO number and active state used for phy reset"
2021-02-12	Merge tag 'arm64-fixes' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fix from Catalin Marinas: "Fix PTRACE_PEEKMTETAGS access to an mmapped region before the first write" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: mte: Allow PTRACE_PEEKMTETAGS access to the zero page
2021-02-12	io_uring: optimise io_init_req() flags setting	Pavel Begunkov
	Invalid req->flags are tolerated by free/put well, avoid this dancing needlessly presetting it to zero, and then not even resetting but modifying it, i.e. "\|=". Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-12	io_uring: clean io_req_find_next() fast check	Pavel Begunkov
	Indirectly io_req_find_next() is called for every request, optimise the check by testing flags as it was long before -- __io_req_find_next() tolerates false-positives well (i.e. link==NULL), and those should be really rare. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-12	io_uring: don't check PF_EXITING from syscall	Pavel Begunkov
	io_sq_thread_acquire_mm_files() can find a PF_EXITING task only when it's called from task_work context. Don't check it in all other cases, that are when we're in io_uring_enter(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-12	ixgbe: store the result of ixgbe_rx_offset() onto ixgbe_ring	Maciej Fijalkowski
	Output of ixgbe_rx_offset() is based on ethtool's priv flag setting, which when changed, causes PF reset (disables napi, frees irqs, loads different Rx mem model, etc.). This means that within napi its result is constant and there is no reason to call it per each processed frame. Add new 'rx_offset' field to ixgbe_ring that is meant to hold the ixgbe_rx_offset() result and use it within ixgbe_clean_rx_irq(). Furthermore, use it within ixgbe_alloc_mapped_page(). Last but not least, un-inline the function of interest as it lives in .c file so let compiler do the decision about the inlining. Reviewed-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-02-12	ice: store the result of ice_rx_offset() onto ice_ring	Maciej Fijalkowski
	Output of ice_rx_offset() is based on ethtool's priv flag setting, which when changed, causes PF reset (disables napi, frees irqs, loads different Rx mem model, etc.). This means that within napi its result is constant and there is no reason to call it per each processed frame. Add new 'rx_offset' field to ice_ring that is meant to hold the ice_rx_offset() result and use it within ice_clean_rx_irq(). Furthermore, use it within ice_alloc_mapped_page(). Reviewed-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-02-12	i40e: store the result of i40e_rx_offset() onto i40e_ring	Maciej Fijalkowski
	Output of i40e_rx_offset() is based on ethtool's priv flag setting, which when changed, causes PF reset (disables napi, frees irqs, loads different Rx mem model, etc.). This means that within napi its result is constant and there is no reason to call it per each processed frame. Add new 'rx_offset' field to i40e_ring that is meant to hold the i40e_rx_offset() result and use it within i40e_clean_rx_irq(). Furthermore, use it within i40e_alloc_mapped_page(). Last but not least, un-inline the function of interest so that compiler makes the decision about inlining as it lives in .c file. Reviewed-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-02-12	i40e: Simplify the do-while allocation loop	Björn Töpel
	Fold the count decrement into the while-statement. Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-02-12	ice: skip NULL check against XDP prog in ZC path	Maciej Fijalkowski
	Whole zero-copy variant of clean Rx IRQ is executed when xsk_pool is attached to rx_ring and it can happen only when XDP program is present on interface. Therefore it is safe to assume that program is always !NULL and there is no need for checking it in ice_run_xdp_zc. Reviewed-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-02-12	ice: remove redundant checks in ice_change_mtu	Maciej Fijalkowski
	dev_validate_mtu checks that mtu value specified by user is not less than min mtu and not greater than max allowed mtu. It is being done before calling the ndo_change_mtu exposed by driver, so remove these redundant checks in ice_change_mtu. Reviewed-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-02-12	ice: move skb pointer from rx_buf to rx_ring	Maciej Fijalkowski
	Similar thing has been done in i40e, as there is no real need for having the sk_buff pointer in each rx_buf. Non-eop frames can be simply handled on that pointer moved upwards to rx_ring. Reviewed-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-02-12	ice: simplify ice_run_xdp	Maciej Fijalkowski
	There's no need for 'result' variable, we can directly return the internal status based on action returned by xdp prog. Reviewed-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>