summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2022-04-20net: Change skb_ensure_writable()'s write_len param to unsigned int typeLiu Jian
Both pskb_may_pull() and skb_clone_writable()'s length parameters are of type unsigned int already. Therefore, change this function's write_len param to unsigned int type. Signed-off-by: Liu Jian <liujian56@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20220416105801.88708-3-liujian56@huawei.com
2022-04-20bpf: Enlarge offset check value to INT_MAX in bpf_skb_{load,store}_bytesLiu Jian
The data length of skb frags + frag_list may be greater than 0xffff, and skb_header_pointer can not handle negative offset. So, here INT_MAX is used to check the validity of offset. Add the same change to the related function skb_store_bytes. Fixes: 05c74e5e53f6 ("bpf: add bpf_skb_load_bytes helper") Signed-off-by: Liu Jian <liujian56@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20220416105801.88708-2-liujian56@huawei.com
2022-04-19bpf: Fix usage of trace RCU in local storage.KP Singh
bpf_{sk,task,inode}_storage_free() do not need to use call_rcu_tasks_trace as no BPF program should be accessing the owner as it's being destroyed. The only other reader at this point is bpf_local_storage_map_free() which uses normal RCU. The only path that needs trace RCU are: * bpf_local_storage_{delete,update} helpers * map_{delete,update}_elem() syscalls Fixes: 0fe4b381a59e ("bpf: Allow bpf_local_storage to be used by sleepable programs") Signed-off-by: KP Singh <kpsingh@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20220418155158.2865678-1-kpsingh@kernel.org
2022-04-15xsk: Drop ternary operator from xskq_cons_has_entriesMaciej Fijalkowski
Simplify the mentioned helper function by removing ternary operator. The expression that is there outputs the boolean value by itself. This helper might be used in the hot path so this simplification can also be considered as micro optimization. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220413153015.453864-15-maciej.fijalkowski@intel.com
2022-04-15xsk: Diversify return codes in xsk_rcv_check()Maciej Fijalkowski
Inspired by patch that made xdp_do_redirect() return values for XSKMAP more meaningful, return -ENXIO instead of -EINVAL for socket being unbound in xsk_rcv_check() as this is the usual value that is returned for such event. In turn, it is now possible to easily distinguish what went wrong, which is a bit harder when for both cases checked, -EINVAL was returned. Return codes can be counted in a nice way via bpftrace oneliner that Jesper has shown: bpftrace -e 'tracepoint:xdp:xdp_redirect* {@err[-args->err] = count();}' Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Link: https://lore.kernel.org/bpf/20220413153015.453864-3-maciej.fijalkowski@intel.com
2022-04-15xsk: Improve xdp_do_redirect() error codesBjörn Töpel
The error codes returned by xdp_do_redirect() when redirecting a frame to an AF_XDP socket has not been very useful. A driver could not distinguish between different errors. Prior this change the following codes where used: Socket not bound or incorrect queue/netdev: EINVAL XDP frame/AF_XDP buffer size mismatch: ENOSPC Could not allocate buffer (copy mode): ENOSPC AF_XDP Rx buffer full: ENOSPC After this change: Socket not bound or incorrect queue/netdev: EINVAL XDP frame/AF_XDP buffer size mismatch: ENOSPC Could not allocate buffer (copy mode): ENOMEM AF_XDP Rx buffer full: ENOBUFS An AF_XDP zero-copy driver can now potentially determine if the failure was due to a full Rx buffer, and if so stop processing more frames, yielding to the userland AF_XDP application. Signed-off-by: Björn Töpel <bjorn@kernel.org> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Link: https://lore.kernel.org/bpf/20220413153015.453864-2-maciej.fijalkowski@intel.com
2022-04-08Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextJakub Kicinski
Daniel Borkmann says: ==================== pull-request: bpf-next 2022-04-09 We've added 63 non-merge commits during the last 9 day(s) which contain a total of 68 files changed, 4852 insertions(+), 619 deletions(-). The main changes are: 1) Add libbpf support for USDT (User Statically-Defined Tracing) probes. USDTs are an abstraction built on top of uprobes, critical for tracing and BPF, and widely used in production applications, from Andrii Nakryiko. 2) While Andrii was adding support for x86{-64}-specific logic of parsing USDT argument specification, Ilya followed-up with USDT support for s390 architecture, from Ilya Leoshkevich. 3) Support name-based attaching for uprobe BPF programs in libbpf. The format supported is `u[ret]probe/binary_path:[raw_offset|function[+offset]]`, e.g. attaching to libc malloc can be done in BPF via SEC("uprobe/libc.so.6:malloc") now, from Alan Maguire. 4) Various load/store optimizations for the arm64 JIT to shrink the image size by using arm64 str/ldr immediate instructions. Also enable pointer authentication to verify return address for JITed code, from Xu Kuohai. 5) BPF verifier fixes for write access checks to helper functions, e.g. rd-only memory from bpf_*_cpu_ptr() must not be passed to helpers that write into passed buffers, from Kumar Kartikeya Dwivedi. 6) Fix overly excessive stack map allocation for its base map structure and buckets which slipped-in from cleanups during the rlimit accounting removal back then, from Yuntao Wang. 7) Extend the unstable CT lookup helpers for XDP and tc/BPF to report netfilter connection tracking tuple direction, from Lorenzo Bianconi. 8) Improve bpftool dump to show BPF program/link type names, Milan Landaverde. 9) Minor cleanups all over the place from various others. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (63 commits) bpf: Fix excessive memory allocation in stack_map_alloc() selftests/bpf: Fix return value checks in perf_event_stackmap test selftests/bpf: Add CO-RE relos into linked_funcs selftests libbpf: Use weak hidden modifier for USDT BPF-side API functions libbpf: Don't error out on CO-RE relos for overriden weak subprogs samples, bpf: Move routes monitor in xdp_router_ipv4 in a dedicated thread libbpf: Allow WEAK and GLOBAL bindings during BTF fixup libbpf: Use strlcpy() in path resolution fallback logic libbpf: Add s390-specific USDT arg spec parsing logic libbpf: Make BPF-side of USDT support work on big-endian machines libbpf: Minor style improvements in USDT code libbpf: Fix use #ifdef instead of #if to avoid compiler warning libbpf: Potential NULL dereference in usdt_manager_attach_usdt() selftests/bpf: Uprobe tests should verify param/return values libbpf: Improve string parsing for uprobe auto-attach libbpf: Improve library identification for uprobe binary path resolution selftests/bpf: Test for writes to map key from BPF helpers selftests/bpf: Test passing rdonly mem to global func bpf: Reject writes for PTR_TO_MAP_KEY in check_helper_mem_access bpf: Check PTR_TO_MEM | MEM_RDONLY in check_helper_mem_access ... ==================== Link: https://lore.kernel.org/r/20220408231741.19116-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-08net/sched: flower: Avoid overwriting error messagesIdo Schimmel
The various error paths of tc_setup_offload_action() now report specific error messages. Remove the generic messages to avoid overwriting the more specific ones. Before: # tc filter add dev dummy0 ingress pref 1 proto ip flower skip_sw dst_ip 198.51.100.1 action police rate 100Mbit burst 10000 Error: cls_flower: Failed to setup flow action. We have an error talking to the kernel After: # tc filter add dev dummy0 ingress pref 1 proto ip flower skip_sw dst_ip 198.51.100.1 action police rate 100Mbit burst 10000 Error: act_police: Offload not supported when conform/exceed action is "reclassify". We have an error talking to the kernel Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08net/sched: matchall: Avoid overwriting error messagesIdo Schimmel
The various error paths of tc_setup_offload_action() now report specific error messages. Remove the generic messages to avoid overwriting the more specific ones. Before: # tc filter add dev dummy0 ingress pref 1 proto all matchall skip_sw action police rate 100Mbit burst 10000 Error: cls_matchall: Failed to setup flow action. We have an error talking to the kernel After: # tc filter add dev dummy0 ingress pref 1 proto all matchall skip_sw action police rate 100Mbit burst 10000 Error: act_police: Offload not supported when conform/exceed action is "reclassify". We have an error talking to the kernel Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08net/sched: cls_api: Add extack message for unsupported action offloadIdo Schimmel
For better error reporting to user space, add an extack message when the requested action does not support offload. Example: # echo 1 > /sys/kernel/tracing/events/netlink/netlink_extack/enable # tc filter add dev dummy0 ingress pref 1 proto all matchall skip_sw action nat ingress 192.0.2.1 198.51.100.1 Error: cls_matchall: Failed to setup flow action. We have an error talking to the kernel # cat /sys/kernel/tracing/trace_pipe tc-181 [000] b..1. 88.406093: netlink_extack: msg=Action does not support offload tc-181 [000] ..... 88.406108: netlink_extack: msg=cls_matchall: Failed to setup flow action Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08net/sched: act_vlan: Add extack message for offload failureIdo Schimmel
For better error reporting to user space, add an extack message when vlan action offload fails. Currently, the failure cannot be triggered, but add a message in case the action is extended in the future to support more than the current set of modes. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08net/sched: act_tunnel_key: Add extack message for offload failureIdo Schimmel
For better error reporting to user space, add an extack message when tunnel_key action offload fails. Currently, the failure cannot be triggered, but add a message in case the action is extended in the future to support more than set/release modes. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08net/sched: act_skbedit: Add extack messages for offload failureIdo Schimmel
For better error reporting to user space, add extack messages when skbedit action offload fails. Example: # echo 1 > /sys/kernel/tracing/events/netlink/netlink_extack/enable # tc filter add dev dummy0 ingress pref 1 proto all matchall skip_sw action skbedit queue_mapping 1234 Error: cls_matchall: Failed to setup flow action. We have an error talking to the kernel # cat /sys/kernel/tracing/trace_pipe tc-185 [002] b..1. 31.802414: netlink_extack: msg=act_skbedit: Offload not supported when "queue_mapping" option is used tc-185 [002] ..... 31.802418: netlink_extack: msg=cls_matchall: Failed to setup flow action # tc filter add dev dummy0 ingress pref 1 proto all matchall skip_sw action skbedit inheritdsfield Error: cls_matchall: Failed to setup flow action. We have an error talking to the kernel # cat /sys/kernel/tracing/trace_pipe tc-187 [002] b..1. 45.985145: netlink_extack: msg=act_skbedit: Offload not supported when "inheritdsfield" option is used tc-187 [002] ..... 45.985160: netlink_extack: msg=cls_matchall: Failed to setup flow action Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08net/sched: act_police: Add extack messages for offload failureIdo Schimmel
For better error reporting to user space, add extack messages when police action offload fails. Example: # echo 1 > /sys/kernel/tracing/events/netlink/netlink_extack/enable # tc filter add dev dummy0 ingress pref 1 proto all matchall skip_sw action police rate 100Mbit burst 10000 Error: cls_matchall: Failed to setup flow action. We have an error talking to the kernel # cat /sys/kernel/tracing/trace_pipe tc-182 [000] b..1. 21.592969: netlink_extack: msg=act_police: Offload not supported when conform/exceed action is "reclassify" tc-182 [000] ..... 21.592982: netlink_extack: msg=cls_matchall: Failed to setup flow action # tc filter add dev dummy0 ingress pref 1 proto all matchall skip_sw action police rate 100Mbit burst 10000 conform-exceed drop/continue Error: cls_matchall: Failed to setup flow action. We have an error talking to the kernel # cat /sys/kernel/tracing/trace_pipe tc-184 [000] b..1. 38.882579: netlink_extack: msg=act_police: Offload not supported when conform/exceed action is "continue" tc-184 [000] ..... 38.882593: netlink_extack: msg=cls_matchall: Failed to setup flow action Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08net/sched: act_pedit: Add extack message for offload failureIdo Schimmel
For better error reporting to user space, add an extack message when pedit action offload fails. Currently, the failure cannot be triggered, but add a message in case the action is extended in the future to support more than set/add commands. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08net/sched: act_mpls: Add extack messages for offload failureIdo Schimmel
For better error reporting to user space, add extack messages when mpls action offload fails. Example: # echo 1 > /sys/kernel/tracing/events/netlink/netlink_extack/enable # tc filter add dev dummy0 ingress pref 1 proto all matchall skip_sw action mpls dec_ttl Error: cls_matchall: Failed to setup flow action. We have an error talking to the kernel # cat /sys/kernel/tracing/trace_pipe tc-182 [000] b..1. 18.693915: netlink_extack: msg=act_mpls: Offload not supported when "dec_ttl" option is used tc-182 [000] ..... 18.693921: netlink_extack: msg=cls_matchall: Failed to setup flow action Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08net/sched: act_mirred: Add extack message for offload failureIdo Schimmel
For better error reporting to user space, add an extack message when mirred action offload fails. Currently, the failure cannot be triggered, but add a message in case the action is extended in the future to support more than ingress/egress mirror/redirect. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08net/sched: act_gact: Add extack messages for offload failureIdo Schimmel
For better error reporting to user space, add extack messages when gact action offload fails. Example: # echo 1 > /sys/kernel/tracing/events/netlink/netlink_extack/enable # tc filter add dev dummy0 ingress pref 1 proto all matchall skip_sw action continue Error: cls_matchall: Failed to setup flow action. We have an error talking to the kernel # cat /sys/kernel/tracing/trace_pipe tc-181 [002] b..1. 105.493450: netlink_extack: msg=act_gact: Offload of "continue" action is not supported tc-181 [002] ..... 105.493466: netlink_extack: msg=cls_matchall: Failed to setup flow action # tc filter add dev dummy0 ingress pref 1 proto all matchall skip_sw action reclassify Error: cls_matchall: Failed to setup flow action. We have an error talking to the kernel # cat /sys/kernel/tracing/trace_pipe tc-183 [002] b..1. 124.126477: netlink_extack: msg=act_gact: Offload of "reclassify" action is not supported tc-183 [002] ..... 124.126489: netlink_extack: msg=cls_matchall: Failed to setup flow action # tc filter add dev dummy0 ingress pref 1 proto all matchall skip_sw action pipe action drop Error: cls_matchall: Failed to setup flow action. We have an error talking to the kernel # cat /sys/kernel/tracing/trace_pipe tc-185 [002] b..1. 137.097791: netlink_extack: msg=act_gact: Offload of "pipe" action is not supported tc-185 [002] ..... 137.097804: netlink_extack: msg=cls_matchall: Failed to setup flow action Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08net/sched: act_api: Add extack to offload_act_setup() callbackIdo Schimmel
The callback is used by various actions to populate the flow action structure prior to offload. Pass extack to this callback so that the various actions will be able to report accurate error messages to user space. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08net/sched: flower: Take verbose flag into account when logging error messagesIdo Schimmel
The verbose flag was added in commit 81c7288b170a ("sched: cls: enable verbose logging") to avoid suppressing logging of error messages that occur "when the rule is not to be exclusively executed by the hardware". However, such error messages are currently suppressed when setup of flow action fails. Take the verbose flag into account to avoid suppressing error messages. This is done by using the extack pointer initialized by tc_cls_common_offload_init(), which performs the necessary checks. Before: # tc filter add dev dummy0 ingress pref 1 proto ip flower dst_ip 198.51.100.1 action police rate 100Mbit burst 10000 # tc filter add dev dummy0 ingress pref 2 proto ip flower verbose dst_ip 198.51.100.1 action police rate 100Mbit burst 10000 After: # tc filter add dev dummy0 ingress pref 1 proto ip flower dst_ip 198.51.100.1 action police rate 100Mbit burst 10000 # tc filter add dev dummy0 ingress pref 2 proto ip flower verbose dst_ip 198.51.100.1 action police rate 100Mbit burst 10000 Warning: cls_flower: Failed to setup flow action. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08net/sched: matchall: Take verbose flag into account when logging error messagesIdo Schimmel
The verbose flag was added in commit 81c7288b170a ("sched: cls: enable verbose logging") to avoid suppressing logging of error messages that occur "when the rule is not to be exclusively executed by the hardware". However, such error messages are currently suppressed when setup of flow action fails. Take the verbose flag into account to avoid suppressing error messages. This is done by using the extack pointer initialized by tc_cls_common_offload_init(), which performs the necessary checks. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08tls: hw: rx: use return value of tls_device_decrypted() to carry statusJakub Kicinski
Instead of tls_device poking into internals of the message return 1 from tls_device_decrypted() if the device handled the decryption. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08tls: rx: refactor decrypt_skb_update()Jakub Kicinski
Use early return and a jump label to remove two indentation levels. No functional changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08tls: rx: don't issue wake ups when data is decryptedJakub Kicinski
We inform the applications that data is available when the record is received. Decryption happens inline inside recvmsg or splice call. Generating another wakeup inside the decryption handler seems pointless as someone must be actively reading the socket if we are executing this code. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08tls: rx: replace 'back' with 'offset'Jakub Kicinski
The padding length TLS 1.3 logic is searching for content_type from the end of text. IMHO the code is easier to parse if we calculate offset and decrement it rather than try to maintain positive offset from the end of the record called "back". Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08tls: rx: use a define for tag lengthJakub Kicinski
TLS 1.3 has to strip padding, and it starts out 16 bytes from the end of the record. Make it clear this is because of the auth tag. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08tls: rx: init decrypted status in tls_read_size()Jakub Kicinski
We set the record type in tls_read_size(), can as well init the tlm->decrypted field there. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08tls: rx: don't store the decryption status in socket contextJakub Kicinski
Similar justification to previous change, the information about decryption status belongs in the skb. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08tls: rx: don't store the record type in socket contextJakub Kicinski
Original TLS implementation was handling one record at a time. It stashed the type of the record inside tls context (per socket structure) for convenience. When async crypto support was added [1] the author had to use skb->cb to store the type per-message. The use of skb->cb overlaps with strparser, however, so a hybrid approach was taken where type is stored in context while parsing (since we parse a message at a time) but once parsed its copied to skb->cb. Recently a workaround for sockmaps [2] exposed the previously private struct _strp_msg and started a trend of adding user fields directly in strparser's header. This is cleaner than storing information about an skb in the context. This change is not strictly necessary, but IMHO the ownership of the context field is confusing. Information naturally belongs to the skb. [1] commit 94524d8fc965 ("net/tls: Add support for async decryption of tls records") [2] commit b2c4618162ec ("bpf, sockmap: sk_skb data_end access incorrect when src_reg = dst_reg") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08tls: rx: drop pointless else after gotoJakub Kicinski
Pointless else branch after goto makes the code harder to refactor down the line. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08tls: rx: jump to a more appropriate labelJakub Kicinski
'recv_end:' checks num_async and decrypted, and is then followed by the 'end' label. Since we know that decrypted and num_async are 0 at the start we can jump to 'end'. Move the init of decrypted and num_async to let the compiler catch if I'm wrong. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-07net: mpls: fix memdup.cocci warningGONG, Ruiqi
Simply use kmemdup instead of explicitly allocating and copying memory. Generated by: scripts/coccinelle/api/memdup.cocci Signed-off-by: GONG, Ruiqi <gongruiqi1@huawei.com> Link: https://lore.kernel.org/r/20220406114629.182833-1-gongruiqi1@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-07net: ethernet: set default assignment identifier to NET_NAME_ENUMIan Wienand
As noted in the original commit 685343fc3ba6 ("net: add name_assign_type netdev attribute") ... when the kernel has given the interface a name using global device enumeration based on order of discovery (ethX, wlanY, etc) ... are labelled NET_NAME_ENUM. That describes this case, so set the default for the devices here to NET_NAME_ENUM. Current popular network setup tools like systemd use this only to warn if you're setting static settings on interfaces that might change, so it is expected this only leads to better user information, but not changing of interfaces, etc. Signed-off-by: Ian Wienand <iwienand@redhat.com> Link: https://lore.kernel.org/r/20220406093635.1601506-1-iwienand@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-07tcp: Add tracepoint for tcp_set_ca_statePing Gan
The congestion status of a tcp flow may be updated since there is congestion between tcp sender and receiver. It makes sense to add tracepoint for congestion status set function to summate cc status duration and evaluate the performance of network and congestion algorithm. the backgound of this patch is below. Link: https://github.com/iovisor/bcc/pull/3899 Signed-off-by: Ping Gan <jacky_gam_2001@163.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20220406010956.19656-1-jacky_gam_2001@163.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-07net-core: rx_otherhost_dropped to core_statsJeffrey Ji
Increment rx_otherhost_dropped counter when packet dropped due to mismatched dest MAC addr. An example when this drop can occur is when manually crafting raw packets that will be consumed by a user space application via a tap device. For testing purposes local traffic was generated using trafgen for the client and netcat to start a server Tested: Created 2 netns, sent 1 packet using trafgen from 1 to the other with "{eth(daddr=$INCORRECT_MAC...}", verified that iproute2 showed the counter was incremented. (Also had to modify iproute2 to show the stat, additional patch for that coming next.) Signed-off-by: Jeffrey Ji <jeffreyji@google.com> Reviewed-by: Brian Vazquez <brianvv@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20220406172600.1141083-1-jeffreyjilinux@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-07net: extract a few internals from netdevice.hJakub Kicinski
There's a number of functions and static variables used under net/core/ but not from the outside. We currently dump most of them into netdevice.h. That bad for many reasons: - netdevice.h is very cluttered, hard to figure out what the APIs are; - netdevice.h is very long; - we have to touch netdevice.h more which causes expensive incremental builds. Create a header under net/core/ and move some declarations. The new header is also a bit of a catch-all but that's fine, if we create more specific headers people will likely over-think where their declaration fit best. And end up putting them in netdevice.h, again. More work should be done on splitting netdevice.h into more targeted headers, but that'd be more time consuming so small steps. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-07net: unexport a handful of dev_* functionsJakub Kicinski
We have a bunch of functions which are only used under net/core/ yet they get exported. Remove the exports. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-06ipv6: fix locking issues with loops over idev->addr_listNiels Dossche
idev->addr_list needs to be protected by idev->lock. However, it is not always possible to do so while iterating and performing actions on inet6_ifaddr instances. For example, multiple functions (like addrconf_{join,leave}_anycast) eventually call down to other functions that acquire the idev->lock. The current code temporarily unlocked the idev->lock during the loops, which can cause race conditions. Moving the locks up is also not an appropriate solution as the ordering of lock acquisition will be inconsistent with for example mc_lock. This solution adds an additional field to inet6_ifaddr that is used to temporarily add the instances to a temporary list while holding idev->lock. The temporary list can then be traversed without holding idev->lock. This change was done in two places. In addrconf_ifdown, the list_for_each_entry_safe variant of the list loop is also no longer necessary as there is no deletion within that specific loop. Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Niels Dossche <dossche.niels@gmail.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20220403231523.45843-1-dossche.niels@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-06Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfJakub Kicinski
Alexei Starovoitov says: ==================== pull-request: bpf 2022-04-06 We've added 8 non-merge commits during the last 8 day(s) which contain a total of 9 files changed, 139 insertions(+), 36 deletions(-). The main changes are: 1) rethook related fixes, from Jiri and Masami. 2) Fix the case when tracing bpf prog is attached to struct_ops, from Martin. 3) Support dual-stack sockets in bpf_tcp_check_syncookie, from Maxim. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf: Adjust bpf_tcp_check_syncookie selftest to test dual-stack sockets bpf: Support dual-stack sockets in bpf_tcp_check_syncookie bpf: selftests: Test fentry tracing a struct_ops program bpf: Resolve to prog->aux->dst_prog->type only for BPF_PROG_TYPE_EXT rethook: Fix to use WRITE_ONCE() for rethook:: Handler selftests/bpf: Fix warning comparing pointer to 0 bpf: Fix sparse warnings in kprobe_multi_resolve_syms bpftool: Explicit errno handling in skeletons ==================== Link: https://lore.kernel.org/r/20220407031245.73026-1-alexei.starovoitov@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-06tcp: add accessors to read/set tp->snd_cwndEric Dumazet
We had various bugs over the years with code breaking the assumption that tp->snd_cwnd is greater than zero. Lately, syzbot reported the WARN_ON_ONCE(!tp->prior_cwnd) added in commit 8b8a321ff72c ("tcp: fix zero cwnd in tcp_cwnd_reduction") can trigger, and without a repro we would have to spend considerable time finding the bug. Instead of complaining too late, we want to catch where and when tp->snd_cwnd is set to an illegal value. Signed-off-by: Eric Dumazet <edumazet@google.com> Suggested-by: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Link: https://lore.kernel.org/r/20220405233538.947344-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-06net: netfilter: Reports ct direction in CT lookup helpers for XDP and TC-BPFLorenzo Bianconi
Report connection tracking tuple direction in bpf_skb_ct_lookup/bpf_xdp_ct_lookup helpers. Direction will be used to implement snat/dnat through xdp ebpf program. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/aa1aaac89191cfc64078ecef36c0a48c302321b9.1648908601.git.lorenzo@kernel.org
2022-04-06bpf: Support dual-stack sockets in bpf_tcp_check_syncookieMaxim Mikityanskiy
bpf_tcp_gen_syncookie looks at the IP version in the IP header and validates the address family of the socket. It supports IPv4 packets in AF_INET6 dual-stack sockets. On the other hand, bpf_tcp_check_syncookie looks only at the address family of the socket, ignoring the real IP version in headers, and validates only the packet size. This implementation has some drawbacks: 1. Packets are not validated properly, allowing a BPF program to trick bpf_tcp_check_syncookie into handling an IPv6 packet on an IPv4 socket. 2. Dual-stack sockets fail the checks on IPv4 packets. IPv4 clients end up receiving a SYNACK with the cookie, but the following ACK gets dropped. This patch fixes these issues by changing the checks in bpf_tcp_check_syncookie to match the ones in bpf_tcp_gen_syncookie. IP version from the header is taken into account, and it is validated properly with address family. Fixes: 399040847084 ("bpf: add helper to check for a valid SYN cookie") Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Acked-by: Arthur Fabre <afabre@cloudflare.com> Link: https://lore.kernel.org/bpf/20220406124113.2795730-1-maximmi@nvidia.com
2022-04-06ip6_tunnel: Remove duplicate assignmentsHongbin Wang
There is a same action when the variable is initialized Signed-off-by: Hongbin Wang <wh_bin@126.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-06net: ipv6mr: fix unused variable warning with CONFIG_IPV6_PIMSM_V2=nFlorian Westphal
net/ipv6/ip6mr.c:1656:14: warning: unused variable 'do_wrmifwhole' Move it to the CONFIG_IPV6_PIMSM_V2 scope where its used. Fixes: 4b340a5a726d ("net: ip6mr: add support for passing full packet on wrong mif") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-06net: ethernet: mtk_eth_soc: implement flow offloading to WED devicesFelix Fietkau
This allows hardware flow offloading from Ethernet to WLAN on MT7622 SoC Co-developed-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-06rxrpc: fix a race in rxrpc_exit_net()Eric Dumazet
Current code can lead to the following race: CPU0 CPU1 rxrpc_exit_net() rxrpc_peer_keepalive_worker() if (rxnet->live) rxnet->live = false; del_timer_sync(&rxnet->peer_keepalive_timer); timer_reduce(&rxnet->peer_keepalive_timer, jiffies + delay); cancel_work_sync(&rxnet->peer_keepalive_work); rxrpc_exit_net() exits while peer_keepalive_timer is still armed, leading to use-after-free. syzbot report was: ODEBUG: free active (active state 0) object type: timer_list hint: rxrpc_peer_keepalive_timeout+0x0/0xb0 WARNING: CPU: 0 PID: 3660 at lib/debugobjects.c:505 debug_print_object+0x16e/0x250 lib/debugobjects.c:505 Modules linked in: CPU: 0 PID: 3660 Comm: kworker/u4:6 Not tainted 5.17.0-syzkaller-13993-g88e6c0207623 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: netns cleanup_net RIP: 0010:debug_print_object+0x16e/0x250 lib/debugobjects.c:505 Code: ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 af 00 00 00 48 8b 14 dd 00 1c 26 8a 4c 89 ee 48 c7 c7 00 10 26 8a e8 b1 e7 28 05 <0f> 0b 83 05 15 eb c5 09 01 48 83 c4 18 5b 5d 41 5c 41 5d 41 5e c3 RSP: 0018:ffffc9000353fb00 EFLAGS: 00010082 RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000000 RDX: ffff888029196140 RSI: ffffffff815efad8 RDI: fffff520006a7f52 RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000 R10: ffffffff815ea4ae R11: 0000000000000000 R12: ffffffff89ce23e0 R13: ffffffff8a2614e0 R14: ffffffff816628c0 R15: dffffc0000000000 FS: 0000000000000000(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fe1f2908924 CR3: 0000000043720000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> __debug_check_no_obj_freed lib/debugobjects.c:992 [inline] debug_check_no_obj_freed+0x301/0x420 lib/debugobjects.c:1023 kfree+0xd6/0x310 mm/slab.c:3809 ops_free_list.part.0+0x119/0x370 net/core/net_namespace.c:176 ops_free_list net/core/net_namespace.c:174 [inline] cleanup_net+0x591/0xb00 net/core/net_namespace.c:598 process_one_work+0x996/0x1610 kernel/workqueue.c:2289 worker_thread+0x665/0x1080 kernel/workqueue.c:2436 kthread+0x2e9/0x3a0 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:298 </TASK> Fixes: ace45bec6d77 ("rxrpc: Fix firewall route keepalive") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: David Howells <dhowells@redhat.com> Cc: Marc Dionne <marc.dionne@auristor.com> Cc: linux-afs@lists.infradead.org Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-06net: remove noblock parameter from skb_recv_datagram()Oliver Hartkopp
skb_recv_datagram() has two parameters 'flags' and 'noblock' that are merged inside skb_recv_datagram() by 'flags | (noblock ? MSG_DONTWAIT : 0)' As 'flags' may contain MSG_DONTWAIT as value most callers split the 'flags' into 'flags' and 'noblock' with finally obsolete bit operations like this: skb_recv_datagram(sk, flags & ~MSG_DONTWAIT, flags & MSG_DONTWAIT, &rc); And this is not even done consistently with the 'flags' parameter. This patch removes the obsolete and costly splitting into two parameters and only performs bit operations when really needed on the caller side. One missing conversion thankfully reported by kernel test robot. I missed to enable kunit tests to build the mctp code. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-06net: openvswitch: fix leak of nested actionsIlya Maximets
While parsing user-provided actions, openvswitch module may dynamically allocate memory and store pointers in the internal copy of the actions. So this memory has to be freed while destroying the actions. Currently there are only two such actions: ct() and set(). However, there are many actions that can hold nested lists of actions and ovs_nla_free_flow_actions() just jumps over them leaking the memory. For example, removal of the flow with the following actions will lead to a leak of the memory allocated by nf_ct_tmpl_alloc(): actions:clone(ct(commit),0) Non-freed set() action may also leak the 'dst' structure for the tunnel info including device references. Under certain conditions with a high rate of flow rotation that may cause significant memory leak problem (2MB per second in reporter's case). The problem is also hard to mitigate, because the user doesn't have direct control over the datapath flows generated by OVS. Fix that by iterating over all the nested actions and freeing everything that needs to be freed recursively. New build time assertion should protect us from this problem if new actions will be added in the future. Unfortunately, openvswitch module doesn't use NLA_F_NESTED, so all attributes has to be explicitly checked. sample() and clone() actions are mixing extra attributes into the user-provided action list. That prevents some code generalization too. Fixes: 34ae932a4036 ("openvswitch: Make tunnel set action attach a metadata dst") Link: https://mail.openvswitch.org/pipermail/ovs-dev/2022-March/392922.html Reported-by: Stéphane Graber <stgraber@ubuntu.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-05net: openvswitch: don't send internal clone attribute to the userspace.Ilya Maximets
'OVS_CLONE_ATTR_EXEC' is an internal attribute that is used for performance optimization inside the kernel. It's added by the kernel while parsing user-provided actions and should not be sent during the flow dump as it's not part of the uAPI. The issue doesn't cause any significant problems to the ovs-vswitchd process, because reported actions are not really used in the application lifecycle and only supposed to be shown to a human via ovs-dpctl flow dump. However, the action list is still incorrect and causes the following error if the user wants to look at the datapath flows: # ovs-dpctl add-dp system@ovs-system # ovs-dpctl add-flow "<flow match>" "clone(ct(commit),0)" # ovs-dpctl dump-flows <flow match>, packets:0, bytes:0, used:never, actions:clone(bad length 4, expected -1 for: action0(01 00 00 00), ct(commit),0) With the fix: # ovs-dpctl dump-flows <flow match>, packets:0, bytes:0, used:never, actions:clone(ct(commit),0) Additionally fixed an incorrect attribute name in the comment. Fixes: b233504033db ("openvswitch: kernel datapath clone action") Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Aaron Conole <aconole@redhat.com> Link: https://lore.kernel.org/r/20220404104150.2865736-1-i.maximets@ovn.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>