summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2021-05-18net/mlx5: Set reformat action when needed for termination rulesJianbo Liu
For remote mirroring, after the tunnel packets are received, they are decapsulated and sent to representor, then re-encapsulated and sent out over another tunnel. So reformat action is set only when the destination is required to do encapsulation. Fixes: 249ccc3c95bd ("net/mlx5e: Add support for offloading traffic from uplink to uplink") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Ariel Levkovich <lariel@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-05-18net/mlx5e: Fix nullptr in add_vlan_push_action()Dima Chumak
The result of dev_get_by_index_rcu() is not checked for NULL and then gets dereferenced immediately. Also, the RCU lock must be held by the caller of dev_get_by_index_rcu(), which isn't satisfied by the call stack. Fix by handling nullptr return value when iflink device is not found. Add RCU locking around dev_get_by_index_rcu() to avoid possible adverse effects while iterating over the net_device's hlist. It is safe not to increment reference count of the net_device pointer in case of a successful lookup, because it's already handled by VLAN code during VLAN device registration (see register_vlan_dev and netdev_upper_dev_link). Fixes: 278748a95aa3 ("net/mlx5e: Offload TC e-switch rules with egress VLAN device") Addresses-Coverity: ("Dereference null return value") Signed-off-by: Dima Chumak <dchumak@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-05-18{net, RDMA}/mlx5: Fix override of log_max_qp by other deviceMaor Gottlieb
mlx5_core_dev holds pointer to static profile, hence when the log_max_qp of the profile is override by some device, then it effect all other mlx5 devices that share the same profile. Fix it by having a profile instance for every mlx5 device. Fixes: 883371c453b9 ("net/mlx5: Check FW limitations on log_max_qp before setting it") Signed-off-by: Maor Gottlieb <maorg@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-05-19selftests/bpf: Convert test trace_printk to lskel.Alexei Starovoitov
Convert test trace_printk to light skeleton to check rodata support in lskel. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210514003623.28033-22-alexei.starovoitov@gmail.com
2021-05-19selftests/bpf: Convert test printk to use rodata.Alexei Starovoitov
Convert test trace_printk to more aggressively validate and use rodata. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210514003623.28033-21-alexei.starovoitov@gmail.com
2021-05-19selftests/bpf: Convert atomics test to light skeleton.Alexei Starovoitov
Convert prog_tests/atomics.c to lskel.h Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210514003623.28033-20-alexei.starovoitov@gmail.com
2021-05-19selftests/bpf: Convert few tests to light skeleton.Alexei Starovoitov
Convert few tests that don't use CO-RE to light skeleton. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210514003623.28033-19-alexei.starovoitov@gmail.com
2021-05-19bpftool: Use syscall/loader program in "prog load" and "gen skeleton" command.Alexei Starovoitov
Add -L flag to bpftool to use libbpf gen_trace facility and syscall/loader program for skeleton generation and program loading. "bpftool gen skeleton -L" command will generate a "light skeleton" or "loader skeleton" that is similar to existing skeleton, but has one major difference: $ bpftool gen skeleton lsm.o > lsm.skel.h $ bpftool gen skeleton -L lsm.o > lsm.lskel.h $ diff lsm.skel.h lsm.lskel.h @@ -5,34 +4,34 @@ #define __LSM_SKEL_H__ #include <stdlib.h> -#include <bpf/libbpf.h> +#include <bpf/bpf.h> The light skeleton does not use majority of libbpf infrastructure. It doesn't need libelf. It doesn't parse .o file. It only needs few sys_bpf wrappers. All of them are in bpf/bpf.h file. In future libbpf/bpf.c can be inlined into bpf.h, so not even libbpf.a would be needed to work with light skeleton. "bpftool prog load -L file.o" command is introduced for debugging of syscall/loader program generation. Just like the same command without -L it will try to load the programs from file.o into the kernel. It won't even try to pin them. "bpftool prog load -L -d file.o" command will provide additional debug messages on how syscall/loader program was generated. Also the execution of syscall/loader program will use bpf_trace_printk() for each step of loading BTF, creating maps, and loading programs. The user can do "cat /.../trace_pipe" for further debug. An example of fexit_sleep.lskel.h generated from progs/fexit_sleep.c: struct fexit_sleep { struct bpf_loader_ctx ctx; struct { struct bpf_map_desc bss; } maps; struct { struct bpf_prog_desc nanosleep_fentry; struct bpf_prog_desc nanosleep_fexit; } progs; struct { int nanosleep_fentry_fd; int nanosleep_fexit_fd; } links; struct fexit_sleep__bss { int pid; int fentry_cnt; int fexit_cnt; } *bss; }; Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210514003623.28033-18-alexei.starovoitov@gmail.com
2021-05-19libbpf: Introduce bpf_map__initial_value().Alexei Starovoitov
Introduce bpf_map__initial_value() to read initial contents of mmaped data/rodata/bss maps. Note that bpf_map__set_initial_value() doesn't allow modifying kconfig map while bpf_map__initial_value() allows reading its values. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210514003623.28033-17-alexei.starovoitov@gmail.com
2021-05-19libbpf: Cleanup temp FDs when intermediate sys_bpf fails.Alexei Starovoitov
Fix loader program to close temporary FDs when intermediate sys_bpf command fails. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210514003623.28033-16-alexei.starovoitov@gmail.com
2021-05-19libbpf: Generate loader program out of BPF ELF file.Alexei Starovoitov
The BPF program loading process performed by libbpf is quite complex and consists of the following steps: "open" phase: - parse elf file and remember relocations, sections - collect externs and ksyms including their btf_ids in prog's BTF - patch BTF datasec (since llvm couldn't do it) - init maps (old style map_def, BTF based, global data map, kconfig map) - collect relocations against progs and maps "load" phase: - probe kernel features - load vmlinux BTF - resolve externs (kconfig and ksym) - load program BTF - init struct_ops - create maps - apply CO-RE relocations - patch ld_imm64 insns with src_reg=PSEUDO_MAP, PSEUDO_MAP_VALUE, PSEUDO_BTF_ID - reposition subprograms and adjust call insns - sanitize and load progs During this process libbpf does sys_bpf() calls to load BTF, create maps, populate maps and finally load programs. Instead of actually doing the syscalls generate a trace of what libbpf would have done and represent it as the "loader program". The "loader program" consists of single map with: - union bpf_attr(s) - BTF bytes - map value bytes - insns bytes and single bpf program that passes bpf_attr(s) and data into bpf_sys_bpf() helper. Executing such "loader program" via bpf_prog_test_run() command will replay the sequence of syscalls that libbpf would have done which will result the same maps created and programs loaded as specified in the elf file. The "loader program" removes libelf and majority of libbpf dependency from program loading process. kconfig, typeless ksym, struct_ops and CO-RE are not supported yet. The order of relocate_data and relocate_calls had to change, so that bpf_gen__prog_load() can see all relocations for a given program with correct insn_idx-es. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210514003623.28033-15-alexei.starovoitov@gmail.com
2021-05-19libbpf: Preliminary support for fd_idxAlexei Starovoitov
Prep libbpf to use FD_IDX kernel feature when generating loader program. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210514003623.28033-14-alexei.starovoitov@gmail.com
2021-05-19libbpf: Add bpf_object pointer to kernel_supports().Alexei Starovoitov
Add a pointer to 'struct bpf_object' to kernel_supports() helper. It will be used in the next patch. No functional changes. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210514003623.28033-13-alexei.starovoitov@gmail.com
2021-05-19libbpf: Change the order of data and text relocations.Alexei Starovoitov
In order to be able to generate loader program in the later patches change the order of data and text relocations. Also improve the test to include data relos. If the kernel supports "FD array" the map_fd relocations can be processed before text relos since generated loader program won't need to manually patch ld_imm64 insns with map_fd. But ksym and kfunc relocations can only be processed after all calls are relocated, since loader program will consist of a sequence of calls to bpf_btf_find_by_name_kind() followed by patching of btf_id and btf_obj_fd into corresponding ld_imm64 insns. The locations of those ld_imm64 insns are specified in relocations. Hence process all data relocations (maps, ksym, kfunc) together after call relos. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210514003623.28033-12-alexei.starovoitov@gmail.com
2021-05-19bpf: Add bpf_sys_close() helper.Alexei Starovoitov
Add bpf_sys_close() helper to be used by the syscall/loader program to close intermediate FDs and other cleanup. Note this helper must never be allowed inside fdget/fdput bracketing. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210514003623.28033-11-alexei.starovoitov@gmail.com
2021-05-19bpf: Add bpf_btf_find_by_name_kind() helper.Alexei Starovoitov
Add new helper: long bpf_btf_find_by_name_kind(char *name, int name_sz, u32 kind, int flags) Description Find BTF type with given name and kind in vmlinux BTF or in module's BTFs. Return Returns btf_id and btf_obj_fd in lower and upper 32 bits. It will be used by loader program to find btf_id to attach the program to and to find btf_ids of ksyms. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210514003623.28033-10-alexei.starovoitov@gmail.com
2021-05-19bpf: Introduce fd_idxAlexei Starovoitov
Typical program loading sequence involves creating bpf maps and applying map FDs into bpf instructions in various places in the bpf program. This job is done by libbpf that is using compiler generated ELF relocations to patch certain instruction after maps are created and BTFs are loaded. The goal of fd_idx is to allow bpf instructions to stay immutable after compilation. At load time the libbpf would still create maps as usual, but it wouldn't need to patch instructions. It would store map_fds into __u32 fd_array[] and would pass that pointer to sys_bpf(BPF_PROG_LOAD). Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210514003623.28033-9-alexei.starovoitov@gmail.com
2021-05-19selftests/bpf: Test for btf_load command.Alexei Starovoitov
Improve selftest to check that btf_load is working from bpf program. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210514003623.28033-8-alexei.starovoitov@gmail.com
2021-05-19bpf: Make btf_load command to be bpfptr_t compatible.Alexei Starovoitov
Similar to prog_load make btf_load command to be availble to bpf_prog_type_syscall program. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210514003623.28033-7-alexei.starovoitov@gmail.com
2021-05-19selftests/bpf: Test for syscall program typeAlexei Starovoitov
bpf_prog_type_syscall is a program that creates a bpf map, updates it, and loads another bpf program using bpf_sys_bpf() helper. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210514003623.28033-6-alexei.starovoitov@gmail.com
2021-05-19libbpf: Support for syscall program typeAlexei Starovoitov
Trivial support for syscall program type. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210514003623.28033-5-alexei.starovoitov@gmail.com
2021-05-19bpf: Prepare bpf syscall to be used from kernel and user space.Alexei Starovoitov
With the help from bpfptr_t prepare relevant bpf syscall commands to be used from kernel and user space. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210514003623.28033-4-alexei.starovoitov@gmail.com
2021-05-19bpf: Introduce bpfptr_t user/kernel pointer.Alexei Starovoitov
Similar to sockptr_t introduce bpfptr_t with few additions: make_bpfptr() creates new user/kernel pointer in the same address space as existing user/kernel pointer. bpfptr_add() advances the user/kernel pointer. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210514003623.28033-3-alexei.starovoitov@gmail.com
2021-05-19bpf: Introduce bpf_sys_bpf() helper and program type.Alexei Starovoitov
Add placeholders for bpf_sys_bpf() helper and new program type. Make sure to check that expected_attach_type is zero for future extensibility. Allow tracing helper functions to be used in this program type, since they will only execute from user context via bpf_prog_test_run. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210514003623.28033-2-alexei.starovoitov@gmail.com
2021-05-18signalfd: Remove SIL_PERF_EVENT fields from signalfd_siginfoEric W. Biederman
With the addition of ssi_perf_data and ssi_perf_type struct signalfd_siginfo is dangerously close to running out of space. All that remains is just enough space for two additional 64bit fields. A practice of adding all possible siginfo_t fields into struct singalfd_siginfo can not be supported as adding the missing fields ssi_lower, ssi_upper, and ssi_pkey would require two 64bit fields and one 32bit fields. In practice the fields ssi_perf_data and ssi_perf_type can never be used by signalfd as the signal that generates them always delivers them synchronously to the thread that triggers them. Therefore until someone actually needs the fields ssi_perf_data and ssi_perf_type in signalfd_siginfo remove them. This leaves a bit more room for future expansion. v1: https://lkml.kernel.org/r/20210503203814.25487-12-ebiederm@xmission.com v2: https://lkml.kernel.org/r/20210505141101.11519-12-ebiederm@xmission.com Link: https://lkml.kernel.org/r/20210517195748.8880-5-ebiederm@xmission.com Reviewed-by: Marco Elver <elver@google.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-05-18signal: Deliver all of the siginfo perf data in _perfEric W. Biederman
Don't abuse si_errno and deliver all of the perf data in _perf member of siginfo_t. Note: The data field in the perf data structures in a u64 to allow a pointer to be encoded without needed to implement a 32bit and 64bit version of the same structure. There already exists a 32bit and 64bit versions siginfo_t, and the 32bit version can not include a 64bit member as it only has 32bit alignment. So unsigned long is used in siginfo_t instead of a u64 as unsigned long can encode a pointer on all architectures linux supports. v1: https://lkml.kernel.org/r/m11rarqqx2.fsf_-_@fess.ebiederm.org v2: https://lkml.kernel.org/r/20210503203814.25487-10-ebiederm@xmission.com v3: https://lkml.kernel.org/r/20210505141101.11519-11-ebiederm@xmission.com Link: https://lkml.kernel.org/r/20210517195748.8880-4-ebiederm@xmission.com Reviewed-by: Marco Elver <elver@google.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-05-18signal: Factor force_sig_perf out of perf_sigtrapEric W. Biederman
Separate filling in siginfo for TRAP_PERF from deciding that siginal needs to be sent. There are enough little details that need to be correct when properly filling in siginfo_t that it is easy to make mistakes if filling in the siginfo_t is in the same function with other logic. So factor out force_sig_perf to reduce the cognative load of on reviewers, maintainers and implementors. v1: https://lkml.kernel.org/r/m17dkjqqxz.fsf_-_@fess.ebiederm.org v2: https://lkml.kernel.org/r/20210505141101.11519-10-ebiederm@xmission.com Link: https://lkml.kernel.org/r/20210517195748.8880-3-ebiederm@xmission.com Reviewed-by: Marco Elver <elver@google.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-05-18signal: Implement SIL_FAULT_TRAPNOEric W. Biederman
Now that si_trapno is part of the union in _si_fault and available on all architectures, add SIL_FAULT_TRAPNO and update siginfo_layout to return SIL_FAULT_TRAPNO when the code assumes si_trapno is valid. There is room for future changes to reduce when si_trapno is valid but this is all that is needed to make si_trapno and the other members of the the union in _sigfault mutually exclusive. Update the code that uses siginfo_layout to deal with SIL_FAULT_TRAPNO and have the same code ignore si_trapno in in all other cases. v1: https://lkml.kernel.org/r/m1o8dvs7s7.fsf_-_@fess.ebiederm.org v2: https://lkml.kernel.org/r/20210505141101.11519-6-ebiederm@xmission.com Link: https://lkml.kernel.org/r/20210517195748.8880-2-ebiederm@xmission.com Reviewed-by: Marco Elver <elver@google.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-05-18siginfo: Move si_trapno inside the union inside _si_faultEric W. Biederman
It turns out that linux uses si_trapno very sparingly, and as such it can be considered extra information for a very narrow selection of signals, rather than information that is present with every fault reported in siginfo. As such move si_trapno inside the union inside of _si_fault. This results in no change in placement, and makes it eaiser to extend _si_fault in the future as this reduces the number of special cases. In particular with si_trapno included in the union it is no longer a concern that the union must be pointer aligned on most architectures because the union follows immediately after si_addr which is a pointer. This change results in a difference in siginfo field placement on sparc and alpha for the fields si_addr_lsb, si_lower, si_upper, si_pkey, and si_perf. These architectures do not implement the signals that would use si_addr_lsb, si_lower, si_upper, si_pkey, and si_perf. Further these architecture have not yet implemented the userspace that would use si_perf. The point of this change is in fact to correct these placement issues before sparc or alpha grow userspace that cares. This change was discussed[1] and the agreement is that this change is currently safe. [1]: https://lkml.kernel.org/r/CAK8P3a0+uKYwL1NhY6Hvtieghba2hKYGD6hcKx5n8=4Gtt+pHA@mail.gmail.com Acked-by: Marco Elver <elver@google.com> v1: https://lkml.kernel.org/r/m1tunns7yf.fsf_-_@fess.ebiederm.org v2: https://lkml.kernel.org/r/20210505141101.11519-5-ebiederm@xmission.com Link: https://lkml.kernel.org/r/20210517195748.8880-1-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-05-18net: mdio: provide shim implementation of devm_of_mdiobus_registerVladimir Oltean
Similar to the way in which of_mdiobus_register() has a fallback to the non-DT based mdiobus_register() when CONFIG_OF is not set, we can create a shim for the device-managed devm_of_mdiobus_register() which calls devm_mdiobus_register() and discards the struct device_node *. In particular, this solves a build issue with the qca8k DSA driver which uses devm_of_mdiobus_register and can be compiled without CONFIG_OF. Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-18net: dcb: Remove unnecessary INIT_LIST_HEAD()Yang Yingliang
The list_head dcb_app_list is initialized statically. It is unnecessary to initialize by INIT_LIST_HEAD(). Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-18cxgb4: clip_tbl: use list_del_init instead of list_del/INIT_LIST_HEADYang Yingliang
Using list_del_init() instead of list_del() + INIT_LIST_HEAD() to simpify the code. Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-18Merge branch 'wan-cleanups'David S. Miller
Guangbin Huang says: ==================== net: wan: clean up some code style issues This patchset clean up some code style issues. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-18net: wan: fix variable definition stylePeng Li
Fix the checkpatch error: "foo* bar" should be "foo *bar". Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-18net: wan: remove redundant spacePeng Li
Space prohibited before that close parenthesis ')', so removes the redundant space. Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-18net: wan: remove redundant braces {}Peng Li
Braces {} are not necessary for single statement blocks, this patch removes redundant braces {}. Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-18net: wan: add some required spacesPeng Li
Add space required before the open parenthesis '(', and add spaces required around that '<', '>' and '!='. Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-18net: wan: remove redundant blank linesPeng Li
This patch removes some redundant blank lines. Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-18Merge branch 'hns3-fixes'David S. Miller
Huazhong Tan says: ==================== net: hns3: fixes for -net This series includes some bugfixes for the HNS3 ethernet driver. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-18net: hns3: check the return of skb_checksum_help()Yunsheng Lin
Currently skb_checksum_help()'s return is ignored, but it may return error when it fails to allocate memory when linearizing. So adds checking for the return of skb_checksum_help(). Fixes: 76ad4f0ee747("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC") Fixes: 3db084d28dc0("net: hns3: Fix for vxlan tx checksum bug") Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-18net: hns3: fix user's coalesce configuration lost issueHuazhong Tan
Currently, when adaptive is on, the user's coalesce configuration may be overwritten by the dynamic one. The reason is that user's configurations are saved in struct hns3_enet_tqp_vector whose value maybe changed by the dynamic algorithm. To fix it, use struct hns3_nic_priv instead of struct hns3_enet_tqp_vector to save and get the user's configuration. BTW, operations of storing and restoring coalesce info in the reset process are unnecessary now, so remove them as well. Fixes: 434776a5fae2 ("net: hns3: add ethtool_ops.set_coalesce support to PF") Fixes: 7e96adc46633 ("net: hns3: add ethtool_ops.get_coalesce support to PF") Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-18net: hns3: put off calling register_netdev() until client initialize completeJian Shen
Currently, the netdevice is registered before client initializing complete. So there is a timewindow between netdevice available and usable. In this case, if user try to change the channel number or ring param, it may cause the hns3_set_rx_cpu_rmap() being called twice, and report bug. [47199.416502] hns3 0000:35:00.0 eth1: set channels: tqp_num=1, rxfh=0 [47199.430340] hns3 0000:35:00.0 eth1: already uninitialized [47199.438554] hns3 0000:35:00.0: rss changes from 4 to 1 [47199.511854] hns3 0000:35:00.0: Channels changed, rss_size from 4 to 1, tqps from 4 to 1 [47200.163524] ------------[ cut here ]------------ [47200.171674] kernel BUG at lib/cpu_rmap.c:142! [47200.177847] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP [47200.185259] Modules linked in: hclge(+) hns3(-) hns3_cae(O) hns_roce_hw_v2 hnae3 vfio_iommu_type1 vfio_pci vfio_virqfd vfio pv680_mii(O) [last unloaded: hclge] [47200.205912] CPU: 1 PID: 8260 Comm: ethtool Tainted: G O 5.11.0-rc3+ #1 [47200.215601] Hardware name: , xxxxxx 02/04/2021 [47200.223052] pstate: 60400009 (nZCv daif +PAN -UAO -TCO BTYPE=--) [47200.230188] pc : cpu_rmap_add+0x38/0x40 [47200.237472] lr : irq_cpu_rmap_add+0x84/0x140 [47200.243291] sp : ffff800010e93a30 [47200.247295] x29: ffff800010e93a30 x28: ffff082100584880 [47200.254155] x27: 0000000000000000 x26: 0000000000000000 [47200.260712] x25: 0000000000000000 x24: 0000000000000004 [47200.267241] x23: ffff08209ba03000 x22: ffff08209ba038c0 [47200.273789] x21: 000000000000003f x20: ffff0820e2bc1680 [47200.280400] x19: ffff0820c970ec80 x18: 00000000000000c0 [47200.286944] x17: 0000000000000000 x16: ffffb43debe4a0d0 [47200.293456] x15: fffffc2082990600 x14: dead000000000122 [47200.300059] x13: ffffffffffffffff x12: 000000000000003e [47200.306606] x11: ffff0820815b8080 x10: ffff53e411988000 [47200.313171] x9 : 0000000000000000 x8 : ffff0820e2bc1700 [47200.319682] x7 : 0000000000000000 x6 : 000000000000003f [47200.326170] x5 : 0000000000000040 x4 : ffff800010e93a20 [47200.332656] x3 : 0000000000000004 x2 : ffff0820c970ec80 [47200.339168] x1 : ffff0820e2bc1680 x0 : 0000000000000004 [47200.346058] Call trace: [47200.349324] cpu_rmap_add+0x38/0x40 [47200.354300] hns3_set_rx_cpu_rmap+0x6c/0xe0 [hns3] [47200.362294] hns3_reset_notify_init_enet+0x1cc/0x340 [hns3] [47200.370049] hns3_change_channels+0x40/0xb0 [hns3] [47200.376770] hns3_set_channels+0x12c/0x2a0 [hns3] [47200.383353] ethtool_set_channels+0x140/0x250 [47200.389772] dev_ethtool+0x714/0x23d0 [47200.394440] dev_ioctl+0x4cc/0x640 [47200.399277] sock_do_ioctl+0x100/0x2a0 [47200.404574] sock_ioctl+0x28c/0x470 [47200.409079] __arm64_sys_ioctl+0xb4/0x100 [47200.415217] el0_svc_common.constprop.0+0x84/0x210 [47200.422088] do_el0_svc+0x28/0x34 [47200.426387] el0_svc+0x28/0x70 [47200.431308] el0_sync_handler+0x1a4/0x1b0 [47200.436477] el0_sync+0x174/0x180 [47200.441562] Code: 11000405 79000c45 f8247861 d65f03c0 (d4210000) [47200.448869] ---[ end trace a01efe4ce42e5f34 ]--- The process is like below: excuting hns3_client_init | register_netdev() | hns3_set_channels() | | hns3_set_rx_cpu_rmap() hns3_reset_notify_uninit_enet() | | | quit without calling function | hns3_free_rx_cpu_rmap for flag | HNS3_NIC_STATE_INITED is unset. | | | hns3_reset_notify_init_enet() | | set HNS3_NIC_STATE_INITED call hns3_set_rx_cpu_rmap()-- crash Fix it by calling register_netdev() at the end of function hns3_client_init(). Fixes: 08a100689d4b ("net: hns3: re-organize vector handle") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-18net: hns3: fix incorrect resp_msg issueJiaran Zhang
In hclge_mbx_handler(), if there are two consecutive mailbox messages that requires resp_msg, the resp_msg is not cleared after processing the first message, which will cause the resp_msg data of second message incorrect. Fix it by clearing the resp_msg before processing every mailbox message. Fixes: bb5790b71bad ("net: hns3: refactor mailbox response scheme between PF and VF") Signed-off-by: Jiaran Zhang <zhangjiaran@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-18net: dsa: qca8k: fix missing unlock on error in qca8k_vlan_(add|del)Wei Yongjun
Add the missing unlock before return from function qca8k_vlan_add() and qca8k_vlan_del() in the error handling case. Fixes: 028f5f8ef44f ("net: dsa: qca8k: handle error with qca8k_read operation") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-18net: lan78xx: advertise tx software timestamping supportMarkus Bloechl
lan78xx already calls skb_tx_timestamp() in its lan78xx_start_xmit(). Override .get_ts_info to also advertise this capability (SOF_TIMESTAMPING_TX_SOFTWARE) via ethtool. Signed-off-by: Markus Blöchl <markus.bloechl@ipetronik.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-18cipso: correct comments of cipso_v4_cache_invalidate()Zheng Yejian
Since cipso_v4_cache_invalidate() has no return value, so drop related descriptions in its comments. Fixes: 446fda4f2682 ("[NetLabel]: CIPSOv4 engine") Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-18Merge branch 'custom-multipath-hash'David S. Miller
Ido Schimmel says: ==================== Add support for custom multipath hash This patchset adds support for custom multipath hash policy for both IPv4 and IPv6 traffic. The new policy allows user space to control the outer and inner packet fields used for the hash computation. Motivation ========== Linux currently supports different multipath hash policies for IPv4 and IPv6 traffic: * Layer 3 * Layer 4 * Layer 3 or inner layer 3, if present These policies hash on a fixed set of fields, which is inflexible and against operators' requirements to control the hash input: "The ability to control the inputs to the hash function should be a consideration in any load-balancing RFP" [1]. An example of this inflexibility can be seen by the fact that none of the current policies allows operators to use the standard 5-tuple and the flow label for multipath hash computation. Such a policy is useful in the following real-world example of a data center with the following types of traffic: * Anycast IPv6 TCP traffic towards layer 4 load balancers. Flow label is constant (zero) to avoid breaking established connections * Non-encapsulated IPv6 traffic. Flow label is used to re-route flows around problematic (congested / failed) paths [2] * IPv6 encapsulated traffic (IPv4-in-IPv6 or IPv6-in-IPv6). Outer flow label is generated from encapsulated packet * UDP encapsulated traffic. Outer source port is generated from encapsulated packet In the above example, using the inner flow information for hash computation in addition to the outer flow information is useful during failures of the BPF agent that selectively generates the flow label based on the traffic type. In such cases, the self-healing properties of the flow label are lost, but encapsulated flows are still load balanced. Control over the inner fields is even more critical when encapsulation is performed by hardware routers. For example, the Spectrum ASIC can only encode 8 bits of entropy in the outer flow label / outer UDP source port when performing IP / UDP encapsulation. In the case of IPv4 GRE encapsulation there is no outer field to encode the inner hash in. User interface ============== In accordance with existing multipath hash configuration, the new custom policy is added as a new option (3) to the net.ipv{4,6}.fib_multipath_hash_policy sysctls. When the new policy is used, the packet fields used for hash computation are determined by the net.ipv{4,6}.fib_multipath_hash_fields sysctls. These sysctls accept a bitmask according to the following table (from ip-sysctl.rst): ====== ============================ 0x0001 Source IP address 0x0002 Destination IP address 0x0004 IP protocol 0x0008 Flow Label 0x0010 Source port 0x0020 Destination port 0x0040 Inner source IP address 0x0080 Inner destination IP address 0x0100 Inner IP protocol 0x0200 Inner Flow Label 0x0400 Inner source port 0x0800 Inner destination port ====== ============================ For example, to allow IPv6 traffic to be hashed based on standard 5-tuple and flow label: # sysctl -wq net.ipv6.fib_multipath_hash_fields=0x0037 # sysctl -wq net.ipv6.fib_multipath_hash_policy=3 Implementation ============== As with existing policies, the new policy relies on the flow dissector to extract the packet fields for the hash computation. However, unlike existing policies that either use the outer or inner flow, the new policy might require both flows to be dissected. To avoid unnecessary invocations of the flow dissector, the data path skips dissection of the outer or inner flows if none of the outer or inner fields are required. In addition, inner flow dissection is not performed when no encapsulation was encountered (i.e., 'FLOW_DIS_ENCAPSULATION' not set by flow dissector) during dissection of the outer flow. Testing ======= Three new selftests are added with three different topologies that allow testing of following traffic combinations: * Non-encapsulated IPv4 / IPv6 traffic * IPv4 / IPv6 overlay over IPv4 underlay * IPv4 / IPv6 overlay over IPv6 underlay All three tests follow the same pattern. Each time a different packet field is used for hash computation. When the field changes in the packet stream, traffic is expected to be balanced across the two paths. When the field does not change, traffic is expected to be unbalanced across the two paths. Patchset overview ================= Patches #1-#3 add custom multipath hash support for IPv4 traffic Patches #4-#7 do the same for IPv6 Patches #8-#10 add selftests Future work =========== mlxsw support can be found here [3]. Changes since RFC v2 [4]: * Patch #2: Document that 0x0008 is used for Flow Label * Patch #2: Do not allow the bitmask to be zero * Patch #6: Do not allow the bitmask to be zero Changes since RFC v1 [5]: * Use a bitmask instead of a bitmap [1] https://blog.apnic.net/2018/01/11/ipv6-flow-label-misuse-hashing/ [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3acf3ec3f4b0fd4263989f2e4227bbd1c42b5fe1 [3] https://github.com/idosch/linux/tree/submit/custom_hash_mlxsw_v2 [4] https://lore.kernel.org/netdev/20210509151615.200608-1-idosch@idosch.org/ [5] https://lore.kernel.org/netdev/20210502162257.3472453-1-idosch@idosch.org/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-18selftests: forwarding: Add test for custom multipath hash with IPv6 GREIdo Schimmel
Test that when the hash policy is set to custom, traffic is distributed only according to the inner fields set in the fib_multipath_hash_fields sysctl. Each time set a different field and make sure traffic is only distributed when the field is changed in the packet stream. The test only verifies the behavior of IPv4/IPv6 overlays on top of an IPv6 underlay network. The previous patch verified the same with an IPv4 underlay network. Example output: # ./ip6gre_custom_multipath_hash.sh TEST: ping [ OK ] TEST: ping6 [ OK ] INFO: Running IPv4 overlay custom multipath hash tests TEST: Multipath hash field: Inner source IP (balanced) [ OK ] INFO: Packets sent on path1 / path2: 6602 / 6002 TEST: Multipath hash field: Inner source IP (unbalanced) [ OK ] INFO: Packets sent on path1 / path2: 1 / 12601 TEST: Multipath hash field: Inner destination IP (balanced) [ OK ] INFO: Packets sent on path1 / path2: 6802 / 5801 TEST: Multipath hash field: Inner destination IP (unbalanced) [ OK ] INFO: Packets sent on path1 / path2: 12602 / 3 TEST: Multipath hash field: Inner source port (balanced) [ OK ] INFO: Packets sent on path1 / path2: 16431 / 16344 TEST: Multipath hash field: Inner source port (unbalanced) [ OK ] INFO: Packets sent on path1 / path2: 0 / 32773 TEST: Multipath hash field: Inner destination port (balanced) [ OK ] INFO: Packets sent on path1 / path2: 16431 / 16344 TEST: Multipath hash field: Inner destination port (unbalanced) [ OK ] INFO: Packets sent on path1 / path2: 2 / 32772 INFO: Running IPv6 overlay custom multipath hash tests TEST: Multipath hash field: Inner source IP (balanced) [ OK ] INFO: Packets sent on path1 / path2: 6704 / 5902 TEST: Multipath hash field: Inner source IP (unbalanced) [ OK ] INFO: Packets sent on path1 / path2: 1 / 12600 TEST: Multipath hash field: Inner destination IP (balanced) [ OK ] INFO: Packets sent on path1 / path2: 5751 / 6852 TEST: Multipath hash field: Inner destination IP (unbalanced) [ OK ] INFO: Packets sent on path1 / path2: 12602 / 0 TEST: Multipath hash field: Inner flowlabel (balanced) [ OK ] INFO: Packets sent on path1 / path2: 8272 / 8181 TEST: Multipath hash field: Inner flowlabel (unbalanced) [ OK ] INFO: Packets sent on path1 / path2: 3 / 12602 TEST: Multipath hash field: Inner source port (balanced) [ OK ] INFO: Packets sent on path1 / path2: 16424 / 16351 TEST: Multipath hash field: Inner source port (unbalanced) [ OK ] INFO: Packets sent on path1 / path2: 3 / 32774 TEST: Multipath hash field: Inner destination port (balanced) [ OK ] INFO: Packets sent on path1 / path2: 16425 / 16350 TEST: Multipath hash field: Inner destination port (unbalanced) [ OK ] INFO: Packets sent on path1 / path2: 2 / 32773 Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-18selftests: forwarding: Add test for custom multipath hash with IPv4 GREIdo Schimmel
Test that when the hash policy is set to custom, traffic is distributed only according to the inner fields set in the fib_multipath_hash_fields sysctl. Each time set a different field and make sure traffic is only distributed when the field is changed in the packet stream. The test only verifies the behavior of IPv4/IPv6 overlays on top of an IPv4 underlay network. A subsequent patch will do the same with an IPv6 underlay network. Example output: # ./gre_custom_multipath_hash.sh TEST: ping [ OK ] TEST: ping6 [ OK ] INFO: Running IPv4 overlay custom multipath hash tests TEST: Multipath hash field: Inner source IP (balanced) [ OK ] INFO: Packets sent on path1 / path2: 6601 / 6001 TEST: Multipath hash field: Inner source IP (unbalanced) [ OK ] INFO: Packets sent on path1 / path2: 0 / 12600 TEST: Multipath hash field: Inner destination IP (balanced) [ OK ] INFO: Packets sent on path1 / path2: 6802 / 5802 TEST: Multipath hash field: Inner destination IP (unbalanced) [ OK ] INFO: Packets sent on path1 / path2: 12601 / 1 TEST: Multipath hash field: Inner source port (balanced) [ OK ] INFO: Packets sent on path1 / path2: 16430 / 16344 TEST: Multipath hash field: Inner source port (unbalanced) [ OK ] INFO: Packets sent on path1 / path2: 0 / 32772 TEST: Multipath hash field: Inner destination port (balanced) [ OK ] INFO: Packets sent on path1 / path2: 16430 / 16343 TEST: Multipath hash field: Inner destination port (unbalanced) [ OK ] INFO: Packets sent on path1 / path2: 0 / 32772 INFO: Running IPv6 overlay custom multipath hash tests TEST: Multipath hash field: Inner source IP (balanced) [ OK ] INFO: Packets sent on path1 / path2: 6702 / 5900 TEST: Multipath hash field: Inner source IP (unbalanced) [ OK ] INFO: Packets sent on path1 / path2: 0 / 12601 TEST: Multipath hash field: Inner destination IP (balanced) [ OK ] INFO: Packets sent on path1 / path2: 5751 / 6851 TEST: Multipath hash field: Inner destination IP (unbalanced) [ OK ] INFO: Packets sent on path1 / path2: 12602 / 1 TEST: Multipath hash field: Inner flowlabel (balanced) [ OK ] INFO: Packets sent on path1 / path2: 8364 / 8065 TEST: Multipath hash field: Inner flowlabel (unbalanced) [ OK ] INFO: Packets sent on path1 / path2: 12601 / 0 TEST: Multipath hash field: Inner source port (balanced) [ OK ] INFO: Packets sent on path1 / path2: 16425 / 16349 TEST: Multipath hash field: Inner source port (unbalanced) [ OK ] INFO: Packets sent on path1 / path2: 1 / 32770 TEST: Multipath hash field: Inner destination port (balanced) [ OK ] INFO: Packets sent on path1 / path2: 16425 / 16349 TEST: Multipath hash field: Inner destination port (unbalanced) [ OK ] INFO: Packets sent on path1 / path2: 2 / 32770 Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-18selftests: forwarding: Add test for custom multipath hashIdo Schimmel
Test that when the hash policy is set to custom, traffic is distributed only according to the outer fields set in the fib_multipath_hash_fields sysctl. Each time set a different field and make sure traffic is only distributed when the field is changed in the packet stream. The test only verifies the behavior with non-encapsulated IPv4 and IPv6 packets. Subsequent patches will add tests for IPv4/IPv6 overlays on top of IPv4/IPv6 underlay networks. Example output: # ./custom_multipath_hash.sh TEST: ping [ OK ] TEST: ping6 [ OK ] INFO: Running IPv4 custom multipath hash tests TEST: Multipath hash field: Source IP (balanced) [ OK ] INFO: Packets sent on path1 / path2: 6353 / 6254 TEST: Multipath hash field: Source IP (unbalanced) [ OK ] INFO: Packets sent on path1 / path2: 0 / 12600 TEST: Multipath hash field: Destination IP (balanced) [ OK ] INFO: Packets sent on path1 / path2: 6102 / 6502 TEST: Multipath hash field: Destination IP (unbalanced) [ OK ] INFO: Packets sent on path1 / path2: 1 / 12601 TEST: Multipath hash field: Source port (balanced) [ OK ] INFO: Packets sent on path1 / path2: 16428 / 16345 TEST: Multipath hash field: Source port (unbalanced) [ OK ] INFO: Packets sent on path1 / path2: 32770 / 2 TEST: Multipath hash field: Destination port (balanced) [ OK ] INFO: Packets sent on path1 / path2: 16428 / 16345 TEST: Multipath hash field: Destination port (unbalanced) [ OK ] INFO: Packets sent on path1 / path2: 32770 / 2 INFO: Running IPv6 custom multipath hash tests TEST: Multipath hash field: Source IP (balanced) [ OK ] INFO: Packets sent on path1 / path2: 6704 / 5903 TEST: Multipath hash field: Source IP (unbalanced) [ OK ] INFO: Packets sent on path1 / path2: 12600 / 0 TEST: Multipath hash field: Destination IP (balanced) [ OK ] INFO: Packets sent on path1 / path2: 5551 / 7052 TEST: Multipath hash field: Destination IP (unbalanced) [ OK ] INFO: Packets sent on path1 / path2: 12603 / 0 TEST: Multipath hash field: Flowlabel (balanced) [ OK ] INFO: Packets sent on path1 / path2: 8378 / 8080 TEST: Multipath hash field: Flowlabel (unbalanced) [ OK ] INFO: Packets sent on path1 / path2: 2 / 12603 TEST: Multipath hash field: Source port (balanced) [ OK ] INFO: Packets sent on path1 / path2: 16385 / 16388 TEST: Multipath hash field: Source port (unbalanced) [ OK ] INFO: Packets sent on path1 / path2: 0 / 32774 TEST: Multipath hash field: Destination port (balanced) [ OK ] INFO: Packets sent on path1 / path2: 16386 / 16390 TEST: Multipath hash field: Destination port (unbalanced) [ OK ] INFO: Packets sent on path1 / path2: 32771 / 2 Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>