From f45b2974cc0ae959a4c503a071e38a56bd64372f Mon Sep 17 00:00:00 2001 From: Björn Töpel Date: Wed, 17 Nov 2021 13:57:08 +0100 Subject: bpf, x86: Fix "no previous prototype" warning MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The arch_prepare_bpf_dispatcher function does not have a prototype, and yields the following warning when W=1 is enabled for the kernel build. >> arch/x86/net/bpf_jit_comp.c:2188:5: warning: no previous \ prototype for 'arch_prepare_bpf_dispatcher' [-Wmissing-prototypes] 2188 | int arch_prepare_bpf_dispatcher(void *image, s64 *funcs, \ int num_funcs) | ^~~~~~~~~~~~~~~~~~~~~~~~~~~ Remove the warning by adding a function declaration to include/linux/bpf.h. Fixes: 75ccbef6369e ("bpf: Introduce BPF dispatcher") Reported-by: kernel test robot Signed-off-by: Björn Töpel Signed-off-by: Daniel Borkmann Link: https://lore.kernel.org/bpf/20211117125708.769168-1-bjorn@kernel.org --- include/linux/bpf.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index e7a163a3146b..84ff6ef49462 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -732,6 +732,7 @@ int bpf_trampoline_unlink_prog(struct bpf_prog *prog, struct bpf_trampoline *tr) struct bpf_trampoline *bpf_trampoline_get(u64 key, struct bpf_attach_target_info *tgt_info); void bpf_trampoline_put(struct bpf_trampoline *tr); +int arch_prepare_bpf_dispatcher(void *image, s64 *funcs, int num_funcs); #define BPF_DISPATCHER_INIT(_name) { \ .mutex = __MUTEX_INITIALIZER(_name.mutex), \ .func = &_name##_func, \ -- cgit From 38207a5e81230d6ffbdd51e5fa5681be5116dcae Mon Sep 17 00:00:00 2001 From: John Fastabend Date: Fri, 19 Nov 2021 10:14:17 -0800 Subject: bpf, sockmap: Attach map progs to psock early for feature probes When a TCP socket is added to a sock map we look at the programs attached to the map to determine what proto op hooks need to be changed. Before the patch in the 'fixes' tag there were only two categories -- the empty set of programs or a TX policy. In any case the base set handled the receive case. After the fix we have an optimized program for receive that closes a small, but possible, race on receive. This program is loaded only when the map the psock is being added to includes a RX policy. Otherwise, the race is not possible so we don't need to handle the race condition. In order for the call to sk_psock_init() to correctly evaluate the above conditions all progs need to be set in the psock before the call. However, in the current code this is not the case. We end up evaluating the requirements on the old prog state. If your psock is attached to multiple maps -- for example a tx map and rx map -- then the second update would pull in the correct maps. But, the other pattern with a single rx enabled map the correct receive hooks are not used. The result is the race fixed by the patch in the fixes tag below may still be seen in this case. To fix we simply set all psock->progs before doing the call into sock_map_init(). With this the init() call gets the full list of programs and chooses the correct proto ops on the first iteration instead of requiring the second update to pull them in. This fixes the race case when only a single map is used. Fixes: c5d2177a72a16 ("bpf, sockmap: Fix race in ingress receive verdict with redirect to self") Signed-off-by: John Fastabend Signed-off-by: Daniel Borkmann Link: https://lore.kernel.org/bpf/20211119181418.353932-2-john.fastabend@gmail.com --- net/core/sock_map.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/net/core/sock_map.c b/net/core/sock_map.c index f39ef79ced67..9b528c644fb7 100644 --- a/net/core/sock_map.c +++ b/net/core/sock_map.c @@ -282,6 +282,12 @@ static int sock_map_link(struct bpf_map *map, struct sock *sk) if (msg_parser) psock_set_prog(&psock->progs.msg_parser, msg_parser); + if (stream_parser) + psock_set_prog(&psock->progs.stream_parser, stream_parser); + if (stream_verdict) + psock_set_prog(&psock->progs.stream_verdict, stream_verdict); + if (skb_verdict) + psock_set_prog(&psock->progs.skb_verdict, skb_verdict); ret = sock_map_init_proto(sk, psock); if (ret < 0) @@ -292,14 +298,10 @@ static int sock_map_link(struct bpf_map *map, struct sock *sk) ret = sk_psock_init_strp(sk, psock); if (ret) goto out_unlock_drop; - psock_set_prog(&psock->progs.stream_verdict, stream_verdict); - psock_set_prog(&psock->progs.stream_parser, stream_parser); sk_psock_start_strp(sk, psock); } else if (!stream_parser && stream_verdict && !psock->saved_data_ready) { - psock_set_prog(&psock->progs.stream_verdict, stream_verdict); sk_psock_start_verdict(sk,psock); } else if (!stream_verdict && skb_verdict && !psock->saved_data_ready) { - psock_set_prog(&psock->progs.skb_verdict, skb_verdict); sk_psock_start_verdict(sk, psock); } write_unlock_bh(&sk->sk_callback_lock); -- cgit From c0d95d3380ee099d735e08618c0d599e72f6c8b0 Mon Sep 17 00:00:00 2001 From: John Fastabend Date: Fri, 19 Nov 2021 10:14:18 -0800 Subject: bpf, sockmap: Re-evaluate proto ops when psock is removed from sockmap When a sock is added to a sock map we evaluate what proto op hooks need to be used. However, when the program is removed from the sock map we have not been evaluating if that changes the required program layout. Before the patch listed in the 'fixes' tag this was not causing failures because the base program set handles all cases. Specifically, the case with a stream parser and the case with out a stream parser are both handled. With the fix below we identified a race when running with a proto op that attempts to read skbs off both the stream parser and the skb->receive_queue. Namely, that a race existed where when the stream parser is empty checking the skb->receive_queue from recvmsg at the precies moment when the parser is paused and the receive_queue is not empty could result in skipping the stream parser. This may break a RX policy depending on the parser to run. The fix tag then loads a specific proto ops that resolved this race. But, we missed removing that proto ops recv hook when the sock is removed from the sockmap. The result is the stream parser is stopped so no more skbs will be aggregated there, but the hook and BPF program continues to be attached on the psock. User space will then get an EBUSY when trying to read the socket because the recvmsg() handler is now waiting on a stopped stream parser. To fix we rerun the proto ops init() function which will look at the new set of progs attached to the psock and rest the proto ops hook to the correct handlers. And in the above case where we remove the sock from the sock map the RX prog will no longer be listed so the proto ops is removed. Fixes: c5d2177a72a16 ("bpf, sockmap: Fix race in ingress receive verdict with redirect to self") Signed-off-by: John Fastabend Signed-off-by: Daniel Borkmann Link: https://lore.kernel.org/bpf/20211119181418.353932-3-john.fastabend@gmail.com --- net/core/skmsg.c | 5 +++++ net/core/sock_map.c | 5 ++++- 2 files changed, 9 insertions(+), 1 deletion(-) diff --git a/net/core/skmsg.c b/net/core/skmsg.c index 1ae52ac943f6..8eb671c827f9 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -1124,6 +1124,8 @@ void sk_psock_start_strp(struct sock *sk, struct sk_psock *psock) void sk_psock_stop_strp(struct sock *sk, struct sk_psock *psock) { + psock_set_prog(&psock->progs.stream_parser, NULL); + if (!psock->saved_data_ready) return; @@ -1212,6 +1214,9 @@ void sk_psock_start_verdict(struct sock *sk, struct sk_psock *psock) void sk_psock_stop_verdict(struct sock *sk, struct sk_psock *psock) { + psock_set_prog(&psock->progs.stream_verdict, NULL); + psock_set_prog(&psock->progs.skb_verdict, NULL); + if (!psock->saved_data_ready) return; diff --git a/net/core/sock_map.c b/net/core/sock_map.c index 9b528c644fb7..4ca4b11f4e5f 100644 --- a/net/core/sock_map.c +++ b/net/core/sock_map.c @@ -167,8 +167,11 @@ static void sock_map_del_link(struct sock *sk, write_lock_bh(&sk->sk_callback_lock); if (strp_stop) sk_psock_stop_strp(sk, psock); - else + if (verdict_stop) sk_psock_stop_verdict(sk, psock); + + if (psock->psock_update_sk_prot) + psock->psock_update_sk_prot(sk, psock, false); write_unlock_bh(&sk->sk_callback_lock); } } -- cgit From 6a631c0432dcccbcf45839016a07c015e335e9ae Mon Sep 17 00:00:00 2001 From: Sebastian Andrzej Siewior Date: Sat, 27 Nov 2021 17:31:59 +0100 Subject: Documentation/locking/locktypes: Update migrate_disable() bits. The initial implementation of migrate_disable() for mainline was a wrapper around preempt_disable(). RT kernels substituted this with a real migrate disable implementation. Later on mainline gained true migrate disable support, but the documentation was not updated. Update the documentation, remove the claims about migrate_disable() mapping to preempt_disable() on non-PREEMPT_RT kernels. Fixes: 74d862b682f51 ("sched: Make migrate_disable/enable() independent of RT") Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Daniel Borkmann Link: https://lore.kernel.org/bpf/20211127163200.10466-2-bigeasy@linutronix.de --- Documentation/locking/locktypes.rst | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/Documentation/locking/locktypes.rst b/Documentation/locking/locktypes.rst index ddada4a53749..4fd7b70fcde1 100644 --- a/Documentation/locking/locktypes.rst +++ b/Documentation/locking/locktypes.rst @@ -439,11 +439,9 @@ preemption. The following substitution works on both kernels:: spin_lock(&p->lock); p->count += this_cpu_read(var2); -On a non-PREEMPT_RT kernel migrate_disable() maps to preempt_disable() -which makes the above code fully equivalent. On a PREEMPT_RT kernel migrate_disable() ensures that the task is pinned on the current CPU which in turn guarantees that the per-CPU access to var1 and var2 are staying on -the same CPU. +the same CPU while the task remains preemptible. The migrate_disable() substitution is not valid for the following scenario:: @@ -456,9 +454,8 @@ scenario:: p = this_cpu_ptr(&var1); p->val = func2(); -While correct on a non-PREEMPT_RT kernel, this breaks on PREEMPT_RT because -here migrate_disable() does not protect against reentrancy from a -preempting task. A correct substitution for this case is:: +This breaks because migrate_disable() does not protect against reentrancy from +a preempting task. A correct substitution for this case is:: func() { -- cgit From 79364031c5b4365ca28ac0fa00acfab5bf465be1 Mon Sep 17 00:00:00 2001 From: Sebastian Andrzej Siewior Date: Sat, 27 Nov 2021 17:32:00 +0100 Subject: bpf: Make sure bpf_disable_instrumentation() is safe vs preemption. The initial implementation of migrate_disable() for mainline was a wrapper around preempt_disable(). RT kernels substituted this with a real migrate disable implementation. Later on mainline gained true migrate disable support, but neither documentation nor affected code were updated. Remove stale comments claiming that migrate_disable() is PREEMPT_RT only. Don't use __this_cpu_inc() in the !PREEMPT_RT path because preemption is not disabled and the RMW operation can be preempted. Fixes: 74d862b682f51 ("sched: Make migrate_disable/enable() independent of RT") Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Daniel Borkmann Link: https://lore.kernel.org/bpf/20211127163200.10466-3-bigeasy@linutronix.de --- include/linux/bpf.h | 16 ++-------------- include/linux/filter.h | 3 --- 2 files changed, 2 insertions(+), 17 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 84ff6ef49462..755f38e893be 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1353,28 +1353,16 @@ extern struct mutex bpf_stats_enabled_mutex; * kprobes, tracepoints) to prevent deadlocks on map operations as any of * these events can happen inside a region which holds a map bucket lock * and can deadlock on it. - * - * Use the preemption safe inc/dec variants on RT because migrate disable - * is preemptible on RT and preemption in the middle of the RMW operation - * might lead to inconsistent state. Use the raw variants for non RT - * kernels as migrate_disable() maps to preempt_disable() so the slightly - * more expensive save operation can be avoided. */ static inline void bpf_disable_instrumentation(void) { migrate_disable(); - if (IS_ENABLED(CONFIG_PREEMPT_RT)) - this_cpu_inc(bpf_prog_active); - else - __this_cpu_inc(bpf_prog_active); + this_cpu_inc(bpf_prog_active); } static inline void bpf_enable_instrumentation(void) { - if (IS_ENABLED(CONFIG_PREEMPT_RT)) - this_cpu_dec(bpf_prog_active); - else - __this_cpu_dec(bpf_prog_active); + this_cpu_dec(bpf_prog_active); migrate_enable(); } diff --git a/include/linux/filter.h b/include/linux/filter.h index 24b7ed2677af..534f678ca50f 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -640,9 +640,6 @@ static __always_inline u32 bpf_prog_run(const struct bpf_prog *prog, const void * This uses migrate_disable/enable() explicitly to document that the * invocation of a BPF program does not require reentrancy protection * against a BPF program which is invoked from a preempting task. - * - * For non RT enabled kernels migrate_disable/enable() maps to - * preempt_disable/enable(), i.e. it disables also preemption. */ static inline u32 bpf_prog_run_pin_on_cpu(const struct bpf_prog *prog, const void *ctx) -- cgit From 099f83aa2d06680d5987e43fed1afeda14b5037e Mon Sep 17 00:00:00 2001 From: Johan Almbladh Date: Tue, 30 Nov 2021 17:08:24 +0100 Subject: mips, bpf: Fix reference to non-existing Kconfig symbol The Kconfig symbol for R10000 ll/sc errata workaround in the MIPS JIT was misspelled, causing the workaround to not take effect when enabled. Fixes: 72570224bb8f ("mips, bpf: Add JIT workarounds for CPU errata") Reported-by: Lukas Bulwahn Signed-off-by: Johan Almbladh Signed-off-by: Daniel Borkmann Link: https://lore.kernel.org/bpf/20211130160824.3781635-1-johan.almbladh@anyfinetworks.com --- arch/mips/net/bpf_jit_comp.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/mips/net/bpf_jit_comp.h b/arch/mips/net/bpf_jit_comp.h index 6f3a7b07294b..a37fe20818eb 100644 --- a/arch/mips/net/bpf_jit_comp.h +++ b/arch/mips/net/bpf_jit_comp.h @@ -98,7 +98,7 @@ do { \ #define emit(...) __emit(__VA_ARGS__) /* Workaround for R10000 ll/sc errata */ -#ifdef CONFIG_WAR_R10000 +#ifdef CONFIG_WAR_R10000_LLSC #define LLSC_beqz beqzl #else #define LLSC_beqz beqz -- cgit From b43c2793f5e9910862e8fe07846b74e45b104501 Mon Sep 17 00:00:00 2001 From: Florian Westphal Date: Fri, 26 Nov 2021 13:04:03 +0100 Subject: netfilter: nfnetlink_queue: silence bogus compiler warning net/netfilter/nfnetlink_queue.c:601:36: warning: variable 'ctinfo' is uninitialized when used here [-Wuninitialized] if (ct && nfnl_ct->build(skb, ct, ctinfo, NFQA_CT, NFQA_CT_INFO) < 0) ctinfo is only uninitialized if ct == NULL. Init it to 0 to silence this. Reported-by: kernel test robot Signed-off-by: Florian Westphal Signed-off-by: Pablo Neira Ayuso --- net/netfilter/nfnetlink_queue.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c index 4acc4b8e9fe5..5837e8efc9c2 100644 --- a/net/netfilter/nfnetlink_queue.c +++ b/net/netfilter/nfnetlink_queue.c @@ -387,7 +387,7 @@ nfqnl_build_packet_message(struct net *net, struct nfqnl_instance *queue, struct net_device *indev; struct net_device *outdev; struct nf_conn *ct = NULL; - enum ip_conntrack_info ctinfo; + enum ip_conntrack_info ctinfo = 0; struct nfnl_ct_hook *nfnl_ct; bool csum_verify; char *secdata = NULL; -- cgit From 7e4dcc13965c57869684d57a1dc6dd7be589488c Mon Sep 17 00:00:00 2001 From: Mitch Williams Date: Fri, 4 Jun 2021 09:53:28 -0700 Subject: iavf: restore MSI state on reset If the PF experiences an FLR, the VF's MSI and MSI-X configuration will be conveniently and silently removed in the process. When this happens, reset recovery will appear to complete normally but no traffic will pass. The netdev watchdog will helpfully notify everyone of this issue. To prevent such public embarrassment, restore MSI configuration at every reset. For normal resets, this will do no harm, but for VF resets resulting from a PF FLR, this will keep the VF working. Fixes: 5eae00c57f5e ("i40evf: main driver core") Signed-off-by: Mitch Williams Tested-by: George Kuruvinakunnel Signed-off-by: Tony Nguyen --- drivers/net/ethernet/intel/iavf/iavf_main.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c index 14934a7a13ef..cfdbf8c08d18 100644 --- a/drivers/net/ethernet/intel/iavf/iavf_main.c +++ b/drivers/net/ethernet/intel/iavf/iavf_main.c @@ -2248,6 +2248,7 @@ static void iavf_reset_task(struct work_struct *work) } pci_set_master(adapter->pdev); + pci_restore_msi_state(adapter->pdev); if (i == IAVF_RESET_WAIT_COMPLETE_COUNT) { dev_err(&adapter->pdev->dev, "Reset never finished (%x)\n", -- cgit From d9847eb8be3d895b2b5f514fdf3885d47a0b92a2 Mon Sep 17 00:00:00 2001 From: Kumar Kartikeya Dwivedi Date: Mon, 22 Nov 2021 20:17:40 +0530 Subject: bpf: Make CONFIG_DEBUG_INFO_BTF depend upon CONFIG_BPF_SYSCALL Vinicius Costa Gomes reported [0] that build fails when CONFIG_DEBUG_INFO_BTF is enabled and CONFIG_BPF_SYSCALL is disabled. This leads to btf.c not being compiled, and then no symbol being present in vmlinux for the declarations in btf.h. Since BTF is not useful without enabling BPF subsystem, disallow this combination. However, theoretically disabling both now could still fail, as the symbol for kfunc_btf_id_list variables is not available. This isn't a problem as the compiler usually optimizes the whole register/unregister call, but at lower optimization levels it can fail the build in linking stage. Fix that by adding dummy variables so that modules taking address of them still work, but the whole thing is a noop. [0]: https://lore.kernel.org/bpf/20211110205418.332403-1-vinicius.gomes@intel.com Fixes: 14f267d95fe4 ("bpf: btf: Introduce helpers for dynamic BTF set registration") Reported-by: Vinicius Costa Gomes Signed-off-by: Kumar Kartikeya Dwivedi Signed-off-by: Andrii Nakryiko Acked-by: Song Liu Link: https://lore.kernel.org/bpf/20211122144742.477787-2-memxor@gmail.com --- include/linux/btf.h | 14 ++++++++++---- kernel/bpf/btf.c | 9 ++------- lib/Kconfig.debug | 1 + 3 files changed, 13 insertions(+), 11 deletions(-) diff --git a/include/linux/btf.h b/include/linux/btf.h index 203eef993d76..0e1b6281fd8f 100644 --- a/include/linux/btf.h +++ b/include/linux/btf.h @@ -245,7 +245,10 @@ struct kfunc_btf_id_set { struct module *owner; }; -struct kfunc_btf_id_list; +struct kfunc_btf_id_list { + struct list_head list; + struct mutex mutex; +}; #ifdef CONFIG_DEBUG_INFO_BTF_MODULES void register_kfunc_btf_id_set(struct kfunc_btf_id_list *l, @@ -254,6 +257,9 @@ void unregister_kfunc_btf_id_set(struct kfunc_btf_id_list *l, struct kfunc_btf_id_set *s); bool bpf_check_mod_kfunc_call(struct kfunc_btf_id_list *klist, u32 kfunc_id, struct module *owner); + +extern struct kfunc_btf_id_list bpf_tcp_ca_kfunc_list; +extern struct kfunc_btf_id_list prog_test_kfunc_list; #else static inline void register_kfunc_btf_id_set(struct kfunc_btf_id_list *l, struct kfunc_btf_id_set *s) @@ -268,13 +274,13 @@ static inline bool bpf_check_mod_kfunc_call(struct kfunc_btf_id_list *klist, { return false; } + +static struct kfunc_btf_id_list bpf_tcp_ca_kfunc_list __maybe_unused; +static struct kfunc_btf_id_list prog_test_kfunc_list __maybe_unused; #endif #define DEFINE_KFUNC_BTF_ID_SET(set, name) \ struct kfunc_btf_id_set name = { LIST_HEAD_INIT(name.list), (set), \ THIS_MODULE } -extern struct kfunc_btf_id_list bpf_tcp_ca_kfunc_list; -extern struct kfunc_btf_id_list prog_test_kfunc_list; - #endif diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c index dbc3ad07e21b..ea3df9867cec 100644 --- a/kernel/bpf/btf.c +++ b/kernel/bpf/btf.c @@ -6346,11 +6346,6 @@ BTF_ID_LIST_GLOBAL_SINGLE(btf_task_struct_ids, struct, task_struct) /* BTF ID set registration API for modules */ -struct kfunc_btf_id_list { - struct list_head list; - struct mutex mutex; -}; - #ifdef CONFIG_DEBUG_INFO_BTF_MODULES void register_kfunc_btf_id_set(struct kfunc_btf_id_list *l, @@ -6389,8 +6384,6 @@ bool bpf_check_mod_kfunc_call(struct kfunc_btf_id_list *klist, u32 kfunc_id, return false; } -#endif - #define DEFINE_KFUNC_BTF_ID_LIST(name) \ struct kfunc_btf_id_list name = { LIST_HEAD_INIT(name.list), \ __MUTEX_INITIALIZER(name.mutex) }; \ @@ -6398,3 +6391,5 @@ bool bpf_check_mod_kfunc_call(struct kfunc_btf_id_list *klist, u32 kfunc_id, DEFINE_KFUNC_BTF_ID_LIST(bpf_tcp_ca_kfunc_list); DEFINE_KFUNC_BTF_ID_LIST(prog_test_kfunc_list); + +#endif diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index 9ef7ce18b4f5..596bb5e4790c 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -316,6 +316,7 @@ config DEBUG_INFO_BTF bool "Generate BTF typeinfo" depends on !DEBUG_INFO_SPLIT && !DEBUG_INFO_REDUCED depends on !GCC_PLUGIN_RANDSTRUCT || COMPILE_TEST + depends on BPF_SYSCALL help Generate deduplicated BTF type information from DWARF debug info. Turning this on expects presence of pahole tool, which will convert -- cgit From b12f031043247b80999bf5e03b8cded3b0b40f8d Mon Sep 17 00:00:00 2001 From: Kumar Kartikeya Dwivedi Date: Mon, 22 Nov 2021 20:17:41 +0530 Subject: bpf: Fix bpf_check_mod_kfunc_call for built-in modules When module registering its set is built-in, THIS_MODULE will be NULL, hence we cannot return early in case owner is NULL. Fixes: 14f267d95fe4 ("bpf: btf: Introduce helpers for dynamic BTF set registration") Signed-off-by: Kumar Kartikeya Dwivedi Signed-off-by: Andrii Nakryiko Acked-by: Song Liu Link: https://lore.kernel.org/bpf/20211122144742.477787-3-memxor@gmail.com --- kernel/bpf/btf.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c index ea3df9867cec..9bdb03767db5 100644 --- a/kernel/bpf/btf.c +++ b/kernel/bpf/btf.c @@ -6371,8 +6371,6 @@ bool bpf_check_mod_kfunc_call(struct kfunc_btf_id_list *klist, u32 kfunc_id, { struct kfunc_btf_id_set *s; - if (!owner) - return false; mutex_lock(&klist->mutex); list_for_each_entry(s, &klist->list, list) { if (s->owner == owner && btf_id_set_contains(s->set, kfunc_id)) { -- cgit From 3345193f6f3cc24791c245d4ba2c38502f1cf684 Mon Sep 17 00:00:00 2001 From: Kumar Kartikeya Dwivedi Date: Mon, 22 Nov 2021 20:17:42 +0530 Subject: tools/resolve_btfids: Skip unresolved symbol warning for empty BTF sets resolve_btfids prints a warning when it finds an unresolved symbol, (id == 0) in id_patch. This can be the case for BTF sets that are empty (due to disabled config options), hence printing warnings for certain builds, most recently seen in [0]. The reason behind this is because id->cnt aliases id->id in btf_id struct, leading to empty set showing up as ID 0 when we get to id_patch, which triggers the warning. Since sets are an exception here, accomodate by reusing hole in btf_id for bool is_set member, setting it to true for BTF set when setting id->cnt, and use that to skip extraneous warning. [0]: https://lore.kernel.org/all/1b99ae14-abb4-d18f-cc6a-d7e523b25542@gmail.com Before: ; ./tools/bpf/resolve_btfids/resolve_btfids -v -b vmlinux net/ipv4/tcp_cubic.ko adding symbol tcp_cubic_kfunc_ids WARN: resolve_btfids: unresolved symbol tcp_cubic_kfunc_ids patching addr 0: ID 0 [tcp_cubic_kfunc_ids] sorting addr 4: cnt 0 [tcp_cubic_kfunc_ids] update ok for net/ipv4/tcp_cubic.ko After: ; ./tools/bpf/resolve_btfids/resolve_btfids -v -b vmlinux net/ipv4/tcp_cubic.ko adding symbol tcp_cubic_kfunc_ids patching addr 0: ID 0 [tcp_cubic_kfunc_ids] sorting addr 4: cnt 0 [tcp_cubic_kfunc_ids] update ok for net/ipv4/tcp_cubic.ko Fixes: 0e32dfc80bae ("bpf: Enable TCP congestion control kfunc from modules") Reported-by: Pavel Skripkin Signed-off-by: Kumar Kartikeya Dwivedi Signed-off-by: Andrii Nakryiko Acked-by: Song Liu Link: https://lore.kernel.org/bpf/20211122144742.477787-4-memxor@gmail.com --- tools/bpf/resolve_btfids/main.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/tools/bpf/resolve_btfids/main.c b/tools/bpf/resolve_btfids/main.c index a59cb0ee609c..73409e27be01 100644 --- a/tools/bpf/resolve_btfids/main.c +++ b/tools/bpf/resolve_btfids/main.c @@ -83,6 +83,7 @@ struct btf_id { int cnt; }; int addr_cnt; + bool is_set; Elf64_Addr addr[ADDR_CNT]; }; @@ -451,8 +452,10 @@ static int symbols_collect(struct object *obj) * in symbol's size, together with 'cnt' field hence * that - 1. */ - if (id) + if (id) { id->cnt = sym.st_size / sizeof(int) - 1; + id->is_set = true; + } } else { pr_err("FAILED unsupported prefix %s\n", prefix); return -1; @@ -568,9 +571,8 @@ static int id_patch(struct object *obj, struct btf_id *id) int *ptr = data->d_buf; int i; - if (!id->id) { + if (!id->id && !id->is_set) pr_err("WARN: resolve_btfids: unresolved symbol %s\n", id->name); - } for (i = 0; i < id->addr_cnt; i++) { unsigned long addr = id->addr[i]; -- cgit From f6071e5e3961eeb5300bd0901c9e128598730ae3 Mon Sep 17 00:00:00 2001 From: Peilin Ye Date: Tue, 30 Nov 2021 16:47:20 -0800 Subject: selftests/fib_tests: Rework fib_rp_filter_test() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Currently rp_filter tests in fib_tests.sh:fib_rp_filter_test() are failing. ping sockets are bound to dummy1 using the "-I" option (SO_BINDTODEVICE), but socket lookup is failing when receiving ping replies, since the routing table thinks they belong to dummy0. For example, suppose ping is using a SOCK_RAW socket for ICMP messages. When receiving ping replies, in __raw_v4_lookup(), sk->sk_bound_dev_if is 3 (dummy1), but dif (skb_rtable(skb)->rt_iif) says 2 (dummy0), so the raw_sk_bound_dev_eq() check fails. Similar things happen in ping_lookup() for SOCK_DGRAM sockets. These tests used to pass due to a bug [1] in iputils, where "ping -I" actually did not bind ICMP message sockets to device. The bug has been fixed by iputils commit f455fee41c07 ("ping: also bind the ICMP socket to the specific device") in 2016, which is why our rp_filter tests started to fail. See [2] . Fixing the tests while keeping everything in one netns turns out to be nontrivial. Rework the tests and build the following topology: ┌─────────────────────────────┐ ┌─────────────────────────────┐ │ network namespace 1 (ns1) │ │ network namespace 2 (ns2) │ │ │ │ │ │ ┌────┐ ┌─────┐ │ │ ┌─────┐ ┌────┐ │ │ │ lo │<───>│veth1│<────────┼────┼─>│veth2│<──────────>│ lo │ │ │ └────┘ ├─────┴──────┐ │ │ ├─────┴──────┐ └────┘ │ │ │192.0.2.1/24│ │ │ │192.0.2.1/24│ │ │ └────────────┘ │ │ └────────────┘ │ └─────────────────────────────┘ └─────────────────────────────┘ Consider sending an ICMP_ECHO packet A in ns2. Both source and destination IP addresses are 192.0.2.1, and we use strict mode rp_filter in both ns1 and ns2: 1. A is routed to lo since its destination IP address is one of ns2's local addresses (veth2); 2. A is redirected from lo's egress to veth2's egress using mirred; 3. A arrives at veth1's ingress in ns1; 4. A is redirected from veth1's ingress to lo's ingress, again, using mirred; 5. In __fib_validate_source(), fib_info_nh_uses_dev() returns false, since A was received on lo, but reverse path lookup says veth1; 6. However A is not dropped since we have relaxed this check for lo in commit 66f8209547cc ("fib: relax source validation check for loopback packets"); Making sure A is not dropped here in this corner case is the whole point of having this test. 7. As A reaches the ICMP layer, an ICMP_ECHOREPLY packet, B, is generated; 8. Similarly, B is redirected from lo's egress to veth1's egress (in ns1), then redirected once again from veth2's ingress to lo's ingress (in ns2), using mirred. Also test "ping 127.0.0.1" from ns2. It does not trigger the relaxed check in __fib_validate_source(), but just to make sure the topology works with loopback addresses. Tested with ping from iputils 20210722-41-gf9fb573: $ ./fib_tests.sh -t rp_filter IPv4 rp_filter tests TEST: rp_filter passes local packets [ OK ] TEST: rp_filter passes loopback packets [ OK ] [1] https://github.com/iputils/iputils/issues/55 [2] https://github.com/iputils/iputils/commit/f455fee41c077d4b700a473b2f5b3487b8febc1d Reported-by: Hangbin Liu Fixes: adb701d6cfa4 ("selftests: add a test case for rp_filter") Reviewed-by: Cong Wang Signed-off-by: Peilin Ye Acked-by: David Ahern Link: https://lore.kernel.org/r/20211201004720.6357-1-yepeilin.cs@gmail.com Signed-off-by: Jakub Kicinski --- tools/testing/selftests/net/fib_tests.sh | 59 ++++++++++++++++++++++++++------ 1 file changed, 49 insertions(+), 10 deletions(-) diff --git a/tools/testing/selftests/net/fib_tests.sh b/tools/testing/selftests/net/fib_tests.sh index 5abe92d55b69..996af1ae3d3d 100755 --- a/tools/testing/selftests/net/fib_tests.sh +++ b/tools/testing/selftests/net/fib_tests.sh @@ -444,24 +444,63 @@ fib_rp_filter_test() setup set -e + ip netns add ns2 + ip netns set ns2 auto + + ip -netns ns2 link set dev lo up + + $IP link add name veth1 type veth peer name veth2 + $IP link set dev veth2 netns ns2 + $IP address add 192.0.2.1/24 dev veth1 + ip -netns ns2 address add 192.0.2.1/24 dev veth2 + $IP link set dev veth1 up + ip -netns ns2 link set dev veth2 up + $IP link set dev lo address 52:54:00:6a:c7:5e - $IP link set dummy0 address 52:54:00:6a:c7:5e - $IP link add dummy1 type dummy - $IP link set dummy1 address 52:54:00:6a:c7:5e - $IP link set dev dummy1 up + $IP link set dev veth1 address 52:54:00:6a:c7:5e + ip -netns ns2 link set dev lo address 52:54:00:6a:c7:5e + ip -netns ns2 link set dev veth2 address 52:54:00:6a:c7:5e + + # 1. (ns2) redirect lo's egress to veth2's egress + ip netns exec ns2 tc qdisc add dev lo parent root handle 1: fq_codel + ip netns exec ns2 tc filter add dev lo parent 1: protocol arp basic \ + action mirred egress redirect dev veth2 + ip netns exec ns2 tc filter add dev lo parent 1: protocol ip basic \ + action mirred egress redirect dev veth2 + + # 2. (ns1) redirect veth1's ingress to lo's ingress + $NS_EXEC tc qdisc add dev veth1 ingress + $NS_EXEC tc filter add dev veth1 ingress protocol arp basic \ + action mirred ingress redirect dev lo + $NS_EXEC tc filter add dev veth1 ingress protocol ip basic \ + action mirred ingress redirect dev lo + + # 3. (ns1) redirect lo's egress to veth1's egress + $NS_EXEC tc qdisc add dev lo parent root handle 1: fq_codel + $NS_EXEC tc filter add dev lo parent 1: protocol arp basic \ + action mirred egress redirect dev veth1 + $NS_EXEC tc filter add dev lo parent 1: protocol ip basic \ + action mirred egress redirect dev veth1 + + # 4. (ns2) redirect veth2's ingress to lo's ingress + ip netns exec ns2 tc qdisc add dev veth2 ingress + ip netns exec ns2 tc filter add dev veth2 ingress protocol arp basic \ + action mirred ingress redirect dev lo + ip netns exec ns2 tc filter add dev veth2 ingress protocol ip basic \ + action mirred ingress redirect dev lo + $NS_EXEC sysctl -qw net.ipv4.conf.all.rp_filter=1 $NS_EXEC sysctl -qw net.ipv4.conf.all.accept_local=1 $NS_EXEC sysctl -qw net.ipv4.conf.all.route_localnet=1 - - $NS_EXEC tc qd add dev dummy1 parent root handle 1: fq_codel - $NS_EXEC tc filter add dev dummy1 parent 1: protocol arp basic action mirred egress redirect dev lo - $NS_EXEC tc filter add dev dummy1 parent 1: protocol ip basic action mirred egress redirect dev lo + ip netns exec ns2 sysctl -qw net.ipv4.conf.all.rp_filter=1 + ip netns exec ns2 sysctl -qw net.ipv4.conf.all.accept_local=1 + ip netns exec ns2 sysctl -qw net.ipv4.conf.all.route_localnet=1 set +e - run_cmd "ip netns exec ns1 ping -I dummy1 -w1 -c1 198.51.100.1" + run_cmd "ip netns exec ns2 ping -w1 -c1 192.0.2.1" log_test $? 0 "rp_filter passes local packets" - run_cmd "ip netns exec ns1 ping -I dummy1 -w1 -c1 127.0.0.1" + run_cmd "ip netns exec ns2 ping -w1 -c1 127.0.0.1" log_test $? 0 "rp_filter passes loopback packets" cleanup -- cgit From 96f3896780153214040a6747974bebc1355307c0 Mon Sep 17 00:00:00 2001 From: Li Zhijian Date: Fri, 3 Dec 2021 10:53:21 +0800 Subject: selftests/tc-testing: add exit code Mark the summary result as FAIL to prevent from confusing the selftest framework if some of them are failed. Previously, the selftest framework always treats it as *ok* even though some of them are failed actually. That's because the script tdc.sh always return 0. # All test results: # # 1..97 # ok 1 83be - Create FQ-PIE with invalid number of flows # ok 2 8b6e - Create RED with no flags [...snip] # ok 6 5f15 - Create RED with flags ECN, harddrop # ok 7 53e8 - Create RED with flags ECN, nodrop # ok 8 d091 - Fail to create RED with only nodrop flag # ok 9 af8e - Create RED with flags ECN, nodrop, harddrop # not ok 10 ce7d - Add mq Qdisc to multi-queue device (4 queues) # Could not match regex pattern. Verify command output: # qdisc mq 1: root # qdisc fq_codel 0: parent 1:4 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64 # qdisc fq_codel 0: parent 1:3 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64 [...snip] # ok 96 6979 - Change quantum of a strict ETS band # ok 97 9a7d - Change ETS strict band without quantum # # # # ok 1 selftests: tc-testing: tdc.sh <<< summary result CC: Philip Li Reported-by: kernel test robot Signed-off-by: Li Zhijian Acked-by: Davide Caratti Acked-by: Jamal Hadi Salim Signed-off-by: David S. Miller --- tools/testing/selftests/tc-testing/tdc.py | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/tools/testing/selftests/tc-testing/tdc.py b/tools/testing/selftests/tc-testing/tdc.py index a3e43189d940..ee22e3447ec7 100755 --- a/tools/testing/selftests/tc-testing/tdc.py +++ b/tools/testing/selftests/tc-testing/tdc.py @@ -716,6 +716,7 @@ def set_operation_mode(pm, parser, args, remaining): list_test_cases(alltests) exit(0) + exit_code = 0 # KSFT_PASS if len(alltests): req_plugins = pm.get_required_plugins(alltests) try: @@ -724,6 +725,8 @@ def set_operation_mode(pm, parser, args, remaining): print('The following plugins were not found:') print('{}'.format(pde.missing_pg)) catresults = test_runner(pm, args, alltests) + if catresults.count_failures() != 0: + exit_code = 1 # KSFT_FAIL if args.format == 'none': print('Test results output suppression requested\n') else: @@ -748,6 +751,8 @@ def set_operation_mode(pm, parser, args, remaining): gid=int(os.getenv('SUDO_GID'))) else: print('No tests found\n') + exit_code = 4 # KSFT_SKIP + exit(exit_code) def main(): """ @@ -767,8 +772,5 @@ def main(): set_operation_mode(pm, parser, args, remaining) - exit(0) - - if __name__ == "__main__": main() -- cgit From a8c9505c53c5f1f0aba572a4c70e2d91ad08434e Mon Sep 17 00:00:00 2001 From: Li Zhijian Date: Fri, 3 Dec 2021 10:53:22 +0800 Subject: selftests/tc-testing: add missing config qdiscs/fq_pie requires CONFIG_NET_SCH_FQ_PIE, otherwise tc will fail to create a fq_pie qdisc. It fixes following issue: # not ok 57 83be - Create FQ-PIE with invalid number of flows # Command exited with 2, expected 0 # Error: Specified qdisc not found. Signed-off-by: Li Zhijian Signed-off-by: David S. Miller --- tools/testing/selftests/tc-testing/config | 1 + 1 file changed, 1 insertion(+) diff --git a/tools/testing/selftests/tc-testing/config b/tools/testing/selftests/tc-testing/config index b71828df5a6d..b1cd7efa4512 100644 --- a/tools/testing/selftests/tc-testing/config +++ b/tools/testing/selftests/tc-testing/config @@ -60,6 +60,7 @@ CONFIG_NET_IFE_SKBTCINDEX=m CONFIG_NET_SCH_FIFO=y CONFIG_NET_SCH_ETS=m CONFIG_NET_SCH_RED=m +CONFIG_NET_SCH_FQ_PIE=m # ## Network testing -- cgit From db925bca33a9f0029a9891defd926a4856dd5c87 Mon Sep 17 00:00:00 2001 From: Li Zhijian Date: Fri, 3 Dec 2021 10:53:23 +0800 Subject: selftests/tc-testing: Fix cannot create /sys/bus/netdevsim/new_device: Directory nonexistent Install netdevsim to provide /sys/bus/netdevsim/new_device interface. It helps to fix: # ok 97 9a7d - Change ETS strict band without quantum # skipped - skipped - previous setup failed 11 ce7d # # # -----> prepare stage *** Could not execute: "echo "1 1 4" > /sys/bus/netdevsim/new_device" # # -----> prepare stage *** Error message: "/bin/sh: 1: cannot create /sys/bus/netdevsim/new_device: Directory nonexistent # " # # -----> prepare stage *** Aborting test run. # # # <_io.BufferedReader name=5> *** stdout *** # Signed-off-by: Li Zhijian Signed-off-by: David S. Miller --- tools/testing/selftests/tc-testing/config | 1 + tools/testing/selftests/tc-testing/tdc.sh | 1 + 2 files changed, 2 insertions(+) diff --git a/tools/testing/selftests/tc-testing/config b/tools/testing/selftests/tc-testing/config index b1cd7efa4512..a3239d5e40c7 100644 --- a/tools/testing/selftests/tc-testing/config +++ b/tools/testing/selftests/tc-testing/config @@ -61,6 +61,7 @@ CONFIG_NET_SCH_FIFO=y CONFIG_NET_SCH_ETS=m CONFIG_NET_SCH_RED=m CONFIG_NET_SCH_FQ_PIE=m +CONFIG_NETDEVSIM=m # ## Network testing diff --git a/tools/testing/selftests/tc-testing/tdc.sh b/tools/testing/selftests/tc-testing/tdc.sh index 7fe38c76db44..afb0cd86fa3d 100755 --- a/tools/testing/selftests/tc-testing/tdc.sh +++ b/tools/testing/selftests/tc-testing/tdc.sh @@ -1,5 +1,6 @@ #!/bin/sh # SPDX-License-Identifier: GPL-2.0 +modprobe netdevsim ./tdc.py -c actions --nobuildebpf ./tdc.py -c qdisc -- cgit From a9418924552e52e63903cbb0310d7537260702bf Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Thu, 2 Dec 2021 14:42:18 -0800 Subject: inet: use #ifdef CONFIG_SOCK_RX_QUEUE_MAPPING consistently Since commit 4e1beecc3b58 ("net/sock: Add kernel config SOCK_RX_QUEUE_MAPPING"), sk_rx_queue_mapping access is guarded by CONFIG_SOCK_RX_QUEUE_MAPPING. Fixes: 54b92e841937 ("tcp: Migrate TCP_ESTABLISHED/TCP_SYN_RECV sockets in accept queues.") Signed-off-by: Eric Dumazet Cc: Kuniyuki Iwashima Cc: Daniel Borkmann Cc: Martin KaFai Lau Cc: Tariq Toukan Acked-by: Kuniyuki Iwashima Reviewed-by: Tariq Toukan Signed-off-by: David S. Miller --- net/ipv4/inet_connection_sock.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c index f7fea3a7c5e6..62a67fdc344c 100644 --- a/net/ipv4/inet_connection_sock.c +++ b/net/ipv4/inet_connection_sock.c @@ -721,7 +721,7 @@ static struct request_sock *inet_reqsk_clone(struct request_sock *req, sk_node_init(&nreq_sk->sk_node); nreq_sk->sk_tx_queue_mapping = req_sk->sk_tx_queue_mapping; -#ifdef CONFIG_XPS +#ifdef CONFIG_SOCK_RX_QUEUE_MAPPING nreq_sk->sk_rx_queue_mapping = req_sk->sk_rx_queue_mapping; #endif nreq_sk->sk_incoming_cpu = req_sk->sk_incoming_cpu; -- cgit From 03cfda4fa6ea9bea2f30160579a78c2b8c1e616e Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Thu, 2 Dec 2021 15:37:24 -0800 Subject: tcp: fix another uninit-value (sk_rx_queue_mapping) KMSAN is still not happy [1]. I missed that passive connections do not inherit their sk_rx_queue_mapping values from the request socket, but instead tcp_child_process() is calling sk_mark_napi_id(child, skb) We have many sk_mark_napi_id() callers, so I am providing a new helper, forcing the setting sk_rx_queue_mapping and sk_napi_id. Note that we had no KMSAN report for sk_napi_id because passive connections got a copy of this field from the listener. sk_rx_queue_mapping in the other hand is inside the sk_dontcopy_begin/sk_dontcopy_end so sk_clone_lock() leaves this field uninitialized. We might remove dead code populating req->sk_rx_queue_mapping in the future. [1] BUG: KMSAN: uninit-value in __sk_rx_queue_set include/net/sock.h:1924 [inline] BUG: KMSAN: uninit-value in sk_rx_queue_update include/net/sock.h:1938 [inline] BUG: KMSAN: uninit-value in sk_mark_napi_id include/net/busy_poll.h:136 [inline] BUG: KMSAN: uninit-value in tcp_child_process+0xb42/0x1050 net/ipv4/tcp_minisocks.c:833 __sk_rx_queue_set include/net/sock.h:1924 [inline] sk_rx_queue_update include/net/sock.h:1938 [inline] sk_mark_napi_id include/net/busy_poll.h:136 [inline] tcp_child_process+0xb42/0x1050 net/ipv4/tcp_minisocks.c:833 tcp_v4_rcv+0x3d83/0x4ed0 net/ipv4/tcp_ipv4.c:2066 ip_protocol_deliver_rcu+0x760/0x10b0 net/ipv4/ip_input.c:204 ip_local_deliver_finish net/ipv4/ip_input.c:231 [inline] NF_HOOK include/linux/netfilter.h:307 [inline] ip_local_deliver+0x584/0x8c0 net/ipv4/ip_input.c:252 dst_input include/net/dst.h:460 [inline] ip_sublist_rcv_finish net/ipv4/ip_input.c:551 [inline] ip_list_rcv_finish net/ipv4/ip_input.c:601 [inline] ip_sublist_rcv+0x11fd/0x1520 net/ipv4/ip_input.c:609 ip_list_rcv+0x95f/0x9a0 net/ipv4/ip_input.c:644 __netif_receive_skb_list_ptype net/core/dev.c:5505 [inline] __netif_receive_skb_list_core+0xe34/0x1240 net/core/dev.c:5553 __netif_receive_skb_list+0x7fc/0x960 net/core/dev.c:5605 netif_receive_skb_list_internal+0x868/0xde0 net/core/dev.c:5696 gro_normal_list net/core/dev.c:5850 [inline] napi_complete_done+0x579/0xdd0 net/core/dev.c:6587 virtqueue_napi_complete drivers/net/virtio_net.c:339 [inline] virtnet_poll+0x17b6/0x2350 drivers/net/virtio_net.c:1557 __napi_poll+0x14e/0xbc0 net/core/dev.c:7020 napi_poll net/core/dev.c:7087 [inline] net_rx_action+0x824/0x1880 net/core/dev.c:7174 __do_softirq+0x1fe/0x7eb kernel/softirq.c:558 run_ksoftirqd+0x33/0x50 kernel/softirq.c:920 smpboot_thread_fn+0x616/0xbf0 kernel/smpboot.c:164 kthread+0x721/0x850 kernel/kthread.c:327 ret_from_fork+0x1f/0x30 Uninit was created at: __alloc_pages+0xbc7/0x10a0 mm/page_alloc.c:5409 alloc_pages+0x8a5/0xb80 alloc_slab_page mm/slub.c:1810 [inline] allocate_slab+0x287/0x1c20 mm/slub.c:1947 new_slab mm/slub.c:2010 [inline] ___slab_alloc+0xbdf/0x1e90 mm/slub.c:3039 __slab_alloc mm/slub.c:3126 [inline] slab_alloc_node mm/slub.c:3217 [inline] slab_alloc mm/slub.c:3259 [inline] kmem_cache_alloc+0xbb3/0x11c0 mm/slub.c:3264 sk_prot_alloc+0xeb/0x570 net/core/sock.c:1914 sk_clone_lock+0xd6/0x1940 net/core/sock.c:2118 inet_csk_clone_lock+0x8d/0x6a0 net/ipv4/inet_connection_sock.c:956 tcp_create_openreq_child+0xb1/0x1ef0 net/ipv4/tcp_minisocks.c:453 tcp_v4_syn_recv_sock+0x268/0x2710 net/ipv4/tcp_ipv4.c:1563 tcp_check_req+0x207c/0x2a30 net/ipv4/tcp_minisocks.c:765 tcp_v4_rcv+0x36f5/0x4ed0 net/ipv4/tcp_ipv4.c:2047 ip_protocol_deliver_rcu+0x760/0x10b0 net/ipv4/ip_input.c:204 ip_local_deliver_finish net/ipv4/ip_input.c:231 [inline] NF_HOOK include/linux/netfilter.h:307 [inline] ip_local_deliver+0x584/0x8c0 net/ipv4/ip_input.c:252 dst_input include/net/dst.h:460 [inline] ip_sublist_rcv_finish net/ipv4/ip_input.c:551 [inline] ip_list_rcv_finish net/ipv4/ip_input.c:601 [inline] ip_sublist_rcv+0x11fd/0x1520 net/ipv4/ip_input.c:609 ip_list_rcv+0x95f/0x9a0 net/ipv4/ip_input.c:644 __netif_receive_skb_list_ptype net/core/dev.c:5505 [inline] __netif_receive_skb_list_core+0xe34/0x1240 net/core/dev.c:5553 __netif_receive_skb_list+0x7fc/0x960 net/core/dev.c:5605 netif_receive_skb_list_internal+0x868/0xde0 net/core/dev.c:5696 gro_normal_list net/core/dev.c:5850 [inline] napi_complete_done+0x579/0xdd0 net/core/dev.c:6587 virtqueue_napi_complete drivers/net/virtio_net.c:339 [inline] virtnet_poll+0x17b6/0x2350 drivers/net/virtio_net.c:1557 __napi_poll+0x14e/0xbc0 net/core/dev.c:7020 napi_poll net/core/dev.c:7087 [inline] net_rx_action+0x824/0x1880 net/core/dev.c:7174 __do_softirq+0x1fe/0x7eb kernel/softirq.c:558 Fixes: 342159ee394d ("net: avoid dirtying sk->sk_rx_queue_mapping") Fixes: a37a0ee4d25c ("net: avoid uninit-value from tcp_conn_request") Signed-off-by: Eric Dumazet Reported-by: syzbot Tested-by: Alexander Potapenko Signed-off-by: David S. Miller --- include/net/busy_poll.h | 13 +++++++++++++ net/ipv4/tcp_minisocks.c | 4 ++-- 2 files changed, 15 insertions(+), 2 deletions(-) diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h index 7994455ec714..c4898fcbf923 100644 --- a/include/net/busy_poll.h +++ b/include/net/busy_poll.h @@ -136,6 +136,19 @@ static inline void sk_mark_napi_id(struct sock *sk, const struct sk_buff *skb) sk_rx_queue_update(sk, skb); } +/* Variant of sk_mark_napi_id() for passive flow setup, + * as sk->sk_napi_id and sk->sk_rx_queue_mapping content + * needs to be set. + */ +static inline void sk_mark_napi_id_set(struct sock *sk, + const struct sk_buff *skb) +{ +#ifdef CONFIG_NET_RX_BUSY_POLL + WRITE_ONCE(sk->sk_napi_id, skb->napi_id); +#endif + sk_rx_queue_set(sk, skb); +} + static inline void __sk_mark_napi_id_once(struct sock *sk, unsigned int napi_id) { #ifdef CONFIG_NET_RX_BUSY_POLL diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c index cf913a66df17..7c2d3ac2363a 100644 --- a/net/ipv4/tcp_minisocks.c +++ b/net/ipv4/tcp_minisocks.c @@ -829,8 +829,8 @@ int tcp_child_process(struct sock *parent, struct sock *child, int ret = 0; int state = child->sk_state; - /* record NAPI ID of child */ - sk_mark_napi_id(child, skb); + /* record sk_napi_id and sk_rx_queue_mapping of child. */ + sk_mark_napi_id_set(child, skb); tcp_segs_in(tcp_sk(child), skb); if (!sock_owned_by_user(child)) { -- cgit From dac8e00fb640e9569cdeefd3ce8a75639e5d0711 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Thu, 2 Dec 2021 18:27:18 -0800 Subject: bonding: make tx_rebalance_counter an atomic KCSAN reported a data-race [1] around tx_rebalance_counter which can be accessed from different contexts, without the protection of a lock/mutex. [1] BUG: KCSAN: data-race in bond_alb_init_slave / bond_alb_monitor write to 0xffff888157e8ca24 of 4 bytes by task 7075 on cpu 0: bond_alb_init_slave+0x713/0x860 drivers/net/bonding/bond_alb.c:1613 bond_enslave+0xd94/0x3010 drivers/net/bonding/bond_main.c:1949 do_set_master net/core/rtnetlink.c:2521 [inline] __rtnl_newlink net/core/rtnetlink.c:3475 [inline] rtnl_newlink+0x1298/0x13b0 net/core/rtnetlink.c:3506 rtnetlink_rcv_msg+0x745/0x7e0 net/core/rtnetlink.c:5571 netlink_rcv_skb+0x14e/0x250 net/netlink/af_netlink.c:2491 rtnetlink_rcv+0x18/0x20 net/core/rtnetlink.c:5589 netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline] netlink_unicast+0x5fc/0x6c0 net/netlink/af_netlink.c:1345 netlink_sendmsg+0x6e1/0x7d0 net/netlink/af_netlink.c:1916 sock_sendmsg_nosec net/socket.c:704 [inline] sock_sendmsg net/socket.c:724 [inline] ____sys_sendmsg+0x39a/0x510 net/socket.c:2409 ___sys_sendmsg net/socket.c:2463 [inline] __sys_sendmsg+0x195/0x230 net/socket.c:2492 __do_sys_sendmsg net/socket.c:2501 [inline] __se_sys_sendmsg net/socket.c:2499 [inline] __x64_sys_sendmsg+0x42/0x50 net/socket.c:2499 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae read to 0xffff888157e8ca24 of 4 bytes by task 1082 on cpu 1: bond_alb_monitor+0x8f/0xc00 drivers/net/bonding/bond_alb.c:1511 process_one_work+0x3fc/0x980 kernel/workqueue.c:2298 worker_thread+0x616/0xa70 kernel/workqueue.c:2445 kthread+0x2c7/0x2e0 kernel/kthread.c:327 ret_from_fork+0x1f/0x30 value changed: 0x00000001 -> 0x00000064 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 1082 Comm: kworker/u4:3 Not tainted 5.16.0-rc3-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: bond1 bond_alb_monitor Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet Reported-by: syzbot Signed-off-by: David S. Miller --- drivers/net/bonding/bond_alb.c | 14 ++++++++------ include/net/bond_alb.h | 2 +- 2 files changed, 9 insertions(+), 7 deletions(-) diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c index 2ec8e015c7b3..533e476988f2 100644 --- a/drivers/net/bonding/bond_alb.c +++ b/drivers/net/bonding/bond_alb.c @@ -1501,14 +1501,14 @@ void bond_alb_monitor(struct work_struct *work) struct slave *slave; if (!bond_has_slaves(bond)) { - bond_info->tx_rebalance_counter = 0; + atomic_set(&bond_info->tx_rebalance_counter, 0); bond_info->lp_counter = 0; goto re_arm; } rcu_read_lock(); - bond_info->tx_rebalance_counter++; + atomic_inc(&bond_info->tx_rebalance_counter); bond_info->lp_counter++; /* send learning packets */ @@ -1530,7 +1530,7 @@ void bond_alb_monitor(struct work_struct *work) } /* rebalance tx traffic */ - if (bond_info->tx_rebalance_counter >= BOND_TLB_REBALANCE_TICKS) { + if (atomic_read(&bond_info->tx_rebalance_counter) >= BOND_TLB_REBALANCE_TICKS) { bond_for_each_slave_rcu(bond, slave, iter) { tlb_clear_slave(bond, slave, 1); if (slave == rcu_access_pointer(bond->curr_active_slave)) { @@ -1540,7 +1540,7 @@ void bond_alb_monitor(struct work_struct *work) bond_info->unbalanced_load = 0; } } - bond_info->tx_rebalance_counter = 0; + atomic_set(&bond_info->tx_rebalance_counter, 0); } if (bond_info->rlb_enabled) { @@ -1610,7 +1610,8 @@ int bond_alb_init_slave(struct bonding *bond, struct slave *slave) tlb_init_slave(slave); /* order a rebalance ASAP */ - bond->alb_info.tx_rebalance_counter = BOND_TLB_REBALANCE_TICKS; + atomic_set(&bond->alb_info.tx_rebalance_counter, + BOND_TLB_REBALANCE_TICKS); if (bond->alb_info.rlb_enabled) bond->alb_info.rlb_rebalance = 1; @@ -1647,7 +1648,8 @@ void bond_alb_handle_link_change(struct bonding *bond, struct slave *slave, char rlb_clear_slave(bond, slave); } else if (link == BOND_LINK_UP) { /* order a rebalance ASAP */ - bond_info->tx_rebalance_counter = BOND_TLB_REBALANCE_TICKS; + atomic_set(&bond_info->tx_rebalance_counter, + BOND_TLB_REBALANCE_TICKS); if (bond->alb_info.rlb_enabled) { bond->alb_info.rlb_rebalance = 1; /* If the updelay module parameter is smaller than the diff --git a/include/net/bond_alb.h b/include/net/bond_alb.h index f6af76c87a6c..191c36afa1f4 100644 --- a/include/net/bond_alb.h +++ b/include/net/bond_alb.h @@ -126,7 +126,7 @@ struct tlb_slave_info { struct alb_bond_info { struct tlb_client_info *tx_hashtbl; /* Dynamically allocated */ u32 unbalanced_load; - int tx_rebalance_counter; + atomic_t tx_rebalance_counter; int lp_counter; /* -------- rlb parameters -------- */ int rlb_enabled; -- cgit From 0f8a3b48f91b8dc1f3eff06b77a63a17183fccbd Mon Sep 17 00:00:00 2001 From: Li Zhijian Date: Fri, 3 Dec 2021 10:32:13 +0800 Subject: selftests: net/fcnal-test.sh: add exit code Previously, the selftest framework always treats it as *ok* even though some of them are failed actually. That's because the script always returns 0. It supports PASS/FAIL/SKIP exit code now. CC: Philip Li Reported-by: kernel test robot Signed-off-by: Li Zhijian Signed-off-by: David S. Miller --- tools/testing/selftests/net/fcnal-test.sh | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/tools/testing/selftests/net/fcnal-test.sh b/tools/testing/selftests/net/fcnal-test.sh index 7f5b265fcb90..a1da013d847b 100755 --- a/tools/testing/selftests/net/fcnal-test.sh +++ b/tools/testing/selftests/net/fcnal-test.sh @@ -4077,3 +4077,11 @@ cleanup 2>/dev/null printf "\nTests passed: %3d\n" ${nsuccess} printf "Tests failed: %3d\n" ${nfail} + +if [ $nfail -ne 0 ]; then + exit 1 # KSFT_FAIL +elif [ $nsuccess -eq 0 ]; then + exit $ksft_skip +fi + +exit 0 # KSFT_PASS -- cgit From 128f6ec95a282b2d8bc1041e59bf65810703fa44 Mon Sep 17 00:00:00 2001 From: Jiasheng Jiang Date: Fri, 3 Dec 2021 11:31:06 +0800 Subject: net: bcm4908: Handle dma_set_coherent_mask error codes The return value of dma_set_coherent_mask() is not always 0. To catch the exception in case that dma is not support the mask. Fixes: 9d61d138ab30 ("net: broadcom: rename BCM4908 driver & update DT binding") Signed-off-by: Jiasheng Jiang Acked-by: Florian Fainelli Signed-off-by: David S. Miller --- drivers/net/ethernet/broadcom/bcm4908_enet.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/broadcom/bcm4908_enet.c b/drivers/net/ethernet/broadcom/bcm4908_enet.c index 7cc5213c575a..b07cb9bc5f2d 100644 --- a/drivers/net/ethernet/broadcom/bcm4908_enet.c +++ b/drivers/net/ethernet/broadcom/bcm4908_enet.c @@ -708,7 +708,9 @@ static int bcm4908_enet_probe(struct platform_device *pdev) enet->irq_tx = platform_get_irq_byname(pdev, "tx"); - dma_set_coherent_mask(dev, DMA_BIT_MASK(32)); + err = dma_set_coherent_mask(dev, DMA_BIT_MASK(32)); + if (err) + return err; err = bcm4908_enet_dma_alloc(enet); if (err) -- cgit From badd7857f5c933a3dc34942a2c11d67fdbdc24de Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Fri, 3 Dec 2021 13:11:28 +0300 Subject: net: altera: set a couple error code in probe() There are two error paths which accidentally return success instead of a negative error code. Fixes: bbd2190ce96d ("Altera TSE: Add main and header file for Altera Ethernet Driver") Signed-off-by: Dan Carpenter Signed-off-by: David S. Miller --- drivers/net/ethernet/altera/altera_tse_main.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/altera/altera_tse_main.c b/drivers/net/ethernet/altera/altera_tse_main.c index d75d95a97dd9..993b2fb42961 100644 --- a/drivers/net/ethernet/altera/altera_tse_main.c +++ b/drivers/net/ethernet/altera/altera_tse_main.c @@ -1430,16 +1430,19 @@ static int altera_tse_probe(struct platform_device *pdev) priv->rxdescmem_busaddr = dma_res->start; } else { + ret = -ENODEV; goto err_free_netdev; } - if (!dma_set_mask(priv->device, DMA_BIT_MASK(priv->dmaops->dmamask))) + if (!dma_set_mask(priv->device, DMA_BIT_MASK(priv->dmaops->dmamask))) { dma_set_coherent_mask(priv->device, DMA_BIT_MASK(priv->dmaops->dmamask)); - else if (!dma_set_mask(priv->device, DMA_BIT_MASK(32))) + } else if (!dma_set_mask(priv->device, DMA_BIT_MASK(32))) { dma_set_coherent_mask(priv->device, DMA_BIT_MASK(32)); - else + } else { + ret = -EIO; goto err_free_netdev; + } /* MAC address space */ ret = request_and_map(pdev, "control_port", &control_port, -- cgit From 8581fd402a0cf80b5298e3b225e7a7bd8f110e69 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Thu, 2 Dec 2021 12:34:00 -0800 Subject: treewide: Add missing includes masked by cgroup -> bpf dependency MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit cgroup.h (therefore swap.h, therefore half of the universe) includes bpf.h which in turn includes module.h and slab.h. Since we're about to get rid of that dependency we need to clean things up. v2: drop the cpu.h include from cacheinfo.h, it's not necessary and it makes riscv sensitive to ordering of include files. Signed-off-by: Jakub Kicinski Signed-off-by: Alexei Starovoitov Reviewed-by: Christoph Hellwig Acked-by: Krzysztof Wilczyński Acked-by: Peter Chen Acked-by: SeongJae Park Acked-by: Jani Nikula Acked-by: Greg Kroah-Hartman Link: https://lore.kernel.org/all/20211120035253.72074-1-kuba@kernel.org/ # v1 Link: https://lore.kernel.org/all/20211120165528.197359-1-kuba@kernel.org/ # cacheinfo discussion Link: https://lore.kernel.org/bpf/20211202203400.1208663-1-kuba@kernel.org --- block/fops.c | 1 + drivers/gpu/drm/drm_gem_shmem_helper.c | 1 + drivers/gpu/drm/i915/gt/intel_gtt.c | 1 + drivers/gpu/drm/i915/i915_request.c | 1 + drivers/gpu/drm/lima/lima_device.c | 1 + drivers/gpu/drm/msm/msm_gem_shrinker.c | 1 + drivers/gpu/drm/ttm/ttm_tt.c | 1 + drivers/net/ethernet/huawei/hinic/hinic_sriov.c | 1 + drivers/net/ethernet/marvell/octeontx2/nic/otx2_ptp.c | 2 ++ drivers/pci/controller/dwc/pci-exynos.c | 1 + drivers/pci/controller/dwc/pcie-qcom-ep.c | 1 + drivers/usb/cdns3/host.c | 1 + include/linux/cacheinfo.h | 1 - include/linux/device/driver.h | 1 + include/linux/filter.h | 2 +- mm/damon/vaddr.c | 1 + mm/memory_hotplug.c | 1 + mm/swap_slots.c | 1 + 18 files changed, 18 insertions(+), 2 deletions(-) diff --git a/block/fops.c b/block/fops.c index ad732a36f9b3..3cb1e81929bc 100644 --- a/block/fops.c +++ b/block/fops.c @@ -15,6 +15,7 @@ #include #include #include +#include #include "blk.h" static inline struct inode *bdev_file_inode(struct file *file) diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c b/drivers/gpu/drm/drm_gem_shmem_helper.c index 7b9f69f21f1e..bca0de92802e 100644 --- a/drivers/gpu/drm/drm_gem_shmem_helper.c +++ b/drivers/gpu/drm/drm_gem_shmem_helper.c @@ -9,6 +9,7 @@ #include #include #include +#include #ifdef CONFIG_X86 #include diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c index 67d14afa6623..b67f620c3d93 100644 --- a/drivers/gpu/drm/i915/gt/intel_gtt.c +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c @@ -6,6 +6,7 @@ #include /* fault-inject.h is not standalone! */ #include +#include #include "gem/i915_gem_lmem.h" #include "i915_trace.h" diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index 820a1f38b271..89cccefeea63 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -29,6 +29,7 @@ #include #include #include +#include #include "gem/i915_gem_context.h" #include "gt/intel_breadcrumbs.h" diff --git a/drivers/gpu/drm/lima/lima_device.c b/drivers/gpu/drm/lima/lima_device.c index 65fdca366e41..f74f8048af8f 100644 --- a/drivers/gpu/drm/lima/lima_device.c +++ b/drivers/gpu/drm/lima/lima_device.c @@ -4,6 +4,7 @@ #include #include #include +#include #include #include diff --git a/drivers/gpu/drm/msm/msm_gem_shrinker.c b/drivers/gpu/drm/msm/msm_gem_shrinker.c index 4a1420b05e97..086dacf2f26a 100644 --- a/drivers/gpu/drm/msm/msm_gem_shrinker.c +++ b/drivers/gpu/drm/msm/msm_gem_shrinker.c @@ -5,6 +5,7 @@ */ #include +#include #include "msm_drv.h" #include "msm_gem.h" diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c index 7e83c00a3f48..79c870a3bef8 100644 --- a/drivers/gpu/drm/ttm/ttm_tt.c +++ b/drivers/gpu/drm/ttm/ttm_tt.c @@ -34,6 +34,7 @@ #include #include #include +#include #include #include diff --git a/drivers/net/ethernet/huawei/hinic/hinic_sriov.c b/drivers/net/ethernet/huawei/hinic/hinic_sriov.c index a78c398bf5b2..01e7d3c0b68e 100644 --- a/drivers/net/ethernet/huawei/hinic/hinic_sriov.c +++ b/drivers/net/ethernet/huawei/hinic/hinic_sriov.c @@ -8,6 +8,7 @@ #include #include #include +#include #include "hinic_hw_dev.h" #include "hinic_dev.h" diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_ptp.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_ptp.c index 0ef68fdd1f26..61c20907315f 100644 --- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_ptp.c +++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_ptp.c @@ -5,6 +5,8 @@ * */ +#include + #include "otx2_common.h" #include "otx2_ptp.h" diff --git a/drivers/pci/controller/dwc/pci-exynos.c b/drivers/pci/controller/dwc/pci-exynos.c index c24dab383654..722dacdd5a17 100644 --- a/drivers/pci/controller/dwc/pci-exynos.c +++ b/drivers/pci/controller/dwc/pci-exynos.c @@ -19,6 +19,7 @@ #include #include #include +#include #include "pcie-designware.h" diff --git a/drivers/pci/controller/dwc/pcie-qcom-ep.c b/drivers/pci/controller/dwc/pcie-qcom-ep.c index 7b17da2f9b3f..cfe66bf04c1d 100644 --- a/drivers/pci/controller/dwc/pcie-qcom-ep.c +++ b/drivers/pci/controller/dwc/pcie-qcom-ep.c @@ -18,6 +18,7 @@ #include #include #include +#include #include "pcie-designware.h" diff --git a/drivers/usb/cdns3/host.c b/drivers/usb/cdns3/host.c index 84dadfa726aa..9643b905e2d8 100644 --- a/drivers/usb/cdns3/host.c +++ b/drivers/usb/cdns3/host.c @@ -10,6 +10,7 @@ */ #include +#include #include "core.h" #include "drd.h" #include "host-export.h" diff --git a/include/linux/cacheinfo.h b/include/linux/cacheinfo.h index 2f909ed084c6..4ff37cb763ae 100644 --- a/include/linux/cacheinfo.h +++ b/include/linux/cacheinfo.h @@ -3,7 +3,6 @@ #define _LINUX_CACHEINFO_H #include -#include #include #include diff --git a/include/linux/device/driver.h b/include/linux/device/driver.h index a498ebcf4993..15e7c5e15d62 100644 --- a/include/linux/device/driver.h +++ b/include/linux/device/driver.h @@ -18,6 +18,7 @@ #include #include #include +#include /** * enum probe_type - device driver probe type to try diff --git a/include/linux/filter.h b/include/linux/filter.h index 534f678ca50f..7f1e88e3e2b5 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -6,6 +6,7 @@ #define __LINUX_FILTER_H__ #include +#include #include #include #include @@ -26,7 +27,6 @@ #include #include -#include struct sk_buff; struct sock; diff --git a/mm/damon/vaddr.c b/mm/damon/vaddr.c index 35fe49080ee9..47f47f60440e 100644 --- a/mm/damon/vaddr.c +++ b/mm/damon/vaddr.c @@ -13,6 +13,7 @@ #include #include #include +#include #include "prmtv-common.h" diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 852041f6be41..2a9627dc784c 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -35,6 +35,7 @@ #include #include #include +#include #include diff --git a/mm/swap_slots.c b/mm/swap_slots.c index 16f706c55d92..2b5531840583 100644 --- a/mm/swap_slots.c +++ b/mm/swap_slots.c @@ -30,6 +30,7 @@ #include #include #include +#include #include #include #include -- cgit From 2fa7d94afc1afbb4d702760c058dc2d7ed30f226 Mon Sep 17 00:00:00 2001 From: Maxim Mikityanskiy Date: Tue, 30 Nov 2021 20:16:07 +0200 Subject: bpf: Fix the off-by-two error in range markings The first commit cited below attempts to fix the off-by-one error that appeared in some comparisons with an open range. Due to this error, arithmetically equivalent pieces of code could get different verdicts from the verifier, for example (pseudocode): // 1. Passes the verifier: if (data + 8 > data_end) return early read *(u64 *)data, i.e. [data; data+7] // 2. Rejected by the verifier (should still pass): if (data + 7 >= data_end) return early read *(u64 *)data, i.e. [data; data+7] The attempted fix, however, shifts the range by one in a wrong direction, so the bug not only remains, but also such piece of code starts failing in the verifier: // 3. Rejected by the verifier, but the check is stricter than in #1. if (data + 8 >= data_end) return early read *(u64 *)data, i.e. [data; data+7] The change performed by that fix converted an off-by-one bug into off-by-two. The second commit cited below added the BPF selftests written to ensure than code chunks like #3 are rejected, however, they should be accepted. This commit fixes the off-by-two error by adjusting new_range in the right direction and fixes the tests by changing the range into the one that should actually fail. Fixes: fb2a311a31d3 ("bpf: fix off by one for range markings with L{T, E} patterns") Fixes: b37242c773b2 ("bpf: add test cases to bpf selftests to cover all access tests") Signed-off-by: Maxim Mikityanskiy Signed-off-by: Daniel Borkmann Link: https://lore.kernel.org/bpf/20211130181607.593149-1-maximmi@nvidia.com --- kernel/bpf/verifier.c | 2 +- .../bpf/verifier/xdp_direct_packet_access.c | 32 +++++++++++----------- 2 files changed, 17 insertions(+), 17 deletions(-) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 50efda51515b..f3001937bbb9 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -8422,7 +8422,7 @@ static void find_good_pkt_pointers(struct bpf_verifier_state *vstate, new_range = dst_reg->off; if (range_right_open) - new_range--; + new_range++; /* Examples for register markings: * diff --git a/tools/testing/selftests/bpf/verifier/xdp_direct_packet_access.c b/tools/testing/selftests/bpf/verifier/xdp_direct_packet_access.c index bfb97383e6b5..de172a5b8754 100644 --- a/tools/testing/selftests/bpf/verifier/xdp_direct_packet_access.c +++ b/tools/testing/selftests/bpf/verifier/xdp_direct_packet_access.c @@ -112,10 +112,10 @@ BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, offsetof(struct xdp_md, data_end)), BPF_MOV64_REG(BPF_REG_1, BPF_REG_2), - BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 8), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 6), BPF_JMP_REG(BPF_JGT, BPF_REG_3, BPF_REG_1, 1), BPF_JMP_IMM(BPF_JA, 0, 0, 1), - BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -8), + BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -6), BPF_MOV64_IMM(BPF_REG_0, 0), BPF_EXIT_INSN(), }, @@ -167,10 +167,10 @@ BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, offsetof(struct xdp_md, data_end)), BPF_MOV64_REG(BPF_REG_1, BPF_REG_2), - BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 8), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 6), BPF_JMP_REG(BPF_JLT, BPF_REG_1, BPF_REG_3, 1), BPF_JMP_IMM(BPF_JA, 0, 0, 1), - BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -8), + BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -6), BPF_MOV64_IMM(BPF_REG_0, 0), BPF_EXIT_INSN(), }, @@ -274,9 +274,9 @@ BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, offsetof(struct xdp_md, data_end)), BPF_MOV64_REG(BPF_REG_1, BPF_REG_2), - BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 8), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 6), BPF_JMP_REG(BPF_JGE, BPF_REG_1, BPF_REG_3, 1), - BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -8), + BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -6), BPF_MOV64_IMM(BPF_REG_0, 0), BPF_EXIT_INSN(), }, @@ -437,9 +437,9 @@ BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, offsetof(struct xdp_md, data_end)), BPF_MOV64_REG(BPF_REG_1, BPF_REG_2), - BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 8), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 6), BPF_JMP_REG(BPF_JLE, BPF_REG_3, BPF_REG_1, 1), - BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -8), + BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -6), BPF_MOV64_IMM(BPF_REG_0, 0), BPF_EXIT_INSN(), }, @@ -544,10 +544,10 @@ offsetof(struct xdp_md, data_meta)), BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, offsetof(struct xdp_md, data)), BPF_MOV64_REG(BPF_REG_1, BPF_REG_2), - BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 8), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 6), BPF_JMP_REG(BPF_JGT, BPF_REG_3, BPF_REG_1, 1), BPF_JMP_IMM(BPF_JA, 0, 0, 1), - BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -8), + BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -6), BPF_MOV64_IMM(BPF_REG_0, 0), BPF_EXIT_INSN(), }, @@ -599,10 +599,10 @@ offsetof(struct xdp_md, data_meta)), BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, offsetof(struct xdp_md, data)), BPF_MOV64_REG(BPF_REG_1, BPF_REG_2), - BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 8), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 6), BPF_JMP_REG(BPF_JLT, BPF_REG_1, BPF_REG_3, 1), BPF_JMP_IMM(BPF_JA, 0, 0, 1), - BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -8), + BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -6), BPF_MOV64_IMM(BPF_REG_0, 0), BPF_EXIT_INSN(), }, @@ -706,9 +706,9 @@ offsetof(struct xdp_md, data_meta)), BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, offsetof(struct xdp_md, data)), BPF_MOV64_REG(BPF_REG_1, BPF_REG_2), - BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 8), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 6), BPF_JMP_REG(BPF_JGE, BPF_REG_1, BPF_REG_3, 1), - BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -8), + BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -6), BPF_MOV64_IMM(BPF_REG_0, 0), BPF_EXIT_INSN(), }, @@ -869,9 +869,9 @@ offsetof(struct xdp_md, data_meta)), BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, offsetof(struct xdp_md, data)), BPF_MOV64_REG(BPF_REG_1, BPF_REG_2), - BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 8), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 6), BPF_JMP_REG(BPF_JLE, BPF_REG_3, BPF_REG_1, 1), - BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -8), + BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -6), BPF_MOV64_IMM(BPF_REG_0, 0), BPF_EXIT_INSN(), }, -- cgit From 8e227b198a55859bf790dc7f4b1e30c0859c6756 Mon Sep 17 00:00:00 2001 From: Manish Chopra Date: Fri, 3 Dec 2021 09:44:13 -0800 Subject: qede: validate non LSO skb length Although it is unlikely that stack could transmit a non LSO skb with length > MTU, however in some cases or environment such occurrences actually resulted into firmware asserts due to packet length being greater than the max supported by the device (~9700B). This patch adds the safeguard for such odd cases to avoid firmware asserts. v2: Added "Fixes" tag with one of the initial driver commit which enabled the TX traffic actually (as this was probably day1 issue which was discovered recently by some customer environment) Fixes: a2ec6172d29c ("qede: Add support for link") Signed-off-by: Manish Chopra Signed-off-by: Alok Prasad Signed-off-by: Prabhakar Kushwaha Signed-off-by: Ariel Elior Link: https://lore.kernel.org/r/20211203174413.13090-1-manishc@marvell.com Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/qlogic/qede/qede_fp.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/drivers/net/ethernet/qlogic/qede/qede_fp.c b/drivers/net/ethernet/qlogic/qede/qede_fp.c index 065e9004598e..999abcfe3310 100644 --- a/drivers/net/ethernet/qlogic/qede/qede_fp.c +++ b/drivers/net/ethernet/qlogic/qede/qede_fp.c @@ -1643,6 +1643,13 @@ netdev_tx_t qede_start_xmit(struct sk_buff *skb, struct net_device *ndev) data_split = true; } } else { + if (unlikely(skb->len > ETH_TX_MAX_NON_LSO_PKT_LEN)) { + DP_ERR(edev, "Unexpected non LSO skb length = 0x%x\n", skb->len); + qede_free_failed_tx_pkt(txq, first_bd, 0, false); + qede_update_tx_producer(txq); + return NETDEV_TX_OK; + } + val |= ((skb->len & ETH_TX_DATA_1ST_BD_PKT_LEN_MASK) << ETH_TX_DATA_1ST_BD_PKT_LEN_SHIFT); } -- cgit From 2be6d4d16a0849455a5c22490e3c5983495fed00 Mon Sep 17 00:00:00 2001 From: Lee Jones Date: Thu, 2 Dec 2021 14:34:37 +0000 Subject: net: cdc_ncm: Allow for dwNtbOutMaxSize to be unset or zero MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Currently, due to the sequential use of min_t() and clamp_t() macros, in cdc_ncm_check_tx_max(), if dwNtbOutMaxSize is not set, the logic sets tx_max to 0. This is then used to allocate the data area of the SKB requested later in cdc_ncm_fill_tx_frame(). This does not cause an issue presently because when memory is allocated during initialisation phase of SKB creation, more memory (512b) is allocated than is required for the SKB headers alone (320b), leaving some space (512b - 320b = 192b) for CDC data (172b). However, if more elements (for example 3 x u64 = [24b]) were added to one of the SKB header structs, say 'struct skb_shared_info', increasing its original size (320b [320b aligned]) to something larger (344b [384b aligned]), then suddenly the CDC data (172b) no longer fits in the spare SKB data area (512b - 384b = 128b). Consequently the SKB bounds checking semantics fails and panics: skbuff: skb_over_panic: text:ffffffff830a5b5f len:184 put:172 \ head:ffff888119227c00 data:ffff888119227c00 tail:0xb8 end:0x80 dev: ------------[ cut here ]------------ kernel BUG at net/core/skbuff.c:110! RIP: 0010:skb_panic+0x14f/0x160 net/core/skbuff.c:106 Call Trace: skb_over_panic+0x2c/0x30 net/core/skbuff.c:115 skb_put+0x205/0x210 net/core/skbuff.c:1877 skb_put_zero include/linux/skbuff.h:2270 [inline] cdc_ncm_ndp16 drivers/net/usb/cdc_ncm.c:1116 [inline] cdc_ncm_fill_tx_frame+0x127f/0x3d50 drivers/net/usb/cdc_ncm.c:1293 cdc_ncm_tx_fixup+0x98/0xf0 drivers/net/usb/cdc_ncm.c:1514 By overriding the max value with the default CDC_NCM_NTB_MAX_SIZE_TX when not offered through the system provided params, we ensure enough data space is allocated to handle the CDC data, meaning no crash will occur. Cc: Oliver Neukum Fixes: 289507d3364f9 ("net: cdc_ncm: use sysfs for rx/tx aggregation tuning") Signed-off-by: Lee Jones Reviewed-by: Bjørn Mork Link: https://lore.kernel.org/r/20211202143437.1411410-1-lee.jones@linaro.org Signed-off-by: Jakub Kicinski --- drivers/net/usb/cdc_ncm.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/usb/cdc_ncm.c b/drivers/net/usb/cdc_ncm.c index 24753a4da7e6..e303b522efb5 100644 --- a/drivers/net/usb/cdc_ncm.c +++ b/drivers/net/usb/cdc_ncm.c @@ -181,6 +181,8 @@ static u32 cdc_ncm_check_tx_max(struct usbnet *dev, u32 new_tx) min = ctx->max_datagram_size + ctx->max_ndp_size + sizeof(struct usb_cdc_ncm_nth32); max = min_t(u32, CDC_NCM_NTB_MAX_SIZE_TX, le32_to_cpu(ctx->ncm_parm.dwNtbOutMaxSize)); + if (max == 0) + max = CDC_NCM_NTB_MAX_SIZE_TX; /* dwNtbOutMaxSize not set */ /* some devices set dwNtbOutMaxSize too low for the above default */ min = min(min, max); -- cgit From 1a1aa356ddf3f16539f5962c01c5f702686dfc15 Mon Sep 17 00:00:00 2001 From: Michal Maloszewski Date: Tue, 26 Oct 2021 12:59:09 +0000 Subject: iavf: Fix reporting when setting descriptor count iavf_set_ringparams doesn't communicate to the user that 1. The user requested descriptor count is out of range. Instead it just quietly sets descriptors to the "clamped" value and calls it done. This makes it look an invalid value was successfully set as the descriptor count when this isn't actually true. 2. The user provided descriptor count needs to be inflated for alignment reasons. This behavior is confusing. The ice driver has already addressed this by rejecting invalid values for descriptor count and messaging for alignment adjustments. Do the same thing here by adding the error and info messages. Fixes: fbb7ddfef253 ("i40evf: core ethtool functionality") Signed-off-by: Anirudh Venkataramanan Signed-off-by: Michal Maloszewski Tested-by: Konrad Jankowski Signed-off-by: Tony Nguyen --- drivers/net/ethernet/intel/iavf/iavf_ethtool.c | 43 +++++++++++++++++++------- 1 file changed, 32 insertions(+), 11 deletions(-) diff --git a/drivers/net/ethernet/intel/iavf/iavf_ethtool.c b/drivers/net/ethernet/intel/iavf/iavf_ethtool.c index 0cecaff38d04..461f5237a2f8 100644 --- a/drivers/net/ethernet/intel/iavf/iavf_ethtool.c +++ b/drivers/net/ethernet/intel/iavf/iavf_ethtool.c @@ -615,23 +615,44 @@ static int iavf_set_ringparam(struct net_device *netdev, if ((ring->rx_mini_pending) || (ring->rx_jumbo_pending)) return -EINVAL; - new_tx_count = clamp_t(u32, ring->tx_pending, - IAVF_MIN_TXD, - IAVF_MAX_TXD); - new_tx_count = ALIGN(new_tx_count, IAVF_REQ_DESCRIPTOR_MULTIPLE); + if (ring->tx_pending > IAVF_MAX_TXD || + ring->tx_pending < IAVF_MIN_TXD || + ring->rx_pending > IAVF_MAX_RXD || + ring->rx_pending < IAVF_MIN_RXD) { + netdev_err(netdev, "Descriptors requested (Tx: %d / Rx: %d) out of range [%d-%d] (increment %d)\n", + ring->tx_pending, ring->rx_pending, IAVF_MIN_TXD, + IAVF_MAX_RXD, IAVF_REQ_DESCRIPTOR_MULTIPLE); + return -EINVAL; + } - new_rx_count = clamp_t(u32, ring->rx_pending, - IAVF_MIN_RXD, - IAVF_MAX_RXD); - new_rx_count = ALIGN(new_rx_count, IAVF_REQ_DESCRIPTOR_MULTIPLE); + new_tx_count = ALIGN(ring->tx_pending, IAVF_REQ_DESCRIPTOR_MULTIPLE); + if (new_tx_count != ring->tx_pending) + netdev_info(netdev, "Requested Tx descriptor count rounded up to %d\n", + new_tx_count); + + new_rx_count = ALIGN(ring->rx_pending, IAVF_REQ_DESCRIPTOR_MULTIPLE); + if (new_rx_count != ring->rx_pending) + netdev_info(netdev, "Requested Rx descriptor count rounded up to %d\n", + new_rx_count); /* if nothing to do return success */ if ((new_tx_count == adapter->tx_desc_count) && - (new_rx_count == adapter->rx_desc_count)) + (new_rx_count == adapter->rx_desc_count)) { + netdev_dbg(netdev, "Nothing to change, descriptor count is same as requested\n"); return 0; + } - adapter->tx_desc_count = new_tx_count; - adapter->rx_desc_count = new_rx_count; + if (new_tx_count != adapter->tx_desc_count) { + netdev_dbg(netdev, "Changing Tx descriptor count from %d to %d\n", + adapter->tx_desc_count, new_tx_count); + adapter->tx_desc_count = new_tx_count; + } + + if (new_rx_count != adapter->rx_desc_count) { + netdev_dbg(netdev, "Changing Rx descriptor count from %d to %d\n", + adapter->rx_desc_count, new_rx_count); + adapter->rx_desc_count = new_rx_count; + } if (netif_running(netdev)) { adapter->flags |= IAVF_FLAG_RESET_NEEDED; -- cgit From 61125b8be85dfbc7e9c7fe1cc6c6d631ab603516 Mon Sep 17 00:00:00 2001 From: Karen Sornek Date: Fri, 14 May 2021 11:43:13 +0200 Subject: i40e: Fix failed opcode appearing if handling messages from VF Fix failed operation code appearing if handling messages from VF. Implemented by waiting for VF appropriate state if request starts handle while VF reset. Without this patch the message handling request while VF is in a reset state ends with error -5 (I40E_ERR_PARAM). Fixes: 5c3c48ac6bf5 ("i40e: implement virtual device interface") Signed-off-by: Grzegorz Szczurek Signed-off-by: Karen Sornek Tested-by: Tony Brelinski Signed-off-by: Tony Nguyen --- drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 70 +++++++++++++++------- drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h | 2 + 2 files changed, 50 insertions(+), 22 deletions(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c index 80ae264c99ba..f651861442c2 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c +++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c @@ -1948,6 +1948,32 @@ static int i40e_vc_send_resp_to_vf(struct i40e_vf *vf, return i40e_vc_send_msg_to_vf(vf, opcode, retval, NULL, 0); } +/** + * i40e_sync_vf_state + * @vf: pointer to the VF info + * @state: VF state + * + * Called from a VF message to synchronize the service with a potential + * VF reset state + **/ +static bool i40e_sync_vf_state(struct i40e_vf *vf, enum i40e_vf_states state) +{ + int i; + + /* When handling some messages, it needs VF state to be set. + * It is possible that this flag is cleared during VF reset, + * so there is a need to wait until the end of the reset to + * handle the request message correctly. + */ + for (i = 0; i < I40E_VF_STATE_WAIT_COUNT; i++) { + if (test_bit(state, &vf->vf_states)) + return true; + usleep_range(10000, 20000); + } + + return test_bit(state, &vf->vf_states); +} + /** * i40e_vc_get_version_msg * @vf: pointer to the VF info @@ -2008,7 +2034,7 @@ static int i40e_vc_get_vf_resources_msg(struct i40e_vf *vf, u8 *msg) size_t len = 0; int ret; - if (!test_bit(I40E_VF_STATE_INIT, &vf->vf_states)) { + if (!i40e_sync_vf_state(vf, I40E_VF_STATE_INIT)) { aq_ret = I40E_ERR_PARAM; goto err; } @@ -2131,7 +2157,7 @@ static int i40e_vc_config_promiscuous_mode_msg(struct i40e_vf *vf, u8 *msg) bool allmulti = false; bool alluni = false; - if (!test_bit(I40E_VF_STATE_ACTIVE, &vf->vf_states)) { + if (!i40e_sync_vf_state(vf, I40E_VF_STATE_ACTIVE)) { aq_ret = I40E_ERR_PARAM; goto err_out; } @@ -2219,7 +2245,7 @@ static int i40e_vc_config_queues_msg(struct i40e_vf *vf, u8 *msg) struct i40e_vsi *vsi; u16 num_qps_all = 0; - if (!test_bit(I40E_VF_STATE_ACTIVE, &vf->vf_states)) { + if (!i40e_sync_vf_state(vf, I40E_VF_STATE_ACTIVE)) { aq_ret = I40E_ERR_PARAM; goto error_param; } @@ -2368,7 +2394,7 @@ static int i40e_vc_config_irq_map_msg(struct i40e_vf *vf, u8 *msg) i40e_status aq_ret = 0; int i; - if (!test_bit(I40E_VF_STATE_ACTIVE, &vf->vf_states)) { + if (!i40e_sync_vf_state(vf, I40E_VF_STATE_ACTIVE)) { aq_ret = I40E_ERR_PARAM; goto error_param; } @@ -2540,7 +2566,7 @@ static int i40e_vc_disable_queues_msg(struct i40e_vf *vf, u8 *msg) struct i40e_pf *pf = vf->pf; i40e_status aq_ret = 0; - if (!test_bit(I40E_VF_STATE_ACTIVE, &vf->vf_states)) { + if (!i40e_sync_vf_state(vf, I40E_VF_STATE_ACTIVE)) { aq_ret = I40E_ERR_PARAM; goto error_param; } @@ -2590,7 +2616,7 @@ static int i40e_vc_request_queues_msg(struct i40e_vf *vf, u8 *msg) u8 cur_pairs = vf->num_queue_pairs; struct i40e_pf *pf = vf->pf; - if (!test_bit(I40E_VF_STATE_ACTIVE, &vf->vf_states)) + if (!i40e_sync_vf_state(vf, I40E_VF_STATE_ACTIVE)) return -EINVAL; if (req_pairs > I40E_MAX_VF_QUEUES) { @@ -2635,7 +2661,7 @@ static int i40e_vc_get_stats_msg(struct i40e_vf *vf, u8 *msg) memset(&stats, 0, sizeof(struct i40e_eth_stats)); - if (!test_bit(I40E_VF_STATE_ACTIVE, &vf->vf_states)) { + if (!i40e_sync_vf_state(vf, I40E_VF_STATE_ACTIVE)) { aq_ret = I40E_ERR_PARAM; goto error_param; } @@ -2752,7 +2778,7 @@ static int i40e_vc_add_mac_addr_msg(struct i40e_vf *vf, u8 *msg) i40e_status ret = 0; int i; - if (!test_bit(I40E_VF_STATE_ACTIVE, &vf->vf_states) || + if (!i40e_sync_vf_state(vf, I40E_VF_STATE_ACTIVE) || !i40e_vc_isvalid_vsi_id(vf, al->vsi_id)) { ret = I40E_ERR_PARAM; goto error_param; @@ -2824,7 +2850,7 @@ static int i40e_vc_del_mac_addr_msg(struct i40e_vf *vf, u8 *msg) i40e_status ret = 0; int i; - if (!test_bit(I40E_VF_STATE_ACTIVE, &vf->vf_states) || + if (!i40e_sync_vf_state(vf, I40E_VF_STATE_ACTIVE) || !i40e_vc_isvalid_vsi_id(vf, al->vsi_id)) { ret = I40E_ERR_PARAM; goto error_param; @@ -2968,7 +2994,7 @@ static int i40e_vc_remove_vlan_msg(struct i40e_vf *vf, u8 *msg) i40e_status aq_ret = 0; int i; - if (!test_bit(I40E_VF_STATE_ACTIVE, &vf->vf_states) || + if (!i40e_sync_vf_state(vf, I40E_VF_STATE_ACTIVE) || !i40e_vc_isvalid_vsi_id(vf, vfl->vsi_id)) { aq_ret = I40E_ERR_PARAM; goto error_param; @@ -3088,9 +3114,9 @@ static int i40e_vc_config_rss_key(struct i40e_vf *vf, u8 *msg) struct i40e_vsi *vsi = NULL; i40e_status aq_ret = 0; - if (!test_bit(I40E_VF_STATE_ACTIVE, &vf->vf_states) || + if (!i40e_sync_vf_state(vf, I40E_VF_STATE_ACTIVE) || !i40e_vc_isvalid_vsi_id(vf, vrk->vsi_id) || - (vrk->key_len != I40E_HKEY_ARRAY_SIZE)) { + vrk->key_len != I40E_HKEY_ARRAY_SIZE) { aq_ret = I40E_ERR_PARAM; goto err; } @@ -3119,9 +3145,9 @@ static int i40e_vc_config_rss_lut(struct i40e_vf *vf, u8 *msg) i40e_status aq_ret = 0; u16 i; - if (!test_bit(I40E_VF_STATE_ACTIVE, &vf->vf_states) || + if (!i40e_sync_vf_state(vf, I40E_VF_STATE_ACTIVE) || !i40e_vc_isvalid_vsi_id(vf, vrl->vsi_id) || - (vrl->lut_entries != I40E_VF_HLUT_ARRAY_SIZE)) { + vrl->lut_entries != I40E_VF_HLUT_ARRAY_SIZE) { aq_ret = I40E_ERR_PARAM; goto err; } @@ -3154,7 +3180,7 @@ static int i40e_vc_get_rss_hena(struct i40e_vf *vf, u8 *msg) i40e_status aq_ret = 0; int len = 0; - if (!test_bit(I40E_VF_STATE_ACTIVE, &vf->vf_states)) { + if (!i40e_sync_vf_state(vf, I40E_VF_STATE_ACTIVE)) { aq_ret = I40E_ERR_PARAM; goto err; } @@ -3190,7 +3216,7 @@ static int i40e_vc_set_rss_hena(struct i40e_vf *vf, u8 *msg) struct i40e_hw *hw = &pf->hw; i40e_status aq_ret = 0; - if (!test_bit(I40E_VF_STATE_ACTIVE, &vf->vf_states)) { + if (!i40e_sync_vf_state(vf, I40E_VF_STATE_ACTIVE)) { aq_ret = I40E_ERR_PARAM; goto err; } @@ -3215,7 +3241,7 @@ static int i40e_vc_enable_vlan_stripping(struct i40e_vf *vf, u8 *msg) i40e_status aq_ret = 0; struct i40e_vsi *vsi; - if (!test_bit(I40E_VF_STATE_ACTIVE, &vf->vf_states)) { + if (!i40e_sync_vf_state(vf, I40E_VF_STATE_ACTIVE)) { aq_ret = I40E_ERR_PARAM; goto err; } @@ -3241,7 +3267,7 @@ static int i40e_vc_disable_vlan_stripping(struct i40e_vf *vf, u8 *msg) i40e_status aq_ret = 0; struct i40e_vsi *vsi; - if (!test_bit(I40E_VF_STATE_ACTIVE, &vf->vf_states)) { + if (!i40e_sync_vf_state(vf, I40E_VF_STATE_ACTIVE)) { aq_ret = I40E_ERR_PARAM; goto err; } @@ -3468,7 +3494,7 @@ static int i40e_vc_del_cloud_filter(struct i40e_vf *vf, u8 *msg) i40e_status aq_ret = 0; int i, ret; - if (!test_bit(I40E_VF_STATE_ACTIVE, &vf->vf_states)) { + if (!i40e_sync_vf_state(vf, I40E_VF_STATE_ACTIVE)) { aq_ret = I40E_ERR_PARAM; goto err; } @@ -3599,7 +3625,7 @@ static int i40e_vc_add_cloud_filter(struct i40e_vf *vf, u8 *msg) i40e_status aq_ret = 0; int i, ret; - if (!test_bit(I40E_VF_STATE_ACTIVE, &vf->vf_states)) { + if (!i40e_sync_vf_state(vf, I40E_VF_STATE_ACTIVE)) { aq_ret = I40E_ERR_PARAM; goto err_out; } @@ -3708,7 +3734,7 @@ static int i40e_vc_add_qch_msg(struct i40e_vf *vf, u8 *msg) i40e_status aq_ret = 0; u64 speed = 0; - if (!test_bit(I40E_VF_STATE_ACTIVE, &vf->vf_states)) { + if (!i40e_sync_vf_state(vf, I40E_VF_STATE_ACTIVE)) { aq_ret = I40E_ERR_PARAM; goto err; } @@ -3824,7 +3850,7 @@ static int i40e_vc_del_qch_msg(struct i40e_vf *vf, u8 *msg) struct i40e_pf *pf = vf->pf; i40e_status aq_ret = 0; - if (!test_bit(I40E_VF_STATE_ACTIVE, &vf->vf_states)) { + if (!i40e_sync_vf_state(vf, I40E_VF_STATE_ACTIVE)) { aq_ret = I40E_ERR_PARAM; goto err; } diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h index 091e32c1bb46..49575a640a84 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h +++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h @@ -18,6 +18,8 @@ #define I40E_MAX_VF_PROMISC_FLAGS 3 +#define I40E_VF_STATE_WAIT_COUNT 20 + /* Various queue ctrls */ enum i40e_queue_ctrl { I40E_QUEUE_CTRL_UNKNOWN = 0, -- cgit From 8aa55ab422d9d0d825ebfb877702ed661e96e682 Mon Sep 17 00:00:00 2001 From: Mateusz Palczewski Date: Fri, 16 Jul 2021 11:33:56 +0200 Subject: i40e: Fix pre-set max number of queues for VF After setting pre-set combined to 16 queues and reserving 16 queues by tc qdisc, pre-set maximum combined queues returned to default value after VF reset being 4 and this generated errors during removing tc. Fixed by removing clear num_req_queues before reset VF. Fixes: e284fc280473 (i40e: Add and delete cloud filter) Signed-off-by: Grzegorz Szczurek Signed-off-by: Mateusz Palczewski Tested-by: Bindushree P Signed-off-by: Tony Nguyen --- drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 5 ----- 1 file changed, 5 deletions(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c index f651861442c2..2ea4deb8fc44 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c +++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c @@ -3823,11 +3823,6 @@ static int i40e_vc_add_qch_msg(struct i40e_vf *vf, u8 *msg) /* set this flag only after making sure all inputs are sane */ vf->adq_enabled = true; - /* num_req_queues is set when user changes number of queues via ethtool - * and this causes issue for default VSI(which depends on this variable) - * when ADq is enabled, hence reset it. - */ - vf->num_req_queues = 0; /* reset the VF in order to allocate resources */ i40e_vc_reset_vf(vf, true); -- cgit From 23ec111bf3549aae37140330c31a16abfc172421 Mon Sep 17 00:00:00 2001 From: Norbert Zulinski Date: Mon, 22 Nov 2021 12:29:05 +0100 Subject: i40e: Fix NULL pointer dereference in i40e_dbg_dump_desc When trying to dump VFs VSI RX/TX descriptors using debugfs there was a crash due to NULL pointer dereference in i40e_dbg_dump_desc. Added a check to i40e_dbg_dump_desc that checks if VSI type is correct for dumping RX/TX descriptors. Fixes: 02e9c290814c ("i40e: debugfs interface") Signed-off-by: Sylwester Dziedziuch Signed-off-by: Norbert Zulinski Signed-off-by: Mateusz Palczewski Tested-by: Gurucharan G Signed-off-by: Tony Nguyen --- drivers/net/ethernet/intel/i40e/i40e_debugfs.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c index 291e61ac3e44..2c1b1da1220e 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c +++ b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c @@ -553,6 +553,14 @@ static void i40e_dbg_dump_desc(int cnt, int vsi_seid, int ring_id, int desc_n, dev_info(&pf->pdev->dev, "vsi %d not found\n", vsi_seid); return; } + if (vsi->type != I40E_VSI_MAIN && + vsi->type != I40E_VSI_FDIR && + vsi->type != I40E_VSI_VMDQ2) { + dev_info(&pf->pdev->dev, + "vsi %d type %d descriptor rings not available\n", + vsi_seid, vsi->type); + return; + } if (type == RING_TYPE_XDP && !i40e_enabled_xdp_vsi(vsi)) { dev_info(&pf->pdev->dev, "XDP not enabled on VSI %d\n", vsi_seid); return; -- cgit From dde91ccfa25fd58f64c397d91b81a4b393100ffa Mon Sep 17 00:00:00 2001 From: Antoine Tenart Date: Fri, 3 Dec 2021 11:13:18 +0100 Subject: ethtool: do not perform operations on net devices being unregistered There is a short period between a net device starts to be unregistered and when it is actually gone. In that time frame ethtool operations could still be performed, which might end up in unwanted or undefined behaviours[1]. Do not allow ethtool operations after a net device starts its unregistration. This patch targets the netlink part as the ioctl one isn't affected: the reference to the net device is taken and the operation is executed within an rtnl lock section and the net device won't be found after unregister. [1] For example adding Tx queues after unregister ends up in NULL pointer exceptions and UaFs, such as: BUG: KASAN: use-after-free in kobject_get+0x14/0x90 Read of size 1 at addr ffff88801961248c by task ethtool/755 CPU: 0 PID: 755 Comm: ethtool Not tainted 5.15.0-rc6+ #778 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-4.fc34 04/014 Call Trace: dump_stack_lvl+0x57/0x72 print_address_description.constprop.0+0x1f/0x140 kasan_report.cold+0x7f/0x11b kobject_get+0x14/0x90 kobject_add_internal+0x3d1/0x450 kobject_init_and_add+0xba/0xf0 netdev_queue_update_kobjects+0xcf/0x200 netif_set_real_num_tx_queues+0xb4/0x310 veth_set_channels+0x1c3/0x550 ethnl_set_channels+0x524/0x610 Fixes: 041b1c5d4a53 ("ethtool: helper functions for netlink interface") Suggested-by: Jakub Kicinski Signed-off-by: Antoine Tenart Link: https://lore.kernel.org/r/20211203101318.435618-1-atenart@kernel.org Signed-off-by: Jakub Kicinski --- net/ethtool/netlink.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/ethtool/netlink.c b/net/ethtool/netlink.c index 38b44c0291b1..96f4180aabd2 100644 --- a/net/ethtool/netlink.c +++ b/net/ethtool/netlink.c @@ -40,7 +40,8 @@ int ethnl_ops_begin(struct net_device *dev) if (dev->dev.parent) pm_runtime_get_sync(dev->dev.parent); - if (!netif_device_present(dev)) { + if (!netif_device_present(dev) || + dev->reg_state == NETREG_UNREGISTERING) { ret = -ENODEV; goto err; } -- cgit From 4dbb0dad8e63fcd0b5a117c2861d2abe7ff5f186 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Sun, 5 Dec 2021 11:28:22 -0800 Subject: devlink: fix netns refcount leak in devlink_nl_cmd_reload() While preparing my patch series adding netns refcount tracking, I spotted bugs in devlink_nl_cmd_reload() Some error paths forgot to release a refcount on a netns. To fix this, we can reduce the scope of get_net()/put_net() section around the call to devlink_reload(). Fixes: ccdf07219da6 ("devlink: Add reload action option to devlink reload command") Fixes: dc64cc7c6310 ("devlink: Add devlink reload limit option") Signed-off-by: Eric Dumazet Cc: Moshe Shemesh Cc: Jacob Keller Cc: Jiri Pirko Reviewed-by: Leon Romanovsky Link: https://lore.kernel.org/r/20211205192822.1741045-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski --- net/core/devlink.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/net/core/devlink.c b/net/core/devlink.c index 5ad72dbfcd07..c06c9ba6e8c5 100644 --- a/net/core/devlink.c +++ b/net/core/devlink.c @@ -4110,14 +4110,6 @@ static int devlink_nl_cmd_reload(struct sk_buff *skb, struct genl_info *info) return err; } - if (info->attrs[DEVLINK_ATTR_NETNS_PID] || - info->attrs[DEVLINK_ATTR_NETNS_FD] || - info->attrs[DEVLINK_ATTR_NETNS_ID]) { - dest_net = devlink_netns_get(skb, info); - if (IS_ERR(dest_net)) - return PTR_ERR(dest_net); - } - if (info->attrs[DEVLINK_ATTR_RELOAD_ACTION]) action = nla_get_u8(info->attrs[DEVLINK_ATTR_RELOAD_ACTION]); else @@ -4160,6 +4152,14 @@ static int devlink_nl_cmd_reload(struct sk_buff *skb, struct genl_info *info) return -EINVAL; } } + if (info->attrs[DEVLINK_ATTR_NETNS_PID] || + info->attrs[DEVLINK_ATTR_NETNS_FD] || + info->attrs[DEVLINK_ATTR_NETNS_ID]) { + dest_net = devlink_netns_get(skb, info); + if (IS_ERR(dest_net)) + return PTR_ERR(dest_net); + } + err = devlink_reload(devlink, dest_net, action, limit, &actions_performed, info->extack); if (dest_net) -- cgit From 94cddf1e9227a171b27292509d59691819c458db Mon Sep 17 00:00:00 2001 From: Vincent Mailhol Date: Tue, 23 Nov 2021 20:16:54 +0900 Subject: can: pch_can: pch_can_rx_normal: fix use after free After calling netif_receive_skb(skb), dereferencing skb is unsafe. Especially, the can_frame cf which aliases skb memory is dereferenced just after the call netif_receive_skb(skb). Reordering the lines solves the issue. Fixes: b21d18b51b31 ("can: Topcliff: Add PCH_CAN driver.") Link: https://lore.kernel.org/all/20211123111654.621610-1-mailhol.vincent@wanadoo.fr Cc: stable@vger.kernel.org Signed-off-by: Vincent Mailhol Signed-off-by: Marc Kleine-Budde --- drivers/net/can/pch_can.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/can/pch_can.c b/drivers/net/can/pch_can.c index 92a54a5fd4c5..964c8a09226a 100644 --- a/drivers/net/can/pch_can.c +++ b/drivers/net/can/pch_can.c @@ -692,11 +692,11 @@ static int pch_can_rx_normal(struct net_device *ndev, u32 obj_num, int quota) cf->data[i + 1] = data_reg >> 8; } - netif_receive_skb(skb); rcv_pkts++; stats->rx_packets++; quota--; stats->rx_bytes += cf->len; + netif_receive_skb(skb); pch_fifo_thresh(priv, obj_num); obj_num++; -- cgit From 3ec6ca6b1a8e64389f0212b5a1b0f6fed1909e45 Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Wed, 24 Nov 2021 17:50:41 +0300 Subject: can: sja1000: fix use after free in ems_pcmcia_add_card() If the last channel is not available then "dev" is freed. Fortunately, we can just use "pdev->irq" instead. Also we should check if at least one channel was set up. Fixes: fd734c6f25ae ("can/sja1000: add driver for EMS PCMCIA card") Link: https://lore.kernel.org/all/20211124145041.GB13656@kili Cc: stable@vger.kernel.org Signed-off-by: Dan Carpenter Acked-by: Oliver Hartkopp Tested-by: Oliver Hartkopp Signed-off-by: Marc Kleine-Budde --- drivers/net/can/sja1000/ems_pcmcia.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/net/can/sja1000/ems_pcmcia.c b/drivers/net/can/sja1000/ems_pcmcia.c index e21b169c14c0..4642b6d4aaf7 100644 --- a/drivers/net/can/sja1000/ems_pcmcia.c +++ b/drivers/net/can/sja1000/ems_pcmcia.c @@ -234,7 +234,12 @@ static int ems_pcmcia_add_card(struct pcmcia_device *pdev, unsigned long base) free_sja1000dev(dev); } - err = request_irq(dev->irq, &ems_pcmcia_interrupt, IRQF_SHARED, + if (!card->channels) { + err = -ENODEV; + goto failure_cleanup; + } + + err = request_irq(pdev->irq, &ems_pcmcia_interrupt, IRQF_SHARED, DRV_NAME, card); if (!err) return 0; -- cgit From f58ac1adc76b5beda43c64ef359056077df4d93a Mon Sep 17 00:00:00 2001 From: Brian Silverman Date: Mon, 29 Nov 2021 14:26:28 -0800 Subject: can: m_can: Disable and ignore ELO interrupt With the design of this driver, this condition is often triggered. However, the counter that this interrupt indicates an overflow is never read either, so overflowing is harmless. On my system, when a CAN bus starts flapping up and down, this locks up the whole system with lots of interrupts and printks. Specifically, this interrupt indicates the CEL field of ECR has overflowed. All reads of ECR mask out CEL. Fixes: e0d1f4816f2a ("can: m_can: add Bosch M_CAN controller support") Link: https://lore.kernel.org/all/20211129222628.7490-1-brian.silverman@bluerivertech.com Cc: stable@vger.kernel.org Signed-off-by: Brian Silverman Signed-off-by: Marc Kleine-Budde --- drivers/net/can/m_can/m_can.c | 14 ++++++-------- 1 file changed, 6 insertions(+), 8 deletions(-) diff --git a/drivers/net/can/m_can/m_can.c b/drivers/net/can/m_can/m_can.c index 2470c47b2e31..91be87c4f4d3 100644 --- a/drivers/net/can/m_can/m_can.c +++ b/drivers/net/can/m_can/m_can.c @@ -204,16 +204,16 @@ enum m_can_reg { /* Interrupts for version 3.0.x */ #define IR_ERR_LEC_30X (IR_STE | IR_FOE | IR_ACKE | IR_BE | IR_CRCE) -#define IR_ERR_BUS_30X (IR_ERR_LEC_30X | IR_WDI | IR_ELO | IR_BEU | \ - IR_BEC | IR_TOO | IR_MRAF | IR_TSW | IR_TEFL | \ - IR_RF1L | IR_RF0L) +#define IR_ERR_BUS_30X (IR_ERR_LEC_30X | IR_WDI | IR_BEU | IR_BEC | \ + IR_TOO | IR_MRAF | IR_TSW | IR_TEFL | IR_RF1L | \ + IR_RF0L) #define IR_ERR_ALL_30X (IR_ERR_STATE | IR_ERR_BUS_30X) /* Interrupts for version >= 3.1.x */ #define IR_ERR_LEC_31X (IR_PED | IR_PEA) -#define IR_ERR_BUS_31X (IR_ERR_LEC_31X | IR_WDI | IR_ELO | IR_BEU | \ - IR_BEC | IR_TOO | IR_MRAF | IR_TSW | IR_TEFL | \ - IR_RF1L | IR_RF0L) +#define IR_ERR_BUS_31X (IR_ERR_LEC_31X | IR_WDI | IR_BEU | IR_BEC | \ + IR_TOO | IR_MRAF | IR_TSW | IR_TEFL | IR_RF1L | \ + IR_RF0L) #define IR_ERR_ALL_31X (IR_ERR_STATE | IR_ERR_BUS_31X) /* Interrupt Line Select (ILS) */ @@ -810,8 +810,6 @@ static void m_can_handle_other_err(struct net_device *dev, u32 irqstatus) { if (irqstatus & IR_WDI) netdev_err(dev, "Message RAM Watchdog event due to missing READY\n"); - if (irqstatus & IR_ELO) - netdev_err(dev, "Error Logging Overflow\n"); if (irqstatus & IR_BEU) netdev_err(dev, "Bit Error Uncorrected\n"); if (irqstatus & IR_BEC) -- cgit From 31cb32a590d62b18f69a9a6d433f4e69c74fdd56 Mon Sep 17 00:00:00 2001 From: Vincent Mailhol Date: Sun, 7 Nov 2021 14:07:55 +0900 Subject: can: m_can: m_can_read_fifo: fix memory leak in error branch In m_can_read_fifo(), if the second call to m_can_fifo_read() fails, the function jump to the out_fail label and returns without calling m_can_receive_skb(). This means that the skb previously allocated by alloc_can_skb() is not freed. In other terms, this is a memory leak. This patch adds a goto label to destroy the skb if an error occurs. Issue was found with GCC -fanalyzer, please follow the link below for details. Fixes: e39381770ec9 ("can: m_can: Disable IRQs on FIFO bus errors") Link: https://lore.kernel.org/all/20211107050755.70655-1-mailhol.vincent@wanadoo.fr Cc: stable@vger.kernel.org Cc: Matt Kline Signed-off-by: Vincent Mailhol Signed-off-by: Marc Kleine-Budde --- drivers/net/can/m_can/m_can.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/net/can/m_can/m_can.c b/drivers/net/can/m_can/m_can.c index 91be87c4f4d3..e330b4c121bf 100644 --- a/drivers/net/can/m_can/m_can.c +++ b/drivers/net/can/m_can/m_can.c @@ -517,7 +517,7 @@ static int m_can_read_fifo(struct net_device *dev, u32 rxfs) err = m_can_fifo_read(cdev, fgi, M_CAN_FIFO_DATA, cf->data, DIV_ROUND_UP(cf->len, 4)); if (err) - goto out_fail; + goto out_free_skb; } /* acknowledge rx fifo 0 */ @@ -532,6 +532,8 @@ static int m_can_read_fifo(struct net_device *dev, u32 rxfs) return 0; +out_free_skb: + kfree_skb(skb); out_fail: netdev_err(dev, "FIFO read returned %d\n", err); return err; -- cgit From d737de2d7cc3efdacbf17d4e22efc75697bd76d9 Mon Sep 17 00:00:00 2001 From: Matthias Schiffer Date: Thu, 18 Nov 2021 15:40:11 +0100 Subject: can: m_can: pci: fix iomap_read_fifo() and iomap_write_fifo() The same fix that was previously done in m_can_platform in commit 99d173fbe894 ("can: m_can: fix iomap_read_fifo() and iomap_write_fifo()") is required in m_can_pci as well to make iomap_read_fifo() and iomap_write_fifo() work for val_count > 1. Fixes: 812270e5445b ("can: m_can: Batch FIFO writes during CAN transmit") Fixes: 1aa6772f64b4 ("can: m_can: Batch FIFO reads during CAN receive") Link: https://lore.kernel.org/all/20211118144011.10921-1-matthias.schiffer@ew.tq-group.com Cc: stable@vger.kernel.org Cc: Matt Kline Signed-off-by: Matthias Schiffer Tested-by: Jarkko Nikula Signed-off-by: Marc Kleine-Budde --- drivers/net/can/m_can/m_can_pci.c | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/drivers/net/can/m_can/m_can_pci.c b/drivers/net/can/m_can/m_can_pci.c index 89cc3d41e952..d72c294ac4d3 100644 --- a/drivers/net/can/m_can/m_can_pci.c +++ b/drivers/net/can/m_can/m_can_pci.c @@ -42,8 +42,13 @@ static u32 iomap_read_reg(struct m_can_classdev *cdev, int reg) static int iomap_read_fifo(struct m_can_classdev *cdev, int offset, void *val, size_t val_count) { struct m_can_pci_priv *priv = cdev_to_priv(cdev); + void __iomem *src = priv->base + offset; - ioread32_rep(priv->base + offset, val, val_count); + while (val_count--) { + *(unsigned int *)val = ioread32(src); + val += 4; + src += 4; + } return 0; } @@ -61,8 +66,13 @@ static int iomap_write_fifo(struct m_can_classdev *cdev, int offset, const void *val, size_t val_count) { struct m_can_pci_priv *priv = cdev_to_priv(cdev); + void __iomem *dst = priv->base + offset; - iowrite32_rep(priv->base + offset, val, val_count); + while (val_count--) { + iowrite32(*(unsigned int *)val, dst); + val += 4; + dst += 4; + } return 0; } -- cgit From 8c03b8bff765ac4146342ef90931bb50e788c758 Mon Sep 17 00:00:00 2001 From: Matthias Schiffer Date: Mon, 15 Nov 2021 10:18:49 +0100 Subject: can: m_can: pci: fix incorrect reference clock rate When testing the CAN controller on our Ekhart Lake hardware, we determined that all communication was running with twice the configured bitrate. Changing the reference clock rate from 100MHz to 200MHz fixed this. Intel's support has confirmed to us that 200MHz is indeed the correct clock rate. Fixes: cab7ffc0324f ("can: m_can: add PCI glue driver for Intel Elkhart Lake") Link: https://lore.kernel.org/all/c9cf3995f45c363e432b3ae8eb1275e54f009fc8.1636967198.git.matthias.schiffer@ew.tq-group.com Cc: stable@vger.kernel.org Signed-off-by: Matthias Schiffer Acked-by: Jarkko Nikula Reviewed-by: Jarkko Nikula Signed-off-by: Marc Kleine-Budde --- drivers/net/can/m_can/m_can_pci.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/can/m_can/m_can_pci.c b/drivers/net/can/m_can/m_can_pci.c index d72c294ac4d3..8f184a852a0a 100644 --- a/drivers/net/can/m_can/m_can_pci.c +++ b/drivers/net/can/m_can/m_can_pci.c @@ -18,7 +18,7 @@ #define M_CAN_PCI_MMIO_BAR 0 -#define M_CAN_CLOCK_FREQ_EHL 100000000 +#define M_CAN_CLOCK_FREQ_EHL 200000000 #define CTL_CSR_INT_CTL_OFFSET 0x508 struct m_can_pci_priv { -- cgit From ea768b2ffec6cc9c3e17c37ef75d0539b8f89ff5 Mon Sep 17 00:00:00 2001 From: Matthias Schiffer Date: Mon, 15 Nov 2021 10:18:50 +0100 Subject: Revert "can: m_can: remove support for custom bit timing" The timing limits specified by the Elkhart Lake CPU datasheets do not match the defaults. Let's reintroduce the support for custom bit timings. This reverts commit 0ddd83fbebbc5537f9d180d31f659db3564be708. Link: https://lore.kernel.org/all/00c9e2596b1a548906921a574d4ef7a03c0dace0.1636967198.git.matthias.schiffer@ew.tq-group.com Signed-off-by: Matthias Schiffer Signed-off-by: Marc Kleine-Budde --- drivers/net/can/m_can/m_can.c | 24 ++++++++++++++++++------ drivers/net/can/m_can/m_can.h | 3 +++ 2 files changed, 21 insertions(+), 6 deletions(-) diff --git a/drivers/net/can/m_can/m_can.c b/drivers/net/can/m_can/m_can.c index e330b4c121bf..c2a8421e7845 100644 --- a/drivers/net/can/m_can/m_can.c +++ b/drivers/net/can/m_can/m_can.c @@ -1494,20 +1494,32 @@ static int m_can_dev_setup(struct m_can_classdev *cdev) case 30: /* CAN_CTRLMODE_FD_NON_ISO is fixed with M_CAN IP v3.0.x */ can_set_static_ctrlmode(dev, CAN_CTRLMODE_FD_NON_ISO); - cdev->can.bittiming_const = &m_can_bittiming_const_30X; - cdev->can.data_bittiming_const = &m_can_data_bittiming_const_30X; + cdev->can.bittiming_const = cdev->bit_timing ? + cdev->bit_timing : &m_can_bittiming_const_30X; + + cdev->can.data_bittiming_const = cdev->data_timing ? + cdev->data_timing : + &m_can_data_bittiming_const_30X; break; case 31: /* CAN_CTRLMODE_FD_NON_ISO is fixed with M_CAN IP v3.1.x */ can_set_static_ctrlmode(dev, CAN_CTRLMODE_FD_NON_ISO); - cdev->can.bittiming_const = &m_can_bittiming_const_31X; - cdev->can.data_bittiming_const = &m_can_data_bittiming_const_31X; + cdev->can.bittiming_const = cdev->bit_timing ? + cdev->bit_timing : &m_can_bittiming_const_31X; + + cdev->can.data_bittiming_const = cdev->data_timing ? + cdev->data_timing : + &m_can_data_bittiming_const_31X; break; case 32: case 33: /* Support both MCAN version v3.2.x and v3.3.0 */ - cdev->can.bittiming_const = &m_can_bittiming_const_31X; - cdev->can.data_bittiming_const = &m_can_data_bittiming_const_31X; + cdev->can.bittiming_const = cdev->bit_timing ? + cdev->bit_timing : &m_can_bittiming_const_31X; + + cdev->can.data_bittiming_const = cdev->data_timing ? + cdev->data_timing : + &m_can_data_bittiming_const_31X; cdev->can.ctrlmode_supported |= (m_can_niso_supported(cdev) ? diff --git a/drivers/net/can/m_can/m_can.h b/drivers/net/can/m_can/m_can.h index d18b515e6ccc..ad063b101411 100644 --- a/drivers/net/can/m_can/m_can.h +++ b/drivers/net/can/m_can/m_can.h @@ -85,6 +85,9 @@ struct m_can_classdev { struct sk_buff *tx_skb; struct phy *transceiver; + struct can_bittiming_const *bit_timing; + struct can_bittiming_const *data_timing; + struct m_can_ops *ops; int version; -- cgit From ea22ba40debee29ee7257c42002409899e9311c1 Mon Sep 17 00:00:00 2001 From: Matthias Schiffer Date: Mon, 15 Nov 2021 10:18:51 +0100 Subject: can: m_can: make custom bittiming fields const The assigned timing structs will be defined a const anyway, so we can avoid a few casts by declaring the struct fields as const as well. Link: https://lore.kernel.org/all/4508fa4e639164b2584c49a065d90c78a91fa568.1636967198.git.matthias.schiffer@ew.tq-group.com Signed-off-by: Matthias Schiffer Signed-off-by: Marc Kleine-Budde --- drivers/net/can/m_can/m_can.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/can/m_can/m_can.h b/drivers/net/can/m_can/m_can.h index ad063b101411..2c5d40997168 100644 --- a/drivers/net/can/m_can/m_can.h +++ b/drivers/net/can/m_can/m_can.h @@ -85,8 +85,8 @@ struct m_can_classdev { struct sk_buff *tx_skb; struct phy *transceiver; - struct can_bittiming_const *bit_timing; - struct can_bittiming_const *data_timing; + const struct can_bittiming_const *bit_timing; + const struct can_bittiming_const *data_timing; struct m_can_ops *ops; -- cgit From ea4c1787685dbf9842046f05b6390b6901ee6ba2 Mon Sep 17 00:00:00 2001 From: Matthias Schiffer Date: Mon, 15 Nov 2021 10:18:52 +0100 Subject: can: m_can: pci: use custom bit timings for Elkhart Lake MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The relevant datasheet [1] specifies nonstandard limits for the bit timing parameters. While it is unclear what the exact effect of violating these limits is, it seems like a good idea to adhere to the documentation. [1] Intel Atom® x6000E Series, and Intel® Pentium® and Celeron® N and J Series Processors for IoT Applications Datasheet, Volume 2 (Book 3 of 3), July 2021, Revision 001 Fixes: cab7ffc0324f ("can: m_can: add PCI glue driver for Intel Elkhart Lake") Link: https://lore.kernel.org/all/9eba5d7c05a48ead4024ffa6e5926f191d8c6b38.1636967198.git.matthias.schiffer@ew.tq-group.com Signed-off-by: Matthias Schiffer Signed-off-by: Marc Kleine-Budde --- drivers/net/can/m_can/m_can_pci.c | 48 +++++++++++++++++++++++++++++++++++---- 1 file changed, 44 insertions(+), 4 deletions(-) diff --git a/drivers/net/can/m_can/m_can_pci.c b/drivers/net/can/m_can/m_can_pci.c index 8f184a852a0a..b56a54d6c5a9 100644 --- a/drivers/net/can/m_can/m_can_pci.c +++ b/drivers/net/can/m_can/m_can_pci.c @@ -18,9 +18,14 @@ #define M_CAN_PCI_MMIO_BAR 0 -#define M_CAN_CLOCK_FREQ_EHL 200000000 #define CTL_CSR_INT_CTL_OFFSET 0x508 +struct m_can_pci_config { + const struct can_bittiming_const *bit_timing; + const struct can_bittiming_const *data_timing; + unsigned int clock_freq; +}; + struct m_can_pci_priv { struct m_can_classdev cdev; @@ -84,9 +89,40 @@ static struct m_can_ops m_can_pci_ops = { .read_fifo = iomap_read_fifo, }; +static const struct can_bittiming_const m_can_bittiming_const_ehl = { + .name = KBUILD_MODNAME, + .tseg1_min = 2, /* Time segment 1 = prop_seg + phase_seg1 */ + .tseg1_max = 64, + .tseg2_min = 1, /* Time segment 2 = phase_seg2 */ + .tseg2_max = 128, + .sjw_max = 128, + .brp_min = 1, + .brp_max = 512, + .brp_inc = 1, +}; + +static const struct can_bittiming_const m_can_data_bittiming_const_ehl = { + .name = KBUILD_MODNAME, + .tseg1_min = 2, /* Time segment 1 = prop_seg + phase_seg1 */ + .tseg1_max = 16, + .tseg2_min = 1, /* Time segment 2 = phase_seg2 */ + .tseg2_max = 8, + .sjw_max = 4, + .brp_min = 1, + .brp_max = 32, + .brp_inc = 1, +}; + +static const struct m_can_pci_config m_can_pci_ehl = { + .bit_timing = &m_can_bittiming_const_ehl, + .data_timing = &m_can_data_bittiming_const_ehl, + .clock_freq = 200000000, +}; + static int m_can_pci_probe(struct pci_dev *pci, const struct pci_device_id *id) { struct device *dev = &pci->dev; + const struct m_can_pci_config *cfg; struct m_can_classdev *mcan_class; struct m_can_pci_priv *priv; void __iomem *base; @@ -114,6 +150,8 @@ static int m_can_pci_probe(struct pci_dev *pci, const struct pci_device_id *id) if (!mcan_class) return -ENOMEM; + cfg = (const struct m_can_pci_config *)id->driver_data; + priv = cdev_to_priv(mcan_class); priv->base = base; @@ -125,7 +163,9 @@ static int m_can_pci_probe(struct pci_dev *pci, const struct pci_device_id *id) mcan_class->dev = &pci->dev; mcan_class->net->irq = pci_irq_vector(pci, 0); mcan_class->pm_clock_support = 1; - mcan_class->can.clock.freq = id->driver_data; + mcan_class->bit_timing = cfg->bit_timing; + mcan_class->data_timing = cfg->data_timing; + mcan_class->can.clock.freq = cfg->clock_freq; mcan_class->ops = &m_can_pci_ops; pci_set_drvdata(pci, mcan_class); @@ -178,8 +218,8 @@ static SIMPLE_DEV_PM_OPS(m_can_pci_pm_ops, m_can_pci_suspend, m_can_pci_resume); static const struct pci_device_id m_can_pci_id_table[] = { - { PCI_VDEVICE(INTEL, 0x4bc1), M_CAN_CLOCK_FREQ_EHL, }, - { PCI_VDEVICE(INTEL, 0x4bc2), M_CAN_CLOCK_FREQ_EHL, }, + { PCI_VDEVICE(INTEL, 0x4bc1), (kernel_ulong_t)&m_can_pci_ehl, }, + { PCI_VDEVICE(INTEL, 0x4bc2), (kernel_ulong_t)&m_can_pci_ehl, }, { } /* Terminating Entry */ }; MODULE_DEVICE_TABLE(pci, m_can_pci_id_table); -- cgit From d17b9737c2bc09b4ac6caf469826e5a7ce3ffab7 Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Tue, 7 Dec 2021 11:24:16 +0300 Subject: net/qla3xxx: fix an error code in ql_adapter_up() The ql_wait_for_drvr_lock() fails and returns false, then this function should return an error code instead of returning success. The other problem is that the success path prints an error message netdev_err(ndev, "Releasing driver lock\n"); Delete that and re-order the code a little to make it more clear. Fixes: 5a4faa873782 ("[PATCH] qla3xxx NIC driver") Signed-off-by: Dan Carpenter Link: https://lore.kernel.org/r/20211207082416.GA16110@kili Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/qlogic/qla3xxx.c | 19 +++++++++---------- 1 file changed, 9 insertions(+), 10 deletions(-) diff --git a/drivers/net/ethernet/qlogic/qla3xxx.c b/drivers/net/ethernet/qlogic/qla3xxx.c index 1e6d72adfe43..71523d747e93 100644 --- a/drivers/net/ethernet/qlogic/qla3xxx.c +++ b/drivers/net/ethernet/qlogic/qla3xxx.c @@ -3480,20 +3480,19 @@ static int ql_adapter_up(struct ql3_adapter *qdev) spin_lock_irqsave(&qdev->hw_lock, hw_flags); - err = ql_wait_for_drvr_lock(qdev); - if (err) { - err = ql_adapter_initialize(qdev); - if (err) { - netdev_err(ndev, "Unable to initialize adapter\n"); - goto err_init; - } - netdev_err(ndev, "Releasing driver lock\n"); - ql_sem_unlock(qdev, QL_DRVR_SEM_MASK); - } else { + if (!ql_wait_for_drvr_lock(qdev)) { netdev_err(ndev, "Could not acquire driver lock\n"); + err = -ENODEV; goto err_lock; } + err = ql_adapter_initialize(qdev); + if (err) { + netdev_err(ndev, "Unable to initialize adapter\n"); + goto err_init; + } + ql_sem_unlock(qdev, QL_DRVR_SEM_MASK); + spin_unlock_irqrestore(&qdev->hw_lock, hw_flags); set_bit(QL_ADAPTER_UP, &qdev->flags); -- cgit From f23ab04dd6f703e282bb2d51fe3ae14f4b88a628 Mon Sep 17 00:00:00 2001 From: Yahui Cao Date: Wed, 5 May 2021 14:18:00 -0700 Subject: ice: fix FDIR init missing when reset VF When VF is being reset, ice_reset_vf() will be called and FDIR resource should be released and initialized again. Fixes: 1f7ea1cd6a37 ("ice: Enable FDIR Configure for AVF") Signed-off-by: Yahui Cao Tested-by: Konrad Jankowski Signed-off-by: Tony Nguyen --- drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c index 217ff5e9a6f1..c2431bc9d9ce 100644 --- a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c +++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c @@ -1617,6 +1617,7 @@ bool ice_reset_all_vfs(struct ice_pf *pf, bool is_vflr) ice_vc_set_default_allowlist(vf); ice_vf_fdir_exit(vf); + ice_vf_fdir_init(vf); /* clean VF control VSI when resetting VFs since it should be * setup only when VF creates its first FDIR rule. */ @@ -1747,6 +1748,7 @@ bool ice_reset_vf(struct ice_vf *vf, bool is_vflr) } ice_vf_fdir_exit(vf); + ice_vf_fdir_init(vf); /* clean VF control VSI when resetting VF since it should be setup * only when VF creates its first FDIR rule. */ -- cgit From 2657e16d8c52fb6ffc7250b0b6536f93886e32d6 Mon Sep 17 00:00:00 2001 From: Paul Greenwalt Date: Mon, 12 Jul 2021 07:54:25 -0400 Subject: ice: rearm other interrupt cause register after enabling VFs The other interrupt cause register (OICR), global interrupt 0, is disabled when enabling VFs to prevent handling VFLR. If the OICR is not rearmed then the VF cannot communicate with the PF. Rearm the OICR after enabling VFs. Fixes: 916c7fdf5e93 ("ice: Separate VF VSI initialization/creation from reset flow") Signed-off-by: Paul Greenwalt Tested-by: Tony Brelinski Signed-off-by: Tony Nguyen --- drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c index c2431bc9d9ce..6427e7ec93de 100644 --- a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c +++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c @@ -2023,6 +2023,10 @@ static int ice_ena_vfs(struct ice_pf *pf, u16 num_vfs) if (ret) goto err_unroll_sriov; + /* rearm global interrupts */ + if (test_and_clear_bit(ICE_OICR_INTR_DIS, pf->state)) + ice_irq_dynamic_ena(hw, NULL, NULL); + return 0; err_unroll_sriov: -- cgit From 6d39ea19b0fb6cc72427c862b32d39f5af468be3 Mon Sep 17 00:00:00 2001 From: Dave Ertman Date: Tue, 12 Oct 2021 13:31:21 -0700 Subject: ice: Fix problems with DSCP QoS implementation The patch that implemented DSCP QoS implementation removed a bandwidth check that was used to check for a specific condition caused by some corner cases. This check should not of been removed. The same patch also added a check for when the DCBx state could be changed in relation to DSCP, but the check was erroneously added nested in a check for CEE mode, which made the check useless. Fix these problems by re-adding the bandwidth check and relocating the DSCP mode check earlier in the function that changes DCBx state in the driver. Fixes: 2a87bd73e50d ("ice: Add DSCP support") Reported-by: kernel test robot Signed-off-by: Dave Ertman Tested-by: Gurucharan G Signed-off-by: Tony Nguyen --- drivers/net/ethernet/intel/ice/ice_dcb_nl.c | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-) diff --git a/drivers/net/ethernet/intel/ice/ice_dcb_nl.c b/drivers/net/ethernet/intel/ice/ice_dcb_nl.c index 7fdeb411b6df..3eb01731e496 100644 --- a/drivers/net/ethernet/intel/ice/ice_dcb_nl.c +++ b/drivers/net/ethernet/intel/ice/ice_dcb_nl.c @@ -97,6 +97,9 @@ static int ice_dcbnl_setets(struct net_device *netdev, struct ieee_ets *ets) new_cfg->etscfg.maxtcs = pf->hw.func_caps.common_cap.maxtc; + if (!bwcfg) + new_cfg->etscfg.tcbwtable[0] = 100; + if (!bwrec) new_cfg->etsrec.tcbwtable[0] = 100; @@ -167,15 +170,18 @@ static u8 ice_dcbnl_setdcbx(struct net_device *netdev, u8 mode) if (mode == pf->dcbx_cap) return ICE_DCB_NO_HW_CHG; - pf->dcbx_cap = mode; qos_cfg = &pf->hw.port_info->qos_cfg; - if (mode & DCB_CAP_DCBX_VER_CEE) { - if (qos_cfg->local_dcbx_cfg.pfc_mode == ICE_QOS_MODE_DSCP) - return ICE_DCB_NO_HW_CHG; + + /* DSCP configuration is not DCBx negotiated */ + if (qos_cfg->local_dcbx_cfg.pfc_mode == ICE_QOS_MODE_DSCP) + return ICE_DCB_NO_HW_CHG; + + pf->dcbx_cap = mode; + + if (mode & DCB_CAP_DCBX_VER_CEE) qos_cfg->local_dcbx_cfg.dcbx_mode = ICE_DCBX_MODE_CEE; - } else { + else qos_cfg->local_dcbx_cfg.dcbx_mode = ICE_DCBX_MODE_IEEE; - } dev_info(ice_pf_to_dev(pf), "DCBx mode = 0x%x\n", mode); return ICE_DCB_HW_CHG_RST; -- cgit From 28dc1b86f8ea9fd6f4c9e0b363db73ecabf84e22 Mon Sep 17 00:00:00 2001 From: Jesse Brandeburg Date: Fri, 22 Oct 2021 17:28:17 -0700 Subject: ice: ignore dropped packets during init If the hardware is constantly receiving unicast or broadcast packets during driver load, the device previously counted many GLV_RDPC (VSI dropped packets) events during init. This causes confusing dropped packet statistics during driver load. The dropped packets counter incrementing does stop once the driver finishes loading. Avoid this problem by baselining our statistics at the end of driver open instead of the end of probe. Fixes: cdedef59deb0 ("ice: Configure VSIs for Tx/Rx") Signed-off-by: Jesse Brandeburg Tested-by: Gurucharan G Signed-off-by: Tony Nguyen --- drivers/net/ethernet/intel/ice/ice_main.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c index 4d1fc48c9744..c6d6ce52e2ca 100644 --- a/drivers/net/ethernet/intel/ice/ice_main.c +++ b/drivers/net/ethernet/intel/ice/ice_main.c @@ -5881,6 +5881,9 @@ static int ice_up_complete(struct ice_vsi *vsi) netif_carrier_on(vsi->netdev); } + /* clear this now, and the first stats read will be used as baseline */ + vsi->stat_offsets_loaded = false; + ice_service_task_schedule(pf); return 0; -- cgit From 0e32ff024035b693a3304b3ffe30fba58e2ab48c Mon Sep 17 00:00:00 2001 From: Michal Swiatkowski Date: Tue, 16 Nov 2021 11:24:26 +0100 Subject: ice: fix choosing UDP header type In tunnels packet there can be two UDP headers: - outer which for hw should be mark as ICE_UDP_OF - inner which for hw should be mark as ICE_UDP_ILOS or as ICE_TCP_IL if inner header is of TCP type In none tunnels packet header can be: - UDP, which for hw should be mark as ICE_UDP_ILOS - TCP, which for hw should be mark as ICE_TCP_IL Change incorrect ICE_UDP_OF for none tunnel packets to ICE_UDP_ILOS. ICE_UDP_OF is incorrect for none tunnel packets and setting it leads to error from hw while adding this kind of recipe. In summary, for tunnel outer port type should always be set to ICE_UDP_OF, for none tunnel outer and tunnel inner it should always be set to ICE_UDP_ILOS. Fixes: 9e300987d4a8 ("ice: VXLAN and Geneve TC support") Signed-off-by: Michal Swiatkowski Tested-by: Sandeep Penigalapati Signed-off-by: Tony Nguyen --- drivers/net/ethernet/intel/ice/ice_tc_lib.c | 27 ++++++++++----------------- 1 file changed, 10 insertions(+), 17 deletions(-) diff --git a/drivers/net/ethernet/intel/ice/ice_tc_lib.c b/drivers/net/ethernet/intel/ice/ice_tc_lib.c index e5d23feb6701..384439a267ad 100644 --- a/drivers/net/ethernet/intel/ice/ice_tc_lib.c +++ b/drivers/net/ethernet/intel/ice/ice_tc_lib.c @@ -74,21 +74,13 @@ static enum ice_protocol_type ice_proto_type_from_ipv6(bool inner) return inner ? ICE_IPV6_IL : ICE_IPV6_OFOS; } -static enum ice_protocol_type -ice_proto_type_from_l4_port(bool inner, u16 ip_proto) +static enum ice_protocol_type ice_proto_type_from_l4_port(u16 ip_proto) { - if (inner) { - switch (ip_proto) { - case IPPROTO_UDP: - return ICE_UDP_ILOS; - } - } else { - switch (ip_proto) { - case IPPROTO_TCP: - return ICE_TCP_IL; - case IPPROTO_UDP: - return ICE_UDP_OF; - } + switch (ip_proto) { + case IPPROTO_TCP: + return ICE_TCP_IL; + case IPPROTO_UDP: + return ICE_UDP_ILOS; } return 0; @@ -191,8 +183,9 @@ ice_tc_fill_tunnel_outer(u32 flags, struct ice_tc_flower_fltr *fltr, i++; } - if (flags & ICE_TC_FLWR_FIELD_ENC_DEST_L4_PORT) { - list[i].type = ice_proto_type_from_l4_port(false, hdr->l3_key.ip_proto); + if ((flags & ICE_TC_FLWR_FIELD_ENC_DEST_L4_PORT) && + hdr->l3_key.ip_proto == IPPROTO_UDP) { + list[i].type = ICE_UDP_OF; list[i].h_u.l4_hdr.dst_port = hdr->l4_key.dst_port; list[i].m_u.l4_hdr.dst_port = hdr->l4_mask.dst_port; i++; @@ -317,7 +310,7 @@ ice_tc_fill_rules(struct ice_hw *hw, u32 flags, ICE_TC_FLWR_FIELD_SRC_L4_PORT)) { struct ice_tc_l4_hdr *l4_key, *l4_mask; - list[i].type = ice_proto_type_from_l4_port(inner, headers->l3_key.ip_proto); + list[i].type = ice_proto_type_from_l4_port(headers->l3_key.ip_proto); l4_key = &headers->l4_key; l4_mask = &headers->l4_mask; -- cgit From de6acd1cdd4d38823b7f4adae82e8a7d62993354 Mon Sep 17 00:00:00 2001 From: Michal Swiatkowski Date: Mon, 22 Nov 2021 16:39:25 +0100 Subject: ice: fix adding different tunnels Adding filters with the same values inside for VXLAN and Geneve causes HW error, because it looks exactly the same. To choose between different type of tunnels new recipe is needed. Add storing tunnel types in creating recipes function and start checking it in finding function. Change getting open tunnels function to return port on correct tunnel type. This is needed to copy correct port to dummy packet. Block user from adding enc_dst_port via tc flower, because VXLAN and Geneve filters can be created only with destination port which was previously opened. Fixes: 8b032a55c1bd5 ("ice: low level support for tunnels") Signed-off-by: Michal Swiatkowski Tested-by: Sandeep Penigalapati Signed-off-by: Tony Nguyen --- drivers/net/ethernet/intel/ice/ice_ethtool_fdir.c | 4 ++-- drivers/net/ethernet/intel/ice/ice_fdir.c | 2 +- drivers/net/ethernet/intel/ice/ice_flex_pipe.c | 7 +++++-- drivers/net/ethernet/intel/ice/ice_flex_pipe.h | 3 ++- drivers/net/ethernet/intel/ice/ice_switch.c | 19 +++++++++++++------ drivers/net/ethernet/intel/ice/ice_tc_lib.c | 3 ++- 6 files changed, 25 insertions(+), 13 deletions(-) diff --git a/drivers/net/ethernet/intel/ice/ice_ethtool_fdir.c b/drivers/net/ethernet/intel/ice/ice_ethtool_fdir.c index 38960bcc384c..b6e7f47c8c78 100644 --- a/drivers/net/ethernet/intel/ice/ice_ethtool_fdir.c +++ b/drivers/net/ethernet/intel/ice/ice_ethtool_fdir.c @@ -1268,7 +1268,7 @@ ice_fdir_write_all_fltr(struct ice_pf *pf, struct ice_fdir_fltr *input, bool is_tun = tun == ICE_FD_HW_SEG_TUN; int err; - if (is_tun && !ice_get_open_tunnel_port(&pf->hw, &port_num)) + if (is_tun && !ice_get_open_tunnel_port(&pf->hw, &port_num, TNL_ALL)) continue; err = ice_fdir_write_fltr(pf, input, add, is_tun); if (err) @@ -1652,7 +1652,7 @@ int ice_add_fdir_ethtool(struct ice_vsi *vsi, struct ethtool_rxnfc *cmd) } /* return error if not an update and no available filters */ - fltrs_needed = ice_get_open_tunnel_port(hw, &tunnel_port) ? 2 : 1; + fltrs_needed = ice_get_open_tunnel_port(hw, &tunnel_port, TNL_ALL) ? 2 : 1; if (!ice_fdir_find_fltr_by_idx(hw, fsp->location) && ice_fdir_num_avail_fltr(hw, pf->vsi[vsi->idx]) < fltrs_needed) { dev_err(dev, "Failed to add filter. The maximum number of flow director filters has been reached.\n"); diff --git a/drivers/net/ethernet/intel/ice/ice_fdir.c b/drivers/net/ethernet/intel/ice/ice_fdir.c index cbd8424631e3..4dca009bdd50 100644 --- a/drivers/net/ethernet/intel/ice/ice_fdir.c +++ b/drivers/net/ethernet/intel/ice/ice_fdir.c @@ -924,7 +924,7 @@ ice_fdir_get_gen_prgm_pkt(struct ice_hw *hw, struct ice_fdir_fltr *input, memcpy(pkt, ice_fdir_pkt[idx].pkt, ice_fdir_pkt[idx].pkt_len); loc = pkt; } else { - if (!ice_get_open_tunnel_port(hw, &tnl_port)) + if (!ice_get_open_tunnel_port(hw, &tnl_port, TNL_ALL)) return ICE_ERR_DOES_NOT_EXIST; if (!ice_fdir_pkt[idx].tun_pkt) return ICE_ERR_PARAM; diff --git a/drivers/net/ethernet/intel/ice/ice_flex_pipe.c b/drivers/net/ethernet/intel/ice/ice_flex_pipe.c index 23cfcceb1536..6ad1c2559724 100644 --- a/drivers/net/ethernet/intel/ice/ice_flex_pipe.c +++ b/drivers/net/ethernet/intel/ice/ice_flex_pipe.c @@ -1899,9 +1899,11 @@ static struct ice_buf *ice_pkg_buf(struct ice_buf_build *bld) * ice_get_open_tunnel_port - retrieve an open tunnel port * @hw: pointer to the HW structure * @port: returns open port + * @type: type of tunnel, can be TNL_LAST if it doesn't matter */ bool -ice_get_open_tunnel_port(struct ice_hw *hw, u16 *port) +ice_get_open_tunnel_port(struct ice_hw *hw, u16 *port, + enum ice_tunnel_type type) { bool res = false; u16 i; @@ -1909,7 +1911,8 @@ ice_get_open_tunnel_port(struct ice_hw *hw, u16 *port) mutex_lock(&hw->tnl_lock); for (i = 0; i < hw->tnl.count && i < ICE_TUNNEL_MAX_ENTRIES; i++) - if (hw->tnl.tbl[i].valid && hw->tnl.tbl[i].port) { + if (hw->tnl.tbl[i].valid && hw->tnl.tbl[i].port && + (type == TNL_LAST || type == hw->tnl.tbl[i].type)) { *port = hw->tnl.tbl[i].port; res = true; break; diff --git a/drivers/net/ethernet/intel/ice/ice_flex_pipe.h b/drivers/net/ethernet/intel/ice/ice_flex_pipe.h index 344c2637facd..a2863f38fd1f 100644 --- a/drivers/net/ethernet/intel/ice/ice_flex_pipe.h +++ b/drivers/net/ethernet/intel/ice/ice_flex_pipe.h @@ -33,7 +33,8 @@ enum ice_status ice_get_sw_fv_list(struct ice_hw *hw, u8 *prot_ids, u16 ids_cnt, unsigned long *bm, struct list_head *fv_list); bool -ice_get_open_tunnel_port(struct ice_hw *hw, u16 *port); +ice_get_open_tunnel_port(struct ice_hw *hw, u16 *port, + enum ice_tunnel_type type); int ice_udp_tunnel_set_port(struct net_device *netdev, unsigned int table, unsigned int idx, struct udp_tunnel_info *ti); int ice_udp_tunnel_unset_port(struct net_device *netdev, unsigned int table, diff --git a/drivers/net/ethernet/intel/ice/ice_switch.c b/drivers/net/ethernet/intel/ice/ice_switch.c index 793f4a9fc2cd..183d93033890 100644 --- a/drivers/net/ethernet/intel/ice/ice_switch.c +++ b/drivers/net/ethernet/intel/ice/ice_switch.c @@ -3796,10 +3796,13 @@ static struct ice_protocol_entry ice_prot_id_tbl[ICE_PROTOCOL_LAST] = { * ice_find_recp - find a recipe * @hw: pointer to the hardware structure * @lkup_exts: extension sequence to match + * @tun_type: type of recipe tunnel * * Returns index of matching recipe, or ICE_MAX_NUM_RECIPES if not found. */ -static u16 ice_find_recp(struct ice_hw *hw, struct ice_prot_lkup_ext *lkup_exts) +static u16 +ice_find_recp(struct ice_hw *hw, struct ice_prot_lkup_ext *lkup_exts, + enum ice_sw_tunnel_type tun_type) { bool refresh_required = true; struct ice_sw_recipe *recp; @@ -3860,8 +3863,9 @@ static u16 ice_find_recp(struct ice_hw *hw, struct ice_prot_lkup_ext *lkup_exts) } /* If for "i"th recipe the found was never set to false * then it means we found our match + * Also tun type of recipe needs to be checked */ - if (found) + if (found && recp[i].tun_type == tun_type) return i; /* Return the recipe ID */ } } @@ -4651,11 +4655,12 @@ ice_add_adv_recipe(struct ice_hw *hw, struct ice_adv_lkup_elem *lkups, } /* Look for a recipe which matches our requested fv / mask list */ - *rid = ice_find_recp(hw, lkup_exts); + *rid = ice_find_recp(hw, lkup_exts, rinfo->tun_type); if (*rid < ICE_MAX_NUM_RECIPES) /* Success if found a recipe that match the existing criteria */ goto err_unroll; + rm->tun_type = rinfo->tun_type; /* Recipe we need does not exist, add a recipe */ status = ice_add_sw_recipe(hw, rm, profiles); if (status) @@ -4958,11 +4963,13 @@ ice_fill_adv_packet_tun(struct ice_hw *hw, enum ice_sw_tunnel_type tun_type, switch (tun_type) { case ICE_SW_TUN_VXLAN: + if (!ice_get_open_tunnel_port(hw, &open_port, TNL_VXLAN)) + return ICE_ERR_CFG; + break; case ICE_SW_TUN_GENEVE: - if (!ice_get_open_tunnel_port(hw, &open_port)) + if (!ice_get_open_tunnel_port(hw, &open_port, TNL_GENEVE)) return ICE_ERR_CFG; break; - default: /* Nothing needs to be done for this tunnel type */ return 0; @@ -5555,7 +5562,7 @@ ice_rem_adv_rule(struct ice_hw *hw, struct ice_adv_lkup_elem *lkups, if (status) return status; - rid = ice_find_recp(hw, &lkup_exts); + rid = ice_find_recp(hw, &lkup_exts, rinfo->tun_type); /* If did not find a recipe that match the existing criteria */ if (rid == ICE_MAX_NUM_RECIPES) return ICE_ERR_PARAM; diff --git a/drivers/net/ethernet/intel/ice/ice_tc_lib.c b/drivers/net/ethernet/intel/ice/ice_tc_lib.c index 384439a267ad..25cca5c4ae57 100644 --- a/drivers/net/ethernet/intel/ice/ice_tc_lib.c +++ b/drivers/net/ethernet/intel/ice/ice_tc_lib.c @@ -795,7 +795,8 @@ ice_parse_tunnel_attr(struct net_device *dev, struct flow_rule *rule, headers->l3_mask.ttl = match.mask->ttl; } - if (flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_ENC_PORTS)) { + if (flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_ENC_PORTS) && + fltr->tunnel_type != TNL_VXLAN && fltr->tunnel_type != TNL_GENEVE) { struct flow_match_ports match; flow_rule_match_enc_ports(rule, &match); -- cgit From d43b75fbc23f0ac1ef9c14a5a166d3ccb761a451 Mon Sep 17 00:00:00 2001 From: Nicolas Dichtel Date: Fri, 26 Nov 2021 15:36:12 +0100 Subject: vrf: don't run conntrack on vrf with !dflt qdisc After the below patch, the conntrack attached to skb is set to "notrack" in the context of vrf device, for locally generated packets. But this is true only when the default qdisc is set to the vrf device. When changing the qdisc, notrack is not set anymore. In fact, there is a shortcut in the vrf driver, when the default qdisc is set, see commit dcdd43c41e60 ("net: vrf: performance improvements for IPv4") for more details. This patch ensures that the behavior is always the same, whatever the qdisc is. To demonstrate the difference, a new test is added in conntrack_vrf.sh. Fixes: 8c9c296adfae ("vrf: run conntrack only in context of lower/physdev for locally generated packets") Signed-off-by: Nicolas Dichtel Acked-by: Florian Westphal Reviewed-by: David Ahern Signed-off-by: Pablo Neira Ayuso --- drivers/net/vrf.c | 8 +++--- tools/testing/selftests/netfilter/conntrack_vrf.sh | 30 +++++++++++++++++++--- 2 files changed, 30 insertions(+), 8 deletions(-) diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c index ccf677015d5b..38c2f0dbe795 100644 --- a/drivers/net/vrf.c +++ b/drivers/net/vrf.c @@ -768,8 +768,6 @@ static struct sk_buff *vrf_ip6_out_direct(struct net_device *vrf_dev, skb->dev = vrf_dev; - vrf_nf_set_untracked(skb); - err = nf_hook(NFPROTO_IPV6, NF_INET_LOCAL_OUT, net, sk, skb, NULL, vrf_dev, vrf_ip6_out_direct_finish); @@ -790,6 +788,8 @@ static struct sk_buff *vrf_ip6_out(struct net_device *vrf_dev, if (rt6_need_strict(&ipv6_hdr(skb)->daddr)) return skb; + vrf_nf_set_untracked(skb); + if (qdisc_tx_is_default(vrf_dev) || IP6CB(skb)->flags & IP6SKB_XFRM_TRANSFORMED) return vrf_ip6_out_direct(vrf_dev, sk, skb); @@ -998,8 +998,6 @@ static struct sk_buff *vrf_ip_out_direct(struct net_device *vrf_dev, skb->dev = vrf_dev; - vrf_nf_set_untracked(skb); - err = nf_hook(NFPROTO_IPV4, NF_INET_LOCAL_OUT, net, sk, skb, NULL, vrf_dev, vrf_ip_out_direct_finish); @@ -1021,6 +1019,8 @@ static struct sk_buff *vrf_ip_out(struct net_device *vrf_dev, ipv4_is_lbcast(ip_hdr(skb)->daddr)) return skb; + vrf_nf_set_untracked(skb); + if (qdisc_tx_is_default(vrf_dev) || IPCB(skb)->flags & IPSKB_XFRM_TRANSFORMED) return vrf_ip_out_direct(vrf_dev, sk, skb); diff --git a/tools/testing/selftests/netfilter/conntrack_vrf.sh b/tools/testing/selftests/netfilter/conntrack_vrf.sh index 91f3ef0f1192..8b5ea9234588 100755 --- a/tools/testing/selftests/netfilter/conntrack_vrf.sh +++ b/tools/testing/selftests/netfilter/conntrack_vrf.sh @@ -150,11 +150,27 @@ EOF # oifname is the vrf device. test_masquerade_vrf() { + local qdisc=$1 + + if [ "$qdisc" != "default" ]; then + tc -net $ns0 qdisc add dev tvrf root $qdisc + fi + ip netns exec $ns0 conntrack -F 2>/dev/null ip netns exec $ns0 nft -f - < Date: Sat, 27 Nov 2021 11:33:37 +0100 Subject: nft_set_pipapo: Fix bucket load in AVX2 lookup routine for six 8-bit groups The sixth byte of packet data has to be looked up in the sixth group, not in the seventh one, even if we load the bucket data into ymm6 (and not ymm5, for convenience of tracking stalls). Without this fix, matching on a MAC address as first field of a set, if 8-bit groups are selected (due to a small set size) would fail, that is, the given MAC address would never match. Reported-by: Nikita Yushchenko Cc: # 5.6.x Fixes: 7400b063969b ("nft_set_pipapo: Introduce AVX2-based lookup implementation") Signed-off-by: Stefano Brivio Tested-By: Nikita Yushchenko Signed-off-by: Pablo Neira Ayuso --- net/netfilter/nft_set_pipapo_avx2.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/netfilter/nft_set_pipapo_avx2.c b/net/netfilter/nft_set_pipapo_avx2.c index e517663e0cd1..6f4116e72958 100644 --- a/net/netfilter/nft_set_pipapo_avx2.c +++ b/net/netfilter/nft_set_pipapo_avx2.c @@ -886,7 +886,7 @@ static int nft_pipapo_avx2_lookup_8b_6(unsigned long *map, unsigned long *fill, NFT_PIPAPO_AVX2_BUCKET_LOAD8(4, lt, 4, pkt[4], bsize); NFT_PIPAPO_AVX2_AND(5, 0, 1); - NFT_PIPAPO_AVX2_BUCKET_LOAD8(6, lt, 6, pkt[5], bsize); + NFT_PIPAPO_AVX2_BUCKET_LOAD8(6, lt, 5, pkt[5], bsize); NFT_PIPAPO_AVX2_AND(7, 2, 3); /* Stall */ -- cgit From 0de53b0ffb5b22b52c1e0bd4d9e18cbbce5801d0 Mon Sep 17 00:00:00 2001 From: Stefano Brivio Date: Sat, 27 Nov 2021 11:33:38 +0100 Subject: selftests: netfilter: Add correctness test for mac,net set type The existing net,mac test didn't cover the issue recently reported by Nikita Yushchenko, where MAC addresses wouldn't match if given as first field of a concatenated set with AVX2 and 8-bit groups, because there's a different code path covering the lookup of six 8-bit groups (MAC addresses) if that's the first field. Add a similar mac,net test, with MAC address and IPv4 address swapped in the set specification. Signed-off-by: Stefano Brivio Signed-off-by: Pablo Neira Ayuso --- .../selftests/netfilter/nft_concat_range.sh | 24 +++++++++++++++++++--- 1 file changed, 21 insertions(+), 3 deletions(-) diff --git a/tools/testing/selftests/netfilter/nft_concat_range.sh b/tools/testing/selftests/netfilter/nft_concat_range.sh index 5a4938d6dcf2..ed61f6cab60f 100755 --- a/tools/testing/selftests/netfilter/nft_concat_range.sh +++ b/tools/testing/selftests/netfilter/nft_concat_range.sh @@ -23,8 +23,8 @@ TESTS="reported_issues correctness concurrency timeout" # Set types, defined by TYPE_ variables below TYPES="net_port port_net net6_port port_proto net6_port_mac net6_port_mac_proto - net_port_net net_mac net_mac_icmp net6_mac_icmp net6_port_net6_port - net_port_mac_proto_net" + net_port_net net_mac mac_net net_mac_icmp net6_mac_icmp + net6_port_net6_port net_port_mac_proto_net" # Reported bugs, also described by TYPE_ variables below BUGS="flush_remove_add" @@ -277,6 +277,23 @@ perf_entries 1000 perf_proto ipv4 " +TYPE_mac_net=" +display mac,net +type_spec ether_addr . ipv4_addr +chain_spec ether saddr . ip saddr +dst +src mac addr4 +start 1 +count 5 +src_delta 2000 +tools sendip nc bash +proto udp + +race_repeat 0 + +perf_duration 0 +" + TYPE_net_mac_icmp=" display net,mac - ICMP type_spec ipv4_addr . ether_addr @@ -984,7 +1001,8 @@ format() { fi done for f in ${src}; do - __expr="${__expr} . " + [ "${__expr}" != "{ " ] && __expr="${__expr} . " + __start="$(eval format_"${f}" "${srcstart}")" __end="$(eval format_"${f}" "${srcend}")" -- cgit From 962e5a40358787105f126ab1dc01604da3d169e9 Mon Sep 17 00:00:00 2001 From: Pablo Neira Ayuso Date: Tue, 30 Nov 2021 11:34:04 +0100 Subject: netfilter: nft_exthdr: break evaluation if setting TCP option fails Break rule evaluation on malformed TCP options. Fixes: 99d1712bc41c ("netfilter: exthdr: tcp option set support") Signed-off-by: Pablo Neira Ayuso --- net/netfilter/nft_exthdr.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/net/netfilter/nft_exthdr.c b/net/netfilter/nft_exthdr.c index af4ee874a067..dbe1f2e7dd9e 100644 --- a/net/netfilter/nft_exthdr.c +++ b/net/netfilter/nft_exthdr.c @@ -236,7 +236,7 @@ static void nft_exthdr_tcp_set_eval(const struct nft_expr *expr, tcph = nft_tcp_header_pointer(pkt, sizeof(buff), buff, &tcphdr_len); if (!tcph) - return; + goto err; opt = (u8 *)tcph; for (i = sizeof(*tcph); i < tcphdr_len - 1; i += optl) { @@ -251,16 +251,16 @@ static void nft_exthdr_tcp_set_eval(const struct nft_expr *expr, continue; if (i + optl > tcphdr_len || priv->len + priv->offset > optl) - return; + goto err; if (skb_ensure_writable(pkt->skb, nft_thoff(pkt) + i + priv->len)) - return; + goto err; tcph = nft_tcp_header_pointer(pkt, sizeof(buff), buff, &tcphdr_len); if (!tcph) - return; + goto err; offset = i + priv->offset; @@ -303,6 +303,9 @@ static void nft_exthdr_tcp_set_eval(const struct nft_expr *expr, return; } + return; +err: + regs->verdict.code = NFT_BREAK; } static void nft_exthdr_sctp_eval(const struct nft_expr *expr, -- cgit From d46cea0e6933da93c5373a46e3dc7e5d0e56bedb Mon Sep 17 00:00:00 2001 From: Florian Westphal Date: Fri, 3 Dec 2021 15:33:23 +0100 Subject: selftests: netfilter: switch zone stress to socat centos9 has nmap-ncat which doesn't like the '-q' option, use socat. While at it, mark test skipped if needed tools are missing. Signed-off-by: Florian Westphal Signed-off-by: Pablo Neira Ayuso --- tools/testing/selftests/netfilter/nft_zones_many.sh | 19 +++++++++++++------ 1 file changed, 13 insertions(+), 6 deletions(-) diff --git a/tools/testing/selftests/netfilter/nft_zones_many.sh b/tools/testing/selftests/netfilter/nft_zones_many.sh index ac646376eb01..04633119b29a 100755 --- a/tools/testing/selftests/netfilter/nft_zones_many.sh +++ b/tools/testing/selftests/netfilter/nft_zones_many.sh @@ -18,11 +18,17 @@ cleanup() ip netns del $ns } -ip netns add $ns -if [ $? -ne 0 ];then - echo "SKIP: Could not create net namespace $gw" - exit $ksft_skip -fi +checktool (){ + if ! $1 > /dev/null 2>&1; then + echo "SKIP: Could not $2" + exit $ksft_skip + fi +} + +checktool "nft --version" "run test without nft tool" +checktool "ip -Version" "run test without ip tool" +checktool "socat -V" "run test without socat tool" +checktool "ip netns add $ns" "create net namespace" trap cleanup EXIT @@ -71,7 +77,8 @@ EOF local start=$(date +%s%3N) i=$((i + 10000)) j=$((j + 1)) - dd if=/dev/zero of=/dev/stdout bs=8k count=10000 2>/dev/null | ip netns exec "$ns" nc -w 1 -q 1 -u -p 12345 127.0.0.1 12345 > /dev/null + # nft rule in output places each packet in a different zone. + dd if=/dev/zero of=/dev/stdout bs=8k count=10000 2>/dev/null | ip netns exec "$ns" socat STDIN UDP:127.0.0.1:12345,sourceport=12345 if [ $? -ne 0 ] ;then ret=1 break -- cgit From 802a7dc5cf1bef06f7b290ce76d478138408d6b1 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Tue, 7 Dec 2021 10:03:23 -0800 Subject: netfilter: conntrack: annotate data-races around ct->timeout (struct nf_conn)->timeout can be read/written locklessly, add READ_ONCE()/WRITE_ONCE() to prevent load/store tearing. BUG: KCSAN: data-race in __nf_conntrack_alloc / __nf_conntrack_find_get write to 0xffff888132e78c08 of 4 bytes by task 6029 on cpu 0: __nf_conntrack_alloc+0x158/0x280 net/netfilter/nf_conntrack_core.c:1563 init_conntrack+0x1da/0xb30 net/netfilter/nf_conntrack_core.c:1635 resolve_normal_ct+0x502/0x610 net/netfilter/nf_conntrack_core.c:1746 nf_conntrack_in+0x1c5/0x88f net/netfilter/nf_conntrack_core.c:1901 ipv6_conntrack_local+0x19/0x20 net/netfilter/nf_conntrack_proto.c:414 nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline] nf_hook_slow+0x72/0x170 net/netfilter/core.c:619 nf_hook include/linux/netfilter.h:262 [inline] NF_HOOK include/linux/netfilter.h:305 [inline] ip6_xmit+0xa3a/0xa60 net/ipv6/ip6_output.c:324 inet6_csk_xmit+0x1a2/0x1e0 net/ipv6/inet6_connection_sock.c:135 __tcp_transmit_skb+0x132a/0x1840 net/ipv4/tcp_output.c:1402 tcp_transmit_skb net/ipv4/tcp_output.c:1420 [inline] tcp_write_xmit+0x1450/0x4460 net/ipv4/tcp_output.c:2680 __tcp_push_pending_frames+0x68/0x1c0 net/ipv4/tcp_output.c:2864 tcp_push_pending_frames include/net/tcp.h:1897 [inline] tcp_data_snd_check+0x62/0x2e0 net/ipv4/tcp_input.c:5452 tcp_rcv_established+0x880/0x10e0 net/ipv4/tcp_input.c:5947 tcp_v6_do_rcv+0x36e/0xa50 net/ipv6/tcp_ipv6.c:1521 sk_backlog_rcv include/net/sock.h:1030 [inline] __release_sock+0xf2/0x270 net/core/sock.c:2768 release_sock+0x40/0x110 net/core/sock.c:3300 sk_stream_wait_memory+0x435/0x700 net/core/stream.c:145 tcp_sendmsg_locked+0xb85/0x25a0 net/ipv4/tcp.c:1402 tcp_sendmsg+0x2c/0x40 net/ipv4/tcp.c:1440 inet6_sendmsg+0x5f/0x80 net/ipv6/af_inet6.c:644 sock_sendmsg_nosec net/socket.c:704 [inline] sock_sendmsg net/socket.c:724 [inline] __sys_sendto+0x21e/0x2c0 net/socket.c:2036 __do_sys_sendto net/socket.c:2048 [inline] __se_sys_sendto net/socket.c:2044 [inline] __x64_sys_sendto+0x74/0x90 net/socket.c:2044 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae read to 0xffff888132e78c08 of 4 bytes by task 17446 on cpu 1: nf_ct_is_expired include/net/netfilter/nf_conntrack.h:286 [inline] ____nf_conntrack_find net/netfilter/nf_conntrack_core.c:776 [inline] __nf_conntrack_find_get+0x1c7/0xac0 net/netfilter/nf_conntrack_core.c:807 resolve_normal_ct+0x273/0x610 net/netfilter/nf_conntrack_core.c:1734 nf_conntrack_in+0x1c5/0x88f net/netfilter/nf_conntrack_core.c:1901 ipv6_conntrack_local+0x19/0x20 net/netfilter/nf_conntrack_proto.c:414 nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline] nf_hook_slow+0x72/0x170 net/netfilter/core.c:619 nf_hook include/linux/netfilter.h:262 [inline] NF_HOOK include/linux/netfilter.h:305 [inline] ip6_xmit+0xa3a/0xa60 net/ipv6/ip6_output.c:324 inet6_csk_xmit+0x1a2/0x1e0 net/ipv6/inet6_connection_sock.c:135 __tcp_transmit_skb+0x132a/0x1840 net/ipv4/tcp_output.c:1402 __tcp_send_ack+0x1fd/0x300 net/ipv4/tcp_output.c:3956 tcp_send_ack+0x23/0x30 net/ipv4/tcp_output.c:3962 __tcp_ack_snd_check+0x2d8/0x510 net/ipv4/tcp_input.c:5478 tcp_ack_snd_check net/ipv4/tcp_input.c:5523 [inline] tcp_rcv_established+0x8c2/0x10e0 net/ipv4/tcp_input.c:5948 tcp_v6_do_rcv+0x36e/0xa50 net/ipv6/tcp_ipv6.c:1521 sk_backlog_rcv include/net/sock.h:1030 [inline] __release_sock+0xf2/0x270 net/core/sock.c:2768 release_sock+0x40/0x110 net/core/sock.c:3300 tcp_sendpage+0x94/0xb0 net/ipv4/tcp.c:1114 inet_sendpage+0x7f/0xc0 net/ipv4/af_inet.c:833 rds_tcp_xmit+0x376/0x5f0 net/rds/tcp_send.c:118 rds_send_xmit+0xbed/0x1500 net/rds/send.c:367 rds_send_worker+0x43/0x200 net/rds/threads.c:200 process_one_work+0x3fc/0x980 kernel/workqueue.c:2298 worker_thread+0x616/0xa70 kernel/workqueue.c:2445 kthread+0x2c7/0x2e0 kernel/kthread.c:327 ret_from_fork+0x1f/0x30 value changed: 0x00027cc2 -> 0x00000000 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 17446 Comm: kworker/u4:5 Tainted: G W 5.16.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: krdsd rds_send_worker Note: I chose an arbitrary commit for the Fixes: tag, because I do not think we need to backport this fix to very old kernels. Fixes: e37542ba111f ("netfilter: conntrack: avoid possible false sharing") Signed-off-by: Eric Dumazet Reported-by: syzbot Signed-off-by: Pablo Neira Ayuso --- include/net/netfilter/nf_conntrack.h | 6 +++--- net/netfilter/nf_conntrack_core.c | 6 +++--- net/netfilter/nf_conntrack_netlink.c | 2 +- net/netfilter/nf_flow_table_core.c | 4 ++-- 4 files changed, 9 insertions(+), 9 deletions(-) diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h index cc663c68ddc4..d24b0a34c8f0 100644 --- a/include/net/netfilter/nf_conntrack.h +++ b/include/net/netfilter/nf_conntrack.h @@ -276,14 +276,14 @@ static inline bool nf_is_loopback_packet(const struct sk_buff *skb) /* jiffies until ct expires, 0 if already expired */ static inline unsigned long nf_ct_expires(const struct nf_conn *ct) { - s32 timeout = ct->timeout - nfct_time_stamp; + s32 timeout = READ_ONCE(ct->timeout) - nfct_time_stamp; return timeout > 0 ? timeout : 0; } static inline bool nf_ct_is_expired(const struct nf_conn *ct) { - return (__s32)(ct->timeout - nfct_time_stamp) <= 0; + return (__s32)(READ_ONCE(ct->timeout) - nfct_time_stamp) <= 0; } /* use after obtaining a reference count */ @@ -302,7 +302,7 @@ static inline bool nf_ct_should_gc(const struct nf_conn *ct) static inline void nf_ct_offload_timeout(struct nf_conn *ct) { if (nf_ct_expires(ct) < NF_CT_DAY / 2) - ct->timeout = nfct_time_stamp + NF_CT_DAY; + WRITE_ONCE(ct->timeout, nfct_time_stamp + NF_CT_DAY); } struct kernel_param; diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c index 770a63103c7a..4712a90a1820 100644 --- a/net/netfilter/nf_conntrack_core.c +++ b/net/netfilter/nf_conntrack_core.c @@ -684,7 +684,7 @@ bool nf_ct_delete(struct nf_conn *ct, u32 portid, int report) tstamp = nf_conn_tstamp_find(ct); if (tstamp) { - s32 timeout = ct->timeout - nfct_time_stamp; + s32 timeout = READ_ONCE(ct->timeout) - nfct_time_stamp; tstamp->stop = ktime_get_real_ns(); if (timeout < 0) @@ -1036,7 +1036,7 @@ static int nf_ct_resolve_clash_harder(struct sk_buff *skb, u32 repl_idx) } /* We want the clashing entry to go away real soon: 1 second timeout. */ - loser_ct->timeout = nfct_time_stamp + HZ; + WRITE_ONCE(loser_ct->timeout, nfct_time_stamp + HZ); /* IPS_NAT_CLASH removes the entry automatically on the first * reply. Also prevents UDP tracker from moving the entry to @@ -1560,7 +1560,7 @@ __nf_conntrack_alloc(struct net *net, /* save hash for reusing when confirming */ *(unsigned long *)(&ct->tuplehash[IP_CT_DIR_REPLY].hnnode.pprev) = hash; ct->status = 0; - ct->timeout = 0; + WRITE_ONCE(ct->timeout, 0); write_pnet(&ct->ct_net, net); memset(&ct->__nfct_init_offset, 0, offsetof(struct nf_conn, proto) - diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c index c7708bde057c..81d03acf68d4 100644 --- a/net/netfilter/nf_conntrack_netlink.c +++ b/net/netfilter/nf_conntrack_netlink.c @@ -1998,7 +1998,7 @@ static int ctnetlink_change_timeout(struct nf_conn *ct, if (timeout > INT_MAX) timeout = INT_MAX; - ct->timeout = nfct_time_stamp + (u32)timeout; + WRITE_ONCE(ct->timeout, nfct_time_stamp + (u32)timeout); if (test_bit(IPS_DYING_BIT, &ct->status)) return -ETIME; diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c index 87a7388b6c89..ed37bb9b4e58 100644 --- a/net/netfilter/nf_flow_table_core.c +++ b/net/netfilter/nf_flow_table_core.c @@ -201,8 +201,8 @@ static void flow_offload_fixup_ct_timeout(struct nf_conn *ct) if (timeout < 0) timeout = 0; - if (nf_flow_timeout_delta(ct->timeout) > (__s32)timeout) - ct->timeout = nfct_time_stamp + timeout; + if (nf_flow_timeout_delta(READ_ONCE(ct->timeout)) > (__s32)timeout) + WRITE_ONCE(ct->timeout, nfct_time_stamp + timeout); } static void flow_offload_fixup_ct_state(struct nf_conn *ct) -- cgit From d76c51f976ed1095dcd8c5c85ec9d8fed77a3e05 Mon Sep 17 00:00:00 2001 From: Vadim Fedorenko Date: Tue, 7 Dec 2021 00:39:31 +0300 Subject: selftests: tls: add missing AES-CCM cipher tests Add tests for TLSv1.2 and TLSv1.3 with AES-CCM cipher. Signed-off-by: Vadim Fedorenko Signed-off-by: Jakub Kicinski --- tools/testing/selftests/net/tls.c | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) diff --git a/tools/testing/selftests/net/tls.c b/tools/testing/selftests/net/tls.c index 8a22db0cca49..fb1bb402ee10 100644 --- a/tools/testing/selftests/net/tls.c +++ b/tools/testing/selftests/net/tls.c @@ -31,6 +31,7 @@ struct tls_crypto_info_keys { struct tls12_crypto_info_chacha20_poly1305 chacha20; struct tls12_crypto_info_sm4_gcm sm4gcm; struct tls12_crypto_info_sm4_ccm sm4ccm; + struct tls12_crypto_info_aes_ccm_128 aesccm128; }; size_t len; }; @@ -61,6 +62,11 @@ static void tls_crypto_info_init(uint16_t tls_version, uint16_t cipher_type, tls12->sm4ccm.info.version = tls_version; tls12->sm4ccm.info.cipher_type = cipher_type; break; + case TLS_CIPHER_AES_CCM_128: + tls12->len = sizeof(struct tls12_crypto_info_aes_ccm_128); + tls12->aesccm128.info.version = tls_version; + tls12->aesccm128.info.cipher_type = cipher_type; + break; default: break; } @@ -261,6 +267,18 @@ FIXTURE_VARIANT_ADD(tls, 13_sm4_ccm) .cipher_type = TLS_CIPHER_SM4_CCM, }; +FIXTURE_VARIANT_ADD(tls, 12_aes_ccm) +{ + .tls_version = TLS_1_2_VERSION, + .cipher_type = TLS_CIPHER_AES_CCM_128, +}; + +FIXTURE_VARIANT_ADD(tls, 13_aes_ccm) +{ + .tls_version = TLS_1_3_VERSION, + .cipher_type = TLS_CIPHER_AES_CCM_128, +}; + FIXTURE_SETUP(tls) { struct tls_crypto_info_keys tls12; -- cgit From 13bf99ab2130783e2b1988ef415585e3af7df97b Mon Sep 17 00:00:00 2001 From: Vadim Fedorenko Date: Tue, 7 Dec 2021 00:39:32 +0300 Subject: selftests: tls: add missing AES256-GCM cipher Add tests for TLSv1.2 and TLSv1.3 with AES256-GCM cipher Signed-off-by: Vadim Fedorenko Signed-off-by: Jakub Kicinski --- tools/testing/selftests/net/tls.c | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) diff --git a/tools/testing/selftests/net/tls.c b/tools/testing/selftests/net/tls.c index fb1bb402ee10..6e468e0f42f7 100644 --- a/tools/testing/selftests/net/tls.c +++ b/tools/testing/selftests/net/tls.c @@ -32,6 +32,7 @@ struct tls_crypto_info_keys { struct tls12_crypto_info_sm4_gcm sm4gcm; struct tls12_crypto_info_sm4_ccm sm4ccm; struct tls12_crypto_info_aes_ccm_128 aesccm128; + struct tls12_crypto_info_aes_gcm_256 aesgcm256; }; size_t len; }; @@ -67,6 +68,11 @@ static void tls_crypto_info_init(uint16_t tls_version, uint16_t cipher_type, tls12->aesccm128.info.version = tls_version; tls12->aesccm128.info.cipher_type = cipher_type; break; + case TLS_CIPHER_AES_GCM_256: + tls12->len = sizeof(struct tls12_crypto_info_aes_gcm_256); + tls12->aesgcm256.info.version = tls_version; + tls12->aesgcm256.info.cipher_type = cipher_type; + break; default: break; } @@ -279,6 +285,18 @@ FIXTURE_VARIANT_ADD(tls, 13_aes_ccm) .cipher_type = TLS_CIPHER_AES_CCM_128, }; +FIXTURE_VARIANT_ADD(tls, 12_aes_gcm_256) +{ + .tls_version = TLS_1_2_VERSION, + .cipher_type = TLS_CIPHER_AES_GCM_256, +}; + +FIXTURE_VARIANT_ADD(tls, 13_aes_gcm_256) +{ + .tls_version = TLS_1_3_VERSION, + .cipher_type = TLS_CIPHER_AES_GCM_256, +}; + FIXTURE_SETUP(tls) { struct tls_crypto_info_keys tls12; -- cgit From 6ebe4b350833a535d1df02591911b5f5fba3b275 Mon Sep 17 00:00:00 2001 From: Petr Machata Date: Mon, 6 Dec 2021 17:17:23 +0100 Subject: MAINTAINERS: net: mlxsw: Remove Jiri as a maintainer, add myself Jiri has moved on and will not carry out the mlxsw maintainership duty any longer. Add myself as a co-maintainer instead. Signed-off-by: Petr Machata Acked-by: Jiri Pirko Reviewed-by: Ido Schimmel Link: https://lore.kernel.org/r/45b54312cdebaf65c5d110b15a5dd2df795bf2be.1638807297.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski --- MAINTAINERS | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index faa9c34d837d..7e51081b6708 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -12180,8 +12180,8 @@ F: drivers/net/ethernet/mellanox/mlx5/core/fpga/* F: include/linux/mlx5/mlx5_ifc_fpga.h MELLANOX ETHERNET SWITCH DRIVERS -M: Jiri Pirko M: Ido Schimmel +M: Petr Machata L: netdev@vger.kernel.org S: Supported W: http://www.mellanox.com -- cgit From e6f60c51f0435862020bcd2d1e3257caaafe5650 Mon Sep 17 00:00:00 2001 From: Ameer Hamza Date: Sun, 5 Dec 2021 23:38:10 +0500 Subject: gve: fix for null pointer dereference. Avoid passing NULL skb to __skb_put() function call if napi_alloc_skb() returns NULL. Fixes: 37149e9374bf ("gve: Implement packet continuation for RX.") Signed-off-by: Ameer Hamza Link: https://lore.kernel.org/r/20211205183810.8299-1-amhamza.mgc@gmail.com Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/google/gve/gve_utils.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/net/ethernet/google/gve/gve_utils.c b/drivers/net/ethernet/google/gve/gve_utils.c index 88ca49cbc1e2..d57508bc4307 100644 --- a/drivers/net/ethernet/google/gve/gve_utils.c +++ b/drivers/net/ethernet/google/gve/gve_utils.c @@ -68,6 +68,9 @@ struct sk_buff *gve_rx_copy(struct net_device *dev, struct napi_struct *napi, set_protocol = ctx->curr_frag_cnt == ctx->expected_frag_cnt - 1; } else { skb = napi_alloc_skb(napi, len); + + if (unlikely(!skb)) + return NULL; set_protocol = true; } __skb_put(skb, len); -- cgit From a97770cc4016c2733bcef9dbe3d5b1ad02d13356 Mon Sep 17 00:00:00 2001 From: Yanteng Si Date: Mon, 6 Dec 2021 16:12:27 +0800 Subject: net: phy: Remove unnecessary indentation in the comments of phy_device Fix warning as: linux-next/Documentation/networking/kapi:122: ./include/linux/phy.h:543: WARNING: Unexpected indentation. linux-next/Documentation/networking/kapi:122: ./include/linux/phy.h:544: WARNING: Block quote ends without a blank line; unexpected unindent. linux-next/Documentation/networking/kapi:122: ./include/linux/phy.h:546: WARNING: Unexpected indentation. Suggested-by: Akira Yokosawa Signed-off-by: Yanteng Si Signed-off-by: Jakub Kicinski --- include/linux/phy.h | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/include/linux/phy.h b/include/linux/phy.h index 96e43fbb2dd8..cbf03a5f9cf5 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -538,11 +538,12 @@ struct macsec_ops; * @mac_managed_pm: Set true if MAC driver takes of suspending/resuming PHY * @state: State of the PHY for management purposes * @dev_flags: Device-specific flags used by the PHY driver. - * Bits [15:0] are free to use by the PHY driver to communicate - * driver specific behavior. - * Bits [23:16] are currently reserved for future use. - * Bits [31:24] are reserved for defining generic - * PHY driver behavior. + * + * - Bits [15:0] are free to use by the PHY driver to communicate + * driver specific behavior. + * - Bits [23:16] are currently reserved for future use. + * - Bits [31:24] are reserved for defining generic + * PHY driver behavior. * @irq: IRQ number of the PHY's interrupt (-1 if none) * @phy_timer: The timer for handling the state machine * @phylink: Pointer to phylink instance for this PHY -- cgit From c35e8de704560846dab964a2df2e548818a424d3 Mon Sep 17 00:00:00 2001 From: Yanteng Si Date: Mon, 6 Dec 2021 16:12:28 +0800 Subject: net: phy: Add the missing blank line in the phylink_suspend comment Fix warning as: Documentation/networking/kapi:147: ./drivers/net/phy/phylink.c:1657: WARNING: Unexpected indentation. Documentation/networking/kapi:147: ./drivers/net/phy/phylink.c:1658: WARNING: Block quote ends without a blank line; unexpected unindent. Signed-off-by: Yanteng Si Signed-off-by: Jakub Kicinski --- drivers/net/phy/phylink.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c index 5904546acae6..ea82ea5660e7 100644 --- a/drivers/net/phy/phylink.c +++ b/drivers/net/phy/phylink.c @@ -1388,6 +1388,7 @@ EXPORT_SYMBOL_GPL(phylink_stop); * @mac_wol: true if the MAC needs to receive packets for Wake-on-Lan * * Handle a network device suspend event. There are several cases: + * * - If Wake-on-Lan is not active, we can bring down the link between * the MAC and PHY by calling phylink_stop(). * - If Wake-on-Lan is active, and being handled only by the PHY, we -- cgit From b5bd95d17102b6719e3531d627875b9690371383 Mon Sep 17 00:00:00 2001 From: Joakim Zhang Date: Mon, 6 Dec 2021 21:54:57 +0800 Subject: net: fec: only clear interrupt of handling queue in fec_enet_rx_queue() Background: We have a customer is running a Profinet stack on the 8MM which receives and responds PNIO packets every 4ms and PNIO-CM packets every 40ms. However, from time to time the received PNIO-CM package is "stock" and is only handled when receiving a new PNIO-CM or DCERPC-Ping packet (tcpdump shows the PNIO-CM and the DCERPC-Ping packet at the same time but the PNIO-CM HW timestamp is from the expected 40 ms and not the 2s delay of the DCERPC-Ping). After debugging, we noticed PNIO, PNIO-CM and DCERPC-Ping packets would be handled by different RX queues. The root cause should be driver ack all queues' interrupt when handle a specific queue in fec_enet_rx_queue(). The blamed patch is introduced to receive as much packets as possible once to avoid interrupt flooding. But it's unreasonable to clear other queues'interrupt when handling one queue, this patch tries to fix it. Fixes: ed63f1dcd578 (net: fec: clear receive interrupts before processing a packet) Cc: Russell King Reported-by: Nicolas Diaz Signed-off-by: Joakim Zhang Link: https://lore.kernel.org/r/20211206135457.15946-1-qiangqing.zhang@nxp.com Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/freescale/fec.h | 3 +++ drivers/net/ethernet/freescale/fec_main.c | 2 +- 2 files changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/freescale/fec.h b/drivers/net/ethernet/freescale/fec.h index 7b4961daa254..ed7301b69169 100644 --- a/drivers/net/ethernet/freescale/fec.h +++ b/drivers/net/ethernet/freescale/fec.h @@ -377,6 +377,9 @@ struct bufdesc_ex { #define FEC_ENET_WAKEUP ((uint)0x00020000) /* Wakeup request */ #define FEC_ENET_TXF (FEC_ENET_TXF_0 | FEC_ENET_TXF_1 | FEC_ENET_TXF_2) #define FEC_ENET_RXF (FEC_ENET_RXF_0 | FEC_ENET_RXF_1 | FEC_ENET_RXF_2) +#define FEC_ENET_RXF_GET(X) (((X) == 0) ? FEC_ENET_RXF_0 : \ + (((X) == 1) ? FEC_ENET_RXF_1 : \ + FEC_ENET_RXF_2)) #define FEC_ENET_TS_AVAIL ((uint)0x00010000) #define FEC_ENET_TS_TIMER ((uint)0x00008000) diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c index bc418b910999..1b1f7f2a6130 100644 --- a/drivers/net/ethernet/freescale/fec_main.c +++ b/drivers/net/ethernet/freescale/fec_main.c @@ -1480,7 +1480,7 @@ fec_enet_rx_queue(struct net_device *ndev, int budget, u16 queue_id) break; pkt_received++; - writel(FEC_ENET_RXF, fep->hwp + FEC_IEVENT); + writel(FEC_ENET_RXF_GET(queue_id), fep->hwp + FEC_IEVENT); /* Check for errors. */ status ^= BD_ENET_RX_LAST; -- cgit From b560b21f71eb4ef9dfc7c8ec1d0e4d7f9aa54b51 Mon Sep 17 00:00:00 2001 From: Maxim Mikityanskiy Date: Tue, 7 Dec 2021 10:15:21 +0200 Subject: bpf: Add selftests to cover packet access corner cases This commit adds BPF verifier selftests that cover all corner cases by packet boundary checks. Specifically, 8-byte packet reads are tested at the beginning of data and at the beginning of data_meta, using all kinds of boundary checks (all comparison operators: <, >, <=, >=; both permutations of operands: data + length compared to end, end compared to data + length). For each case there are three tests: 1. Length is just enough for an 8-byte read. Length is either 7 or 8, depending on the comparison. 2. Length is increased by 1 - should still pass the verifier. These cases are useful, because they failed before commit 2fa7d94afc1a ("bpf: Fix the off-by-two error in range markings"). 3. Length is decreased by 1 - should be rejected by the verifier. Some existing tests are just renamed to avoid duplication. Signed-off-by: Maxim Mikityanskiy Signed-off-by: Daniel Borkmann Link: https://lore.kernel.org/bpf/20211207081521.41923-1-maximmi@nvidia.com --- .../bpf/verifier/xdp_direct_packet_access.c | 600 ++++++++++++++++++++- 1 file changed, 584 insertions(+), 16 deletions(-) diff --git a/tools/testing/selftests/bpf/verifier/xdp_direct_packet_access.c b/tools/testing/selftests/bpf/verifier/xdp_direct_packet_access.c index de172a5b8754..b4ec228eb95d 100644 --- a/tools/testing/selftests/bpf/verifier/xdp_direct_packet_access.c +++ b/tools/testing/selftests/bpf/verifier/xdp_direct_packet_access.c @@ -35,7 +35,7 @@ .prog_type = BPF_PROG_TYPE_XDP, }, { - "XDP pkt read, pkt_data' > pkt_end, good access", + "XDP pkt read, pkt_data' > pkt_end, corner case, good access", .insns = { BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, offsetof(struct xdp_md, data)), BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, @@ -87,6 +87,41 @@ .prog_type = BPF_PROG_TYPE_XDP, .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, }, +{ + "XDP pkt read, pkt_data' > pkt_end, corner case +1, good access", + .insns = { + BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, offsetof(struct xdp_md, data)), + BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, + offsetof(struct xdp_md, data_end)), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_2), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 9), + BPF_JMP_REG(BPF_JGT, BPF_REG_1, BPF_REG_3, 1), + BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -9), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .result = ACCEPT, + .prog_type = BPF_PROG_TYPE_XDP, + .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, +}, +{ + "XDP pkt read, pkt_data' > pkt_end, corner case -1, bad access", + .insns = { + BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, offsetof(struct xdp_md, data)), + BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, + offsetof(struct xdp_md, data_end)), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_2), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 7), + BPF_JMP_REG(BPF_JGT, BPF_REG_1, BPF_REG_3, 1), + BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -7), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .errstr = "R1 offset is outside of the packet", + .result = REJECT, + .prog_type = BPF_PROG_TYPE_XDP, + .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, +}, { "XDP pkt read, pkt_end > pkt_data', good access", .insns = { @@ -106,7 +141,7 @@ .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, }, { - "XDP pkt read, pkt_end > pkt_data', bad access 1", + "XDP pkt read, pkt_end > pkt_data', corner case -1, bad access", .insns = { BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, offsetof(struct xdp_md, data)), BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, @@ -142,6 +177,42 @@ .prog_type = BPF_PROG_TYPE_XDP, .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, }, +{ + "XDP pkt read, pkt_end > pkt_data', corner case, good access", + .insns = { + BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, offsetof(struct xdp_md, data)), + BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, + offsetof(struct xdp_md, data_end)), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_2), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 7), + BPF_JMP_REG(BPF_JGT, BPF_REG_3, BPF_REG_1, 1), + BPF_JMP_IMM(BPF_JA, 0, 0, 1), + BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -7), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .result = ACCEPT, + .prog_type = BPF_PROG_TYPE_XDP, + .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, +}, +{ + "XDP pkt read, pkt_end > pkt_data', corner case +1, good access", + .insns = { + BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, offsetof(struct xdp_md, data)), + BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, + offsetof(struct xdp_md, data_end)), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_2), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 8), + BPF_JMP_REG(BPF_JGT, BPF_REG_3, BPF_REG_1, 1), + BPF_JMP_IMM(BPF_JA, 0, 0, 1), + BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -8), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .result = ACCEPT, + .prog_type = BPF_PROG_TYPE_XDP, + .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, +}, { "XDP pkt read, pkt_data' < pkt_end, good access", .insns = { @@ -161,7 +232,7 @@ .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, }, { - "XDP pkt read, pkt_data' < pkt_end, bad access 1", + "XDP pkt read, pkt_data' < pkt_end, corner case -1, bad access", .insns = { BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, offsetof(struct xdp_md, data)), BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, @@ -198,7 +269,43 @@ .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, }, { - "XDP pkt read, pkt_end < pkt_data', good access", + "XDP pkt read, pkt_data' < pkt_end, corner case, good access", + .insns = { + BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, offsetof(struct xdp_md, data)), + BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, + offsetof(struct xdp_md, data_end)), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_2), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 7), + BPF_JMP_REG(BPF_JLT, BPF_REG_1, BPF_REG_3, 1), + BPF_JMP_IMM(BPF_JA, 0, 0, 1), + BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -7), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .result = ACCEPT, + .prog_type = BPF_PROG_TYPE_XDP, + .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, +}, +{ + "XDP pkt read, pkt_data' < pkt_end, corner case +1, good access", + .insns = { + BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, offsetof(struct xdp_md, data)), + BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, + offsetof(struct xdp_md, data_end)), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_2), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 8), + BPF_JMP_REG(BPF_JLT, BPF_REG_1, BPF_REG_3, 1), + BPF_JMP_IMM(BPF_JA, 0, 0, 1), + BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -8), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .result = ACCEPT, + .prog_type = BPF_PROG_TYPE_XDP, + .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, +}, +{ + "XDP pkt read, pkt_end < pkt_data', corner case, good access", .insns = { BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, offsetof(struct xdp_md, data)), BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, @@ -250,6 +357,41 @@ .prog_type = BPF_PROG_TYPE_XDP, .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, }, +{ + "XDP pkt read, pkt_end < pkt_data', corner case +1, good access", + .insns = { + BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, offsetof(struct xdp_md, data)), + BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, + offsetof(struct xdp_md, data_end)), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_2), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 9), + BPF_JMP_REG(BPF_JLT, BPF_REG_3, BPF_REG_1, 1), + BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -9), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .result = ACCEPT, + .prog_type = BPF_PROG_TYPE_XDP, + .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, +}, +{ + "XDP pkt read, pkt_end < pkt_data', corner case -1, bad access", + .insns = { + BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, offsetof(struct xdp_md, data)), + BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, + offsetof(struct xdp_md, data_end)), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_2), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 7), + BPF_JMP_REG(BPF_JLT, BPF_REG_3, BPF_REG_1, 1), + BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -7), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .errstr = "R1 offset is outside of the packet", + .result = REJECT, + .prog_type = BPF_PROG_TYPE_XDP, + .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, +}, { "XDP pkt read, pkt_data' >= pkt_end, good access", .insns = { @@ -268,7 +410,7 @@ .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, }, { - "XDP pkt read, pkt_data' >= pkt_end, bad access 1", + "XDP pkt read, pkt_data' >= pkt_end, corner case -1, bad access", .insns = { BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, offsetof(struct xdp_md, data)), BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, @@ -304,7 +446,41 @@ .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, }, { - "XDP pkt read, pkt_end >= pkt_data', good access", + "XDP pkt read, pkt_data' >= pkt_end, corner case, good access", + .insns = { + BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, offsetof(struct xdp_md, data)), + BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, + offsetof(struct xdp_md, data_end)), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_2), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 7), + BPF_JMP_REG(BPF_JGE, BPF_REG_1, BPF_REG_3, 1), + BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -7), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .result = ACCEPT, + .prog_type = BPF_PROG_TYPE_XDP, + .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, +}, +{ + "XDP pkt read, pkt_data' >= pkt_end, corner case +1, good access", + .insns = { + BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, offsetof(struct xdp_md, data)), + BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, + offsetof(struct xdp_md, data_end)), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_2), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 8), + BPF_JMP_REG(BPF_JGE, BPF_REG_1, BPF_REG_3, 1), + BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -8), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .result = ACCEPT, + .prog_type = BPF_PROG_TYPE_XDP, + .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, +}, +{ + "XDP pkt read, pkt_end >= pkt_data', corner case, good access", .insns = { BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, offsetof(struct xdp_md, data)), BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, @@ -359,7 +535,44 @@ .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, }, { - "XDP pkt read, pkt_data' <= pkt_end, good access", + "XDP pkt read, pkt_end >= pkt_data', corner case +1, good access", + .insns = { + BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, offsetof(struct xdp_md, data)), + BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, + offsetof(struct xdp_md, data_end)), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_2), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 9), + BPF_JMP_REG(BPF_JGE, BPF_REG_3, BPF_REG_1, 1), + BPF_JMP_IMM(BPF_JA, 0, 0, 1), + BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -9), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .result = ACCEPT, + .prog_type = BPF_PROG_TYPE_XDP, + .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, +}, +{ + "XDP pkt read, pkt_end >= pkt_data', corner case -1, bad access", + .insns = { + BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, offsetof(struct xdp_md, data)), + BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, + offsetof(struct xdp_md, data_end)), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_2), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 7), + BPF_JMP_REG(BPF_JGE, BPF_REG_3, BPF_REG_1, 1), + BPF_JMP_IMM(BPF_JA, 0, 0, 1), + BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -7), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .errstr = "R1 offset is outside of the packet", + .result = REJECT, + .prog_type = BPF_PROG_TYPE_XDP, + .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, +}, +{ + "XDP pkt read, pkt_data' <= pkt_end, corner case, good access", .insns = { BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, offsetof(struct xdp_md, data)), BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, @@ -413,6 +626,43 @@ .prog_type = BPF_PROG_TYPE_XDP, .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, }, +{ + "XDP pkt read, pkt_data' <= pkt_end, corner case +1, good access", + .insns = { + BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, offsetof(struct xdp_md, data)), + BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, + offsetof(struct xdp_md, data_end)), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_2), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 9), + BPF_JMP_REG(BPF_JLE, BPF_REG_1, BPF_REG_3, 1), + BPF_JMP_IMM(BPF_JA, 0, 0, 1), + BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -9), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .result = ACCEPT, + .prog_type = BPF_PROG_TYPE_XDP, + .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, +}, +{ + "XDP pkt read, pkt_data' <= pkt_end, corner case -1, bad access", + .insns = { + BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, offsetof(struct xdp_md, data)), + BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, + offsetof(struct xdp_md, data_end)), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_2), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 7), + BPF_JMP_REG(BPF_JLE, BPF_REG_1, BPF_REG_3, 1), + BPF_JMP_IMM(BPF_JA, 0, 0, 1), + BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -7), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .errstr = "R1 offset is outside of the packet", + .result = REJECT, + .prog_type = BPF_PROG_TYPE_XDP, + .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, +}, { "XDP pkt read, pkt_end <= pkt_data', good access", .insns = { @@ -431,7 +681,7 @@ .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, }, { - "XDP pkt read, pkt_end <= pkt_data', bad access 1", + "XDP pkt read, pkt_end <= pkt_data', corner case -1, bad access", .insns = { BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, offsetof(struct xdp_md, data)), BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, @@ -467,7 +717,41 @@ .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, }, { - "XDP pkt read, pkt_meta' > pkt_data, good access", + "XDP pkt read, pkt_end <= pkt_data', corner case, good access", + .insns = { + BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, offsetof(struct xdp_md, data)), + BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, + offsetof(struct xdp_md, data_end)), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_2), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 7), + BPF_JMP_REG(BPF_JLE, BPF_REG_3, BPF_REG_1, 1), + BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -7), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .result = ACCEPT, + .prog_type = BPF_PROG_TYPE_XDP, + .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, +}, +{ + "XDP pkt read, pkt_end <= pkt_data', corner case +1, good access", + .insns = { + BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, offsetof(struct xdp_md, data)), + BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, + offsetof(struct xdp_md, data_end)), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_2), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 8), + BPF_JMP_REG(BPF_JLE, BPF_REG_3, BPF_REG_1, 1), + BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -8), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .result = ACCEPT, + .prog_type = BPF_PROG_TYPE_XDP, + .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, +}, +{ + "XDP pkt read, pkt_meta' > pkt_data, corner case, good access", .insns = { BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, offsetof(struct xdp_md, data_meta)), @@ -519,6 +803,41 @@ .prog_type = BPF_PROG_TYPE_XDP, .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, }, +{ + "XDP pkt read, pkt_meta' > pkt_data, corner case +1, good access", + .insns = { + BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, + offsetof(struct xdp_md, data_meta)), + BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, offsetof(struct xdp_md, data)), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_2), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 9), + BPF_JMP_REG(BPF_JGT, BPF_REG_1, BPF_REG_3, 1), + BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -9), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .result = ACCEPT, + .prog_type = BPF_PROG_TYPE_XDP, + .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, +}, +{ + "XDP pkt read, pkt_meta' > pkt_data, corner case -1, bad access", + .insns = { + BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, + offsetof(struct xdp_md, data_meta)), + BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, offsetof(struct xdp_md, data)), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_2), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 7), + BPF_JMP_REG(BPF_JGT, BPF_REG_1, BPF_REG_3, 1), + BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -7), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .errstr = "R1 offset is outside of the packet", + .result = REJECT, + .prog_type = BPF_PROG_TYPE_XDP, + .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, +}, { "XDP pkt read, pkt_data > pkt_meta', good access", .insns = { @@ -538,7 +857,7 @@ .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, }, { - "XDP pkt read, pkt_data > pkt_meta', bad access 1", + "XDP pkt read, pkt_data > pkt_meta', corner case -1, bad access", .insns = { BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, offsetof(struct xdp_md, data_meta)), @@ -574,6 +893,42 @@ .prog_type = BPF_PROG_TYPE_XDP, .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, }, +{ + "XDP pkt read, pkt_data > pkt_meta', corner case, good access", + .insns = { + BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, + offsetof(struct xdp_md, data_meta)), + BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, offsetof(struct xdp_md, data)), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_2), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 7), + BPF_JMP_REG(BPF_JGT, BPF_REG_3, BPF_REG_1, 1), + BPF_JMP_IMM(BPF_JA, 0, 0, 1), + BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -7), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .result = ACCEPT, + .prog_type = BPF_PROG_TYPE_XDP, + .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, +}, +{ + "XDP pkt read, pkt_data > pkt_meta', corner case +1, good access", + .insns = { + BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, + offsetof(struct xdp_md, data_meta)), + BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, offsetof(struct xdp_md, data)), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_2), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 8), + BPF_JMP_REG(BPF_JGT, BPF_REG_3, BPF_REG_1, 1), + BPF_JMP_IMM(BPF_JA, 0, 0, 1), + BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -8), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .result = ACCEPT, + .prog_type = BPF_PROG_TYPE_XDP, + .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, +}, { "XDP pkt read, pkt_meta' < pkt_data, good access", .insns = { @@ -593,7 +948,7 @@ .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, }, { - "XDP pkt read, pkt_meta' < pkt_data, bad access 1", + "XDP pkt read, pkt_meta' < pkt_data, corner case -1, bad access", .insns = { BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, offsetof(struct xdp_md, data_meta)), @@ -630,7 +985,43 @@ .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, }, { - "XDP pkt read, pkt_data < pkt_meta', good access", + "XDP pkt read, pkt_meta' < pkt_data, corner case, good access", + .insns = { + BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, + offsetof(struct xdp_md, data_meta)), + BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, offsetof(struct xdp_md, data)), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_2), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 7), + BPF_JMP_REG(BPF_JLT, BPF_REG_1, BPF_REG_3, 1), + BPF_JMP_IMM(BPF_JA, 0, 0, 1), + BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -7), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .result = ACCEPT, + .prog_type = BPF_PROG_TYPE_XDP, + .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, +}, +{ + "XDP pkt read, pkt_meta' < pkt_data, corner case +1, good access", + .insns = { + BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, + offsetof(struct xdp_md, data_meta)), + BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, offsetof(struct xdp_md, data)), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_2), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 8), + BPF_JMP_REG(BPF_JLT, BPF_REG_1, BPF_REG_3, 1), + BPF_JMP_IMM(BPF_JA, 0, 0, 1), + BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -8), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .result = ACCEPT, + .prog_type = BPF_PROG_TYPE_XDP, + .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, +}, +{ + "XDP pkt read, pkt_data < pkt_meta', corner case, good access", .insns = { BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, offsetof(struct xdp_md, data_meta)), @@ -682,6 +1073,41 @@ .prog_type = BPF_PROG_TYPE_XDP, .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, }, +{ + "XDP pkt read, pkt_data < pkt_meta', corner case +1, good access", + .insns = { + BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, + offsetof(struct xdp_md, data_meta)), + BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, offsetof(struct xdp_md, data)), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_2), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 9), + BPF_JMP_REG(BPF_JLT, BPF_REG_3, BPF_REG_1, 1), + BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -9), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .result = ACCEPT, + .prog_type = BPF_PROG_TYPE_XDP, + .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, +}, +{ + "XDP pkt read, pkt_data < pkt_meta', corner case -1, bad access", + .insns = { + BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, + offsetof(struct xdp_md, data_meta)), + BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, offsetof(struct xdp_md, data)), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_2), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 7), + BPF_JMP_REG(BPF_JLT, BPF_REG_3, BPF_REG_1, 1), + BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -7), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .errstr = "R1 offset is outside of the packet", + .result = REJECT, + .prog_type = BPF_PROG_TYPE_XDP, + .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, +}, { "XDP pkt read, pkt_meta' >= pkt_data, good access", .insns = { @@ -700,7 +1126,7 @@ .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, }, { - "XDP pkt read, pkt_meta' >= pkt_data, bad access 1", + "XDP pkt read, pkt_meta' >= pkt_data, corner case -1, bad access", .insns = { BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, offsetof(struct xdp_md, data_meta)), @@ -736,7 +1162,41 @@ .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, }, { - "XDP pkt read, pkt_data >= pkt_meta', good access", + "XDP pkt read, pkt_meta' >= pkt_data, corner case, good access", + .insns = { + BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, + offsetof(struct xdp_md, data_meta)), + BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, offsetof(struct xdp_md, data)), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_2), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 7), + BPF_JMP_REG(BPF_JGE, BPF_REG_1, BPF_REG_3, 1), + BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -7), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .result = ACCEPT, + .prog_type = BPF_PROG_TYPE_XDP, + .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, +}, +{ + "XDP pkt read, pkt_meta' >= pkt_data, corner case +1, good access", + .insns = { + BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, + offsetof(struct xdp_md, data_meta)), + BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, offsetof(struct xdp_md, data)), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_2), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 8), + BPF_JMP_REG(BPF_JGE, BPF_REG_1, BPF_REG_3, 1), + BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -8), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .result = ACCEPT, + .prog_type = BPF_PROG_TYPE_XDP, + .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, +}, +{ + "XDP pkt read, pkt_data >= pkt_meta', corner case, good access", .insns = { BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, offsetof(struct xdp_md, data_meta)), @@ -791,7 +1251,44 @@ .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, }, { - "XDP pkt read, pkt_meta' <= pkt_data, good access", + "XDP pkt read, pkt_data >= pkt_meta', corner case +1, good access", + .insns = { + BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, + offsetof(struct xdp_md, data_meta)), + BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, offsetof(struct xdp_md, data)), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_2), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 9), + BPF_JMP_REG(BPF_JGE, BPF_REG_3, BPF_REG_1, 1), + BPF_JMP_IMM(BPF_JA, 0, 0, 1), + BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -9), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .result = ACCEPT, + .prog_type = BPF_PROG_TYPE_XDP, + .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, +}, +{ + "XDP pkt read, pkt_data >= pkt_meta', corner case -1, bad access", + .insns = { + BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, + offsetof(struct xdp_md, data_meta)), + BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, offsetof(struct xdp_md, data)), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_2), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 7), + BPF_JMP_REG(BPF_JGE, BPF_REG_3, BPF_REG_1, 1), + BPF_JMP_IMM(BPF_JA, 0, 0, 1), + BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -7), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .errstr = "R1 offset is outside of the packet", + .result = REJECT, + .prog_type = BPF_PROG_TYPE_XDP, + .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, +}, +{ + "XDP pkt read, pkt_meta' <= pkt_data, corner case, good access", .insns = { BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, offsetof(struct xdp_md, data_meta)), @@ -845,6 +1342,43 @@ .prog_type = BPF_PROG_TYPE_XDP, .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, }, +{ + "XDP pkt read, pkt_meta' <= pkt_data, corner case +1, good access", + .insns = { + BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, + offsetof(struct xdp_md, data_meta)), + BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, offsetof(struct xdp_md, data)), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_2), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 9), + BPF_JMP_REG(BPF_JLE, BPF_REG_1, BPF_REG_3, 1), + BPF_JMP_IMM(BPF_JA, 0, 0, 1), + BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -9), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .result = ACCEPT, + .prog_type = BPF_PROG_TYPE_XDP, + .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, +}, +{ + "XDP pkt read, pkt_meta' <= pkt_data, corner case -1, bad access", + .insns = { + BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, + offsetof(struct xdp_md, data_meta)), + BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, offsetof(struct xdp_md, data)), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_2), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 7), + BPF_JMP_REG(BPF_JLE, BPF_REG_1, BPF_REG_3, 1), + BPF_JMP_IMM(BPF_JA, 0, 0, 1), + BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -7), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .errstr = "R1 offset is outside of the packet", + .result = REJECT, + .prog_type = BPF_PROG_TYPE_XDP, + .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, +}, { "XDP pkt read, pkt_data <= pkt_meta', good access", .insns = { @@ -863,7 +1397,7 @@ .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, }, { - "XDP pkt read, pkt_data <= pkt_meta', bad access 1", + "XDP pkt read, pkt_data <= pkt_meta', corner case -1, bad access", .insns = { BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, offsetof(struct xdp_md, data_meta)), @@ -898,3 +1432,37 @@ .prog_type = BPF_PROG_TYPE_XDP, .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, }, +{ + "XDP pkt read, pkt_data <= pkt_meta', corner case, good access", + .insns = { + BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, + offsetof(struct xdp_md, data_meta)), + BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, offsetof(struct xdp_md, data)), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_2), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 7), + BPF_JMP_REG(BPF_JLE, BPF_REG_3, BPF_REG_1, 1), + BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -7), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .result = ACCEPT, + .prog_type = BPF_PROG_TYPE_XDP, + .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, +}, +{ + "XDP pkt read, pkt_data <= pkt_meta', corner case +1, good access", + .insns = { + BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, + offsetof(struct xdp_md, data_meta)), + BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, offsetof(struct xdp_md, data)), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_2), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 8), + BPF_JMP_REG(BPF_JLE, BPF_REG_3, BPF_REG_1, 1), + BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -8), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .result = ACCEPT, + .prog_type = BPF_PROG_TYPE_XDP, + .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, +}, -- cgit From 1a0f25a52e08b1f67510cabbb44888d2b3c46359 Mon Sep 17 00:00:00 2001 From: Jesse Brandeburg Date: Fri, 12 Nov 2021 17:06:02 -0800 Subject: ice: safer stats processing The driver was zeroing live stats that could be fetched by ndo_get_stats64 at any time. This could result in inconsistent statistics, and the telltale sign was when reading stats frequently from /proc/net/dev, the stats would go backwards. Fix by collecting stats into a local, and delaying when we write to the structure so it's not incremental. Fixes: fcea6f3da546 ("ice: Add stats and ethtool support") Signed-off-by: Jesse Brandeburg Tested-by: Gurucharan G Signed-off-by: Tony Nguyen --- drivers/net/ethernet/intel/ice/ice_main.c | 29 ++++++++++++++++++----------- 1 file changed, 18 insertions(+), 11 deletions(-) diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c index c6d6ce52e2ca..73c61cdb036f 100644 --- a/drivers/net/ethernet/intel/ice/ice_main.c +++ b/drivers/net/ethernet/intel/ice/ice_main.c @@ -5930,14 +5930,15 @@ ice_fetch_u64_stats_per_ring(struct u64_stats_sync *syncp, struct ice_q_stats st /** * ice_update_vsi_tx_ring_stats - Update VSI Tx ring stats counters * @vsi: the VSI to be updated + * @vsi_stats: the stats struct to be updated * @rings: rings to work on * @count: number of rings */ static void -ice_update_vsi_tx_ring_stats(struct ice_vsi *vsi, struct ice_tx_ring **rings, - u16 count) +ice_update_vsi_tx_ring_stats(struct ice_vsi *vsi, + struct rtnl_link_stats64 *vsi_stats, + struct ice_tx_ring **rings, u16 count) { - struct rtnl_link_stats64 *vsi_stats = &vsi->net_stats; u16 i; for (i = 0; i < count; i++) { @@ -5961,15 +5962,13 @@ ice_update_vsi_tx_ring_stats(struct ice_vsi *vsi, struct ice_tx_ring **rings, */ static void ice_update_vsi_ring_stats(struct ice_vsi *vsi) { - struct rtnl_link_stats64 *vsi_stats = &vsi->net_stats; + struct rtnl_link_stats64 *vsi_stats; u64 pkts, bytes; int i; - /* reset netdev stats */ - vsi_stats->tx_packets = 0; - vsi_stats->tx_bytes = 0; - vsi_stats->rx_packets = 0; - vsi_stats->rx_bytes = 0; + vsi_stats = kzalloc(sizeof(*vsi_stats), GFP_ATOMIC); + if (!vsi_stats) + return; /* reset non-netdev (extended) stats */ vsi->tx_restart = 0; @@ -5981,7 +5980,8 @@ static void ice_update_vsi_ring_stats(struct ice_vsi *vsi) rcu_read_lock(); /* update Tx rings counters */ - ice_update_vsi_tx_ring_stats(vsi, vsi->tx_rings, vsi->num_txq); + ice_update_vsi_tx_ring_stats(vsi, vsi_stats, vsi->tx_rings, + vsi->num_txq); /* update Rx rings counters */ ice_for_each_rxq(vsi, i) { @@ -5996,10 +5996,17 @@ static void ice_update_vsi_ring_stats(struct ice_vsi *vsi) /* update XDP Tx rings counters */ if (ice_is_xdp_ena_vsi(vsi)) - ice_update_vsi_tx_ring_stats(vsi, vsi->xdp_rings, + ice_update_vsi_tx_ring_stats(vsi, vsi_stats, vsi->xdp_rings, vsi->num_xdp_txq); rcu_read_unlock(); + + vsi->net_stats.tx_packets = vsi_stats->tx_packets; + vsi->net_stats.tx_bytes = vsi_stats->tx_bytes; + vsi->net_stats.rx_packets = vsi_stats->rx_packets; + vsi->net_stats.rx_bytes = vsi_stats->rx_bytes; + + kfree(vsi_stats); } /** -- cgit From 2b29cb9e3f7f038c7f50ad2583b47caf5cb1eaf2 Mon Sep 17 00:00:00 2001 From: "Russell King (Oracle)" Date: Tue, 7 Dec 2021 10:32:43 +0000 Subject: net: dsa: mv88e6xxx: fix "don't use PHY_DETECT on internal PHY's" This commit fixes a misunderstanding in commit 4a3e0aeddf09 ("net: dsa: mv88e6xxx: don't use PHY_DETECT on internal PHY's"). For Marvell DSA switches with the PHY_DETECT bit (for non-6250 family devices), controls whether the PPU polls the PHY to retrieve the link, speed, duplex and pause status to update the port configuration. This applies for both internal and external PHYs. For some switches such as 88E6352 and 88E6390X, PHY_DETECT has an additional function of enabling auto-media mode between the internal PHY and SERDES blocks depending on which first gains link. The original intention of commit 5d5b231da7ac (net: dsa: mv88e6xxx: use PHY_DETECT in mac_link_up/mac_link_down) was to allow this bit to be used to detect when this propagation is enabled, and allow software to update the port configuration. This has found to be necessary for some switches which do not automatically propagate status from the SERDES to the port, which includes the 88E6390. However, commit 4a3e0aeddf09 ("net: dsa: mv88e6xxx: don't use PHY_DETECT on internal PHY's") breaks this assumption. Maarten Zanders has confirmed that the issue he was addressing was for an 88E6250 switch, which does not have a PHY_DETECT bit in bit 12, but instead a link status bit. Therefore, mv88e6xxx_port_ppu_updates() does not report correctly. This patch resolves the above issues by reverting Maarten's change and instead making mv88e6xxx_port_ppu_updates() indicate whether the port is internal for the 88E6250 family of switches. Yes, you're right, I'm targeting the 6250 family. And yes, your suggestion would solve my case and is a better implementation for the other devices (as far as I can see). Fixes: 4a3e0aeddf09 ("net: dsa: mv88e6xxx: don't use PHY_DETECT on internal PHY's") Signed-off-by: Russell King Tested-by: Maarten Zanders Link: https://lore.kernel.org/r/E1muXm7-00EwJB-7n@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski --- drivers/net/dsa/mv88e6xxx/chip.c | 21 +++++++++++++-------- 1 file changed, 13 insertions(+), 8 deletions(-) diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c index f00cbf5753b9..9f675464efc3 100644 --- a/drivers/net/dsa/mv88e6xxx/chip.c +++ b/drivers/net/dsa/mv88e6xxx/chip.c @@ -471,6 +471,12 @@ static int mv88e6xxx_port_ppu_updates(struct mv88e6xxx_chip *chip, int port) u16 reg; int err; + /* The 88e6250 family does not have the PHY detect bit. Instead, + * report whether the port is internal. + */ + if (chip->info->family == MV88E6XXX_FAMILY_6250) + return port < chip->info->num_internal_phys; + err = mv88e6xxx_port_read(chip, port, MV88E6XXX_PORT_STS, ®); if (err) { dev_err(chip->dev, @@ -752,11 +758,10 @@ static void mv88e6xxx_mac_link_down(struct dsa_switch *ds, int port, ops = chip->info->ops; mv88e6xxx_reg_lock(chip); - /* Internal PHYs propagate their configuration directly to the MAC. - * External PHYs depend on whether the PPU is enabled for this port. + /* Force the link down if we know the port may not be automatically + * updated by the switch or if we are using fixed-link mode. */ - if (((!mv88e6xxx_phy_is_internal(ds, port) && - !mv88e6xxx_port_ppu_updates(chip, port)) || + if ((!mv88e6xxx_port_ppu_updates(chip, port) || mode == MLO_AN_FIXED) && ops->port_sync_link) err = ops->port_sync_link(chip, port, mode, false); mv88e6xxx_reg_unlock(chip); @@ -779,11 +784,11 @@ static void mv88e6xxx_mac_link_up(struct dsa_switch *ds, int port, ops = chip->info->ops; mv88e6xxx_reg_lock(chip); - /* Internal PHYs propagate their configuration directly to the MAC. - * External PHYs depend on whether the PPU is enabled for this port. + /* Configure and force the link up if we know that the port may not + * automatically updated by the switch or if we are using fixed-link + * mode. */ - if ((!mv88e6xxx_phy_is_internal(ds, port) && - !mv88e6xxx_port_ppu_updates(chip, port)) || + if (!mv88e6xxx_port_ppu_updates(chip, port) || mode == MLO_AN_FIXED) { /* FIXME: for an automedia port, should we force the link * down here - what if the link comes up due to "other" media -- cgit From e195e9b5dee6459d8c8e6a314cc71a644a0537fd Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Mon, 6 Dec 2021 08:53:29 -0800 Subject: net, neigh: clear whole pneigh_entry at alloc time Commit 2c611ad97a82 ("net, neigh: Extend neigh->flags to 32 bit to allow for extensions") enables a new KMSAM warning [1] I think the bug is actually older, because the following intruction only occurred if ndm->ndm_flags had NTF_PROXY set. pn->flags = ndm->ndm_flags; Let's clear all pneigh_entry fields at alloc time. [1] BUG: KMSAN: uninit-value in pneigh_fill_info+0x986/0xb30 net/core/neighbour.c:2593 pneigh_fill_info+0x986/0xb30 net/core/neighbour.c:2593 pneigh_dump_table net/core/neighbour.c:2715 [inline] neigh_dump_info+0x1e3f/0x2c60 net/core/neighbour.c:2832 netlink_dump+0xaca/0x16a0 net/netlink/af_netlink.c:2265 __netlink_dump_start+0xd1c/0xee0 net/netlink/af_netlink.c:2370 netlink_dump_start include/linux/netlink.h:254 [inline] rtnetlink_rcv_msg+0x181b/0x18c0 net/core/rtnetlink.c:5534 netlink_rcv_skb+0x447/0x800 net/netlink/af_netlink.c:2491 rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:5589 netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline] netlink_unicast+0x1095/0x1360 net/netlink/af_netlink.c:1345 netlink_sendmsg+0x16f3/0x1870 net/netlink/af_netlink.c:1916 sock_sendmsg_nosec net/socket.c:704 [inline] sock_sendmsg net/socket.c:724 [inline] sock_write_iter+0x594/0x690 net/socket.c:1057 call_write_iter include/linux/fs.h:2162 [inline] new_sync_write fs/read_write.c:503 [inline] vfs_write+0x1318/0x2030 fs/read_write.c:590 ksys_write+0x28c/0x520 fs/read_write.c:643 __do_sys_write fs/read_write.c:655 [inline] __se_sys_write fs/read_write.c:652 [inline] __x64_sys_write+0xdb/0x120 fs/read_write.c:652 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:82 entry_SYSCALL_64_after_hwframe+0x44/0xae Uninit was created at: slab_post_alloc_hook mm/slab.h:524 [inline] slab_alloc_node mm/slub.c:3251 [inline] slab_alloc mm/slub.c:3259 [inline] __kmalloc+0xc3c/0x12d0 mm/slub.c:4437 kmalloc include/linux/slab.h:595 [inline] pneigh_lookup+0x60f/0xd70 net/core/neighbour.c:766 arp_req_set_public net/ipv4/arp.c:1016 [inline] arp_req_set+0x430/0x10a0 net/ipv4/arp.c:1032 arp_ioctl+0x8d4/0xb60 net/ipv4/arp.c:1232 inet_ioctl+0x4ef/0x820 net/ipv4/af_inet.c:947 sock_do_ioctl net/socket.c:1118 [inline] sock_ioctl+0xa3f/0x13e0 net/socket.c:1235 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:874 [inline] __se_sys_ioctl+0x2df/0x4a0 fs/ioctl.c:860 __x64_sys_ioctl+0xd8/0x110 fs/ioctl.c:860 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:82 entry_SYSCALL_64_after_hwframe+0x44/0xae CPU: 1 PID: 20001 Comm: syz-executor.0 Not tainted 5.16.0-rc3-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Fixes: 62dd93181aaa ("[IPV6] NDISC: Set per-entry is_router flag in Proxy NA.") Signed-off-by: Eric Dumazet Cc: Roopa Prabhu Reviewed-by: David Ahern Link: https://lore.kernel.org/r/20211206165329.1049835-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski --- net/core/neighbour.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/net/core/neighbour.c b/net/core/neighbour.c index 72ba027c34cf..dda12fbd177b 100644 --- a/net/core/neighbour.c +++ b/net/core/neighbour.c @@ -763,11 +763,10 @@ struct pneigh_entry * pneigh_lookup(struct neigh_table *tbl, ASSERT_RTNL(); - n = kmalloc(sizeof(*n) + key_len, GFP_KERNEL); + n = kzalloc(sizeof(*n) + key_len, GFP_KERNEL); if (!n) goto out; - n->protocol = 0; write_pnet(&n->net, net); memcpy(n->key, pkey, key_len); n->dev = dev; -- cgit From f71ef02f1a4a3c49962fa341ad8de19071f0f9bf Mon Sep 17 00:00:00 2001 From: Ronak Doshi Date: Tue, 7 Dec 2021 00:17:37 -0800 Subject: vmxnet3: fix minimum vectors alloc issue 'Commit 39f9895a00f4 ("vmxnet3: add support for 32 Tx/Rx queues")' added support for 32Tx/Rx queues. Within that patch, value of VMXNET3_LINUX_MIN_MSIX_VECT was updated. However, there is a case (numvcpus = 2) which actually requires 3 intrs which matches VMXNET3_LINUX_MIN_MSIX_VECT which then is treated as failure by stack to allocate more vectors. This patch fixes this issue. Fixes: 39f9895a00f4 ("vmxnet3: add support for 32 Tx/Rx queues") Signed-off-by: Ronak Doshi Acked-by: Guolin Yang Link: https://lore.kernel.org/r/20211207081737.14000-1-doshir@vmware.com Signed-off-by: Jakub Kicinski --- drivers/net/vmxnet3/vmxnet3_drv.c | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c b/drivers/net/vmxnet3/vmxnet3_drv.c index 14fae317bc70..fd407c0e2856 100644 --- a/drivers/net/vmxnet3/vmxnet3_drv.c +++ b/drivers/net/vmxnet3/vmxnet3_drv.c @@ -3261,7 +3261,7 @@ vmxnet3_alloc_intr_resources(struct vmxnet3_adapter *adapter) #ifdef CONFIG_PCI_MSI if (adapter->intr.type == VMXNET3_IT_MSIX) { - int i, nvec; + int i, nvec, nvec_allocated; nvec = adapter->share_intr == VMXNET3_INTR_TXSHARE ? 1 : adapter->num_tx_queues; @@ -3274,14 +3274,15 @@ vmxnet3_alloc_intr_resources(struct vmxnet3_adapter *adapter) for (i = 0; i < nvec; i++) adapter->intr.msix_entries[i].entry = i; - nvec = vmxnet3_acquire_msix_vectors(adapter, nvec); - if (nvec < 0) + nvec_allocated = vmxnet3_acquire_msix_vectors(adapter, nvec); + if (nvec_allocated < 0) goto msix_err; /* If we cannot allocate one MSIx vector per queue * then limit the number of rx queues to 1 */ - if (nvec == VMXNET3_LINUX_MIN_MSIX_VECT) { + if (nvec_allocated == VMXNET3_LINUX_MIN_MSIX_VECT && + nvec != VMXNET3_LINUX_MIN_MSIX_VECT) { if (adapter->share_intr != VMXNET3_INTR_BUDDYSHARE || adapter->num_rx_queues != 1) { adapter->share_intr = VMXNET3_INTR_TXSHARE; @@ -3291,14 +3292,14 @@ vmxnet3_alloc_intr_resources(struct vmxnet3_adapter *adapter) } } - adapter->intr.num_intrs = nvec; + adapter->intr.num_intrs = nvec_allocated; return; msix_err: /* If we cannot allocate MSIx vectors use only one rx queue */ dev_info(&adapter->pdev->dev, "Failed to enable MSI-X, error %d. " - "Limiting #rx queues to 1, try MSI.\n", nvec); + "Limiting #rx queues to 1, try MSI.\n", nvec_allocated); adapter->intr.type = VMXNET3_IT_MSI; } -- cgit From a50e659b2a1be14784e80f8492aab177e67c53a2 Mon Sep 17 00:00:00 2001 From: Louis Amas Date: Tue, 7 Dec 2021 15:34:22 +0100 Subject: net: mvpp2: fix XDP rx queues registering The registration of XDP queue information is incorrect because the RX queue id we use is invalid. When port->id == 0 it appears to works as expected yet it's no longer the case when port->id != 0. The problem arised while using a recent kernel version on the MACCHIATOBin. This board has several ports: * eth0 and eth1 are 10Gbps interfaces ; both ports has port->id == 0; * eth2 is a 1Gbps interface with port->id != 0. Code from xdp-tutorial (more specifically advanced03-AF_XDP) was used to test packet capture and injection on all these interfaces. The XDP kernel was simplified to: SEC("xdp_sock") int xdp_sock_prog(struct xdp_md *ctx) { int index = ctx->rx_queue_index; /* A set entry here means that the correspnding queue_id * has an active AF_XDP socket bound to it. */ if (bpf_map_lookup_elem(&xsks_map, &index)) return bpf_redirect_map(&xsks_map, index, 0); return XDP_PASS; } Starting the program using: ./af_xdp_user -d DEV Gives the following result: * eth0 : ok * eth1 : ok * eth2 : no capture, no injection Investigating the issue shows that XDP rx queues for eth2 are wrong: XDP expects their id to be in the range [0..3] but we found them to be in the range [32..35]. Trying to force rx queue ids using: ./af_xdp_user -d eth2 -Q 32 fails as expected (we shall not have more than 4 queues). When we register the XDP rx queue information (using xdp_rxq_info_reg() in function mvpp2_rxq_init()) we tell it to use rxq->id as the queue id. This value is computed as: rxq->id = port->id * max_rxq_count + queue_id where max_rxq_count depends on the device version. In the MACCHIATOBin case, this value is 32, meaning that rx queues on eth2 are numbered from 32 to 35 - there are four of them. Clearly, this is not the per-port queue id that XDP is expecting: it wants a value in the range [0..3]. It shall directly use queue_id which is stored in rxq->logic_rxq -- so let's use that value instead. rxq->id is left untouched ; its value is indeed valid but it should not be used in this context. This is consistent with the remaining part of the code in mvpp2_rxq_init(). With this change, packet capture is working as expected on all the MACCHIATOBin ports. Fixes: b27db2274ba8 ("mvpp2: use page_pool allocator") Signed-off-by: Louis Amas Signed-off-by: Emmanuel Deloget Reviewed-by: Marcin Wojtas Acked-by: John Fastabend Acked-by: Jesper Dangaard Brouer Link: https://lore.kernel.org/r/20211207143423.916334-1-louis.amas@eho.link Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c index 6480696c979b..6da8a595026b 100644 --- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c +++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c @@ -2960,11 +2960,11 @@ static int mvpp2_rxq_init(struct mvpp2_port *port, mvpp2_rxq_status_update(port, rxq->id, 0, rxq->size); if (priv->percpu_pools) { - err = xdp_rxq_info_reg(&rxq->xdp_rxq_short, port->dev, rxq->id, 0); + err = xdp_rxq_info_reg(&rxq->xdp_rxq_short, port->dev, rxq->logic_rxq, 0); if (err < 0) goto err_free_dma; - err = xdp_rxq_info_reg(&rxq->xdp_rxq_long, port->dev, rxq->id, 0); + err = xdp_rxq_info_reg(&rxq->xdp_rxq_long, port->dev, rxq->logic_rxq, 0); if (err < 0) goto err_unregister_rxq_short; -- cgit From 36aea60fc892ce73f96d45dc7eb239c7c4c1fa69 Mon Sep 17 00:00:00 2001 From: Jimmy Assarsson Date: Wed, 8 Dec 2021 16:21:21 +0100 Subject: can: kvaser_pciefd: kvaser_pciefd_rx_error_frame(): increase correct stats->{rx,tx}_errors counter Check the direction bit in the error frame packet (EPACK) to determine which net_device_stats {rx,tx}_errors counter to increase. Fixes: 26ad340e582d ("can: kvaser_pciefd: Add driver for Kvaser PCIEcan devices") Link: https://lore.kernel.org/all/20211208152122.250852-1-extja@kvaser.com Cc: stable@vger.kernel.org Signed-off-by: Jimmy Assarsson Signed-off-by: Marc Kleine-Budde --- drivers/net/can/kvaser_pciefd.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/net/can/kvaser_pciefd.c b/drivers/net/can/kvaser_pciefd.c index 74d9899fc904..eb74cdf26b88 100644 --- a/drivers/net/can/kvaser_pciefd.c +++ b/drivers/net/can/kvaser_pciefd.c @@ -248,6 +248,9 @@ MODULE_DESCRIPTION("CAN driver for Kvaser CAN/PCIe devices"); #define KVASER_PCIEFD_SPACK_EWLR BIT(23) #define KVASER_PCIEFD_SPACK_EPLR BIT(24) +/* Kvaser KCAN_EPACK second word */ +#define KVASER_PCIEFD_EPACK_DIR_TX BIT(0) + struct kvaser_pciefd; struct kvaser_pciefd_can { @@ -1285,7 +1288,10 @@ static int kvaser_pciefd_rx_error_frame(struct kvaser_pciefd_can *can, can->err_rep_cnt++; can->can.can_stats.bus_error++; - stats->rx_errors++; + if (p->header[1] & KVASER_PCIEFD_EPACK_DIR_TX) + stats->tx_errors++; + else + stats->rx_errors++; can->bec.txerr = bec.txerr; can->bec.rxerr = bec.rxerr; -- cgit From fb12797ab1fef480ad8a32a30984844444eeb00d Mon Sep 17 00:00:00 2001 From: Jimmy Assarsson Date: Wed, 8 Dec 2021 16:21:22 +0100 Subject: can: kvaser_usb: get CAN clock frequency from device The CAN clock frequency is used when calculating the CAN bittiming parameters. When wrong clock frequency is used, the device may end up with wrong bittiming parameters, depending on user requested bittiming parameters. To avoid this, get the CAN clock frequency from the device. Various existing Kvaser Leaf products use different CAN clocks. Fixes: 080f40a6fa28 ("can: kvaser_usb: Add support for Kvaser CAN/USB devices") Link: https://lore.kernel.org/all/20211208152122.250852-2-extja@kvaser.com Cc: stable@vger.kernel.org Signed-off-by: Jimmy Assarsson Signed-off-by: Marc Kleine-Budde --- drivers/net/can/usb/kvaser_usb/kvaser_usb_leaf.c | 101 ++++++++++++++++------- 1 file changed, 73 insertions(+), 28 deletions(-) diff --git a/drivers/net/can/usb/kvaser_usb/kvaser_usb_leaf.c b/drivers/net/can/usb/kvaser_usb/kvaser_usb_leaf.c index 59ba7c7beec0..f7af1bf5ab46 100644 --- a/drivers/net/can/usb/kvaser_usb/kvaser_usb_leaf.c +++ b/drivers/net/can/usb/kvaser_usb/kvaser_usb_leaf.c @@ -28,10 +28,6 @@ #include "kvaser_usb.h" -/* Forward declaration */ -static const struct kvaser_usb_dev_cfg kvaser_usb_leaf_dev_cfg; - -#define CAN_USB_CLOCK 8000000 #define MAX_USBCAN_NET_DEVICES 2 /* Command header size */ @@ -80,6 +76,12 @@ static const struct kvaser_usb_dev_cfg kvaser_usb_leaf_dev_cfg; #define CMD_LEAF_LOG_MESSAGE 106 +/* Leaf frequency options */ +#define KVASER_USB_LEAF_SWOPTION_FREQ_MASK 0x60 +#define KVASER_USB_LEAF_SWOPTION_FREQ_16_MHZ_CLK 0 +#define KVASER_USB_LEAF_SWOPTION_FREQ_32_MHZ_CLK BIT(5) +#define KVASER_USB_LEAF_SWOPTION_FREQ_24_MHZ_CLK BIT(6) + /* error factors */ #define M16C_EF_ACKE BIT(0) #define M16C_EF_CRCE BIT(1) @@ -340,6 +342,50 @@ struct kvaser_usb_err_summary { }; }; +static const struct can_bittiming_const kvaser_usb_leaf_bittiming_const = { + .name = "kvaser_usb", + .tseg1_min = KVASER_USB_TSEG1_MIN, + .tseg1_max = KVASER_USB_TSEG1_MAX, + .tseg2_min = KVASER_USB_TSEG2_MIN, + .tseg2_max = KVASER_USB_TSEG2_MAX, + .sjw_max = KVASER_USB_SJW_MAX, + .brp_min = KVASER_USB_BRP_MIN, + .brp_max = KVASER_USB_BRP_MAX, + .brp_inc = KVASER_USB_BRP_INC, +}; + +static const struct kvaser_usb_dev_cfg kvaser_usb_leaf_dev_cfg_8mhz = { + .clock = { + .freq = 8000000, + }, + .timestamp_freq = 1, + .bittiming_const = &kvaser_usb_leaf_bittiming_const, +}; + +static const struct kvaser_usb_dev_cfg kvaser_usb_leaf_dev_cfg_16mhz = { + .clock = { + .freq = 16000000, + }, + .timestamp_freq = 1, + .bittiming_const = &kvaser_usb_leaf_bittiming_const, +}; + +static const struct kvaser_usb_dev_cfg kvaser_usb_leaf_dev_cfg_24mhz = { + .clock = { + .freq = 24000000, + }, + .timestamp_freq = 1, + .bittiming_const = &kvaser_usb_leaf_bittiming_const, +}; + +static const struct kvaser_usb_dev_cfg kvaser_usb_leaf_dev_cfg_32mhz = { + .clock = { + .freq = 32000000, + }, + .timestamp_freq = 1, + .bittiming_const = &kvaser_usb_leaf_bittiming_const, +}; + static void * kvaser_usb_leaf_frame_to_cmd(const struct kvaser_usb_net_priv *priv, const struct sk_buff *skb, int *frame_len, @@ -471,6 +517,27 @@ static int kvaser_usb_leaf_send_simple_cmd(const struct kvaser_usb *dev, return rc; } +static void kvaser_usb_leaf_get_software_info_leaf(struct kvaser_usb *dev, + const struct leaf_cmd_softinfo *softinfo) +{ + u32 sw_options = le32_to_cpu(softinfo->sw_options); + + dev->fw_version = le32_to_cpu(softinfo->fw_version); + dev->max_tx_urbs = le16_to_cpu(softinfo->max_outstanding_tx); + + switch (sw_options & KVASER_USB_LEAF_SWOPTION_FREQ_MASK) { + case KVASER_USB_LEAF_SWOPTION_FREQ_16_MHZ_CLK: + dev->cfg = &kvaser_usb_leaf_dev_cfg_16mhz; + break; + case KVASER_USB_LEAF_SWOPTION_FREQ_24_MHZ_CLK: + dev->cfg = &kvaser_usb_leaf_dev_cfg_24mhz; + break; + case KVASER_USB_LEAF_SWOPTION_FREQ_32_MHZ_CLK: + dev->cfg = &kvaser_usb_leaf_dev_cfg_32mhz; + break; + } +} + static int kvaser_usb_leaf_get_software_info_inner(struct kvaser_usb *dev) { struct kvaser_cmd cmd; @@ -486,14 +553,13 @@ static int kvaser_usb_leaf_get_software_info_inner(struct kvaser_usb *dev) switch (dev->card_data.leaf.family) { case KVASER_LEAF: - dev->fw_version = le32_to_cpu(cmd.u.leaf.softinfo.fw_version); - dev->max_tx_urbs = - le16_to_cpu(cmd.u.leaf.softinfo.max_outstanding_tx); + kvaser_usb_leaf_get_software_info_leaf(dev, &cmd.u.leaf.softinfo); break; case KVASER_USBCAN: dev->fw_version = le32_to_cpu(cmd.u.usbcan.softinfo.fw_version); dev->max_tx_urbs = le16_to_cpu(cmd.u.usbcan.softinfo.max_outstanding_tx); + dev->cfg = &kvaser_usb_leaf_dev_cfg_8mhz; break; } @@ -1225,24 +1291,11 @@ static int kvaser_usb_leaf_init_card(struct kvaser_usb *dev) { struct kvaser_usb_dev_card_data *card_data = &dev->card_data; - dev->cfg = &kvaser_usb_leaf_dev_cfg; card_data->ctrlmode_supported |= CAN_CTRLMODE_3_SAMPLES; return 0; } -static const struct can_bittiming_const kvaser_usb_leaf_bittiming_const = { - .name = "kvaser_usb", - .tseg1_min = KVASER_USB_TSEG1_MIN, - .tseg1_max = KVASER_USB_TSEG1_MAX, - .tseg2_min = KVASER_USB_TSEG2_MIN, - .tseg2_max = KVASER_USB_TSEG2_MAX, - .sjw_max = KVASER_USB_SJW_MAX, - .brp_min = KVASER_USB_BRP_MIN, - .brp_max = KVASER_USB_BRP_MAX, - .brp_inc = KVASER_USB_BRP_INC, -}; - static int kvaser_usb_leaf_set_bittiming(struct net_device *netdev) { struct kvaser_usb_net_priv *priv = netdev_priv(netdev); @@ -1348,11 +1401,3 @@ const struct kvaser_usb_dev_ops kvaser_usb_leaf_dev_ops = { .dev_read_bulk_callback = kvaser_usb_leaf_read_bulk_callback, .dev_frame_to_cmd = kvaser_usb_leaf_frame_to_cmd, }; - -static const struct kvaser_usb_dev_cfg kvaser_usb_leaf_dev_cfg = { - .clock = { - .freq = CAN_USB_CLOCK, - }, - .timestamp_freq = 1, - .bittiming_const = &kvaser_usb_leaf_bittiming_const, -}; -- cgit From 0416e7af2369b0d12a28dea8d30b104df9a6953d Mon Sep 17 00:00:00 2001 From: Ameer Hamza Date: Thu, 9 Dec 2021 09:15:52 +0500 Subject: net: dsa: mv88e6xxx: error handling for serdes_power functions MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Added default case to handle undefined cmode scenario in mv88e6393x_serdes_power() and mv88e6393x_serdes_power() methods. Addresses-Coverity: 1494644 ("Uninitialized scalar variable") Fixes: 21635d9203e1c (net: dsa: mv88e6xxx: Fix application of erratum 4.8 for 88E6393X) Reviewed-by: Marek Behún Signed-off-by: Ameer Hamza Link: https://lore.kernel.org/r/20211209041552.9810-1-amhamza.mgc@gmail.com Signed-off-by: Jakub Kicinski --- drivers/net/dsa/mv88e6xxx/serdes.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/net/dsa/mv88e6xxx/serdes.c b/drivers/net/dsa/mv88e6xxx/serdes.c index 55273013bfb5..2b05ead515cd 100644 --- a/drivers/net/dsa/mv88e6xxx/serdes.c +++ b/drivers/net/dsa/mv88e6xxx/serdes.c @@ -830,7 +830,7 @@ int mv88e6390_serdes_power(struct mv88e6xxx_chip *chip, int port, int lane, bool up) { u8 cmode = chip->ports[port].cmode; - int err = 0; + int err; switch (cmode) { case MV88E6XXX_PORT_STS_CMODE_SGMII: @@ -842,6 +842,9 @@ int mv88e6390_serdes_power(struct mv88e6xxx_chip *chip, int port, int lane, case MV88E6XXX_PORT_STS_CMODE_RXAUI: err = mv88e6390_serdes_power_10g(chip, lane, up); break; + default: + err = -EINVAL; + break; } if (!err && up) @@ -1541,6 +1544,9 @@ int mv88e6393x_serdes_power(struct mv88e6xxx_chip *chip, int port, int lane, case MV88E6393X_PORT_STS_CMODE_10GBASER: err = mv88e6390_serdes_power_10g(chip, lane, on); break; + default: + err = -EINVAL; + break; } if (err) -- cgit From 158390e45612ef0fde160af0826f1740c36daf21 Mon Sep 17 00:00:00 2001 From: Jianguo Wu Date: Wed, 8 Dec 2021 18:03:33 +0800 Subject: udp: using datalen to cap max gso segments The max number of UDP gso segments is intended to cap to UDP_MAX_SEGMENTS, this is checked in udp_send_skb(): if (skb->len > cork->gso_size * UDP_MAX_SEGMENTS) { kfree_skb(skb); return -EINVAL; } skb->len contains network and transport header len here, we should use only data len instead. Fixes: bec1f6f69736 ("udp: generate gso with UDP_SEGMENT") Signed-off-by: Jianguo Wu Reviewed-by: Willem de Bruijn Link: https://lore.kernel.org/r/900742e5-81fb-30dc-6e0b-375c6cdd7982@163.com Signed-off-by: Jakub Kicinski --- net/ipv4/udp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index 8bcecdd6aeda..23b05e28490b 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -916,7 +916,7 @@ static int udp_send_skb(struct sk_buff *skb, struct flowi4 *fl4, kfree_skb(skb); return -EINVAL; } - if (skb->len > cork->gso_size * UDP_MAX_SEGMENTS) { + if (datalen > cork->gso_size * UDP_MAX_SEGMENTS) { kfree_skb(skb); return -EINVAL; } -- cgit From fd79a0cbf0b2e34bcc45b13acf962e2032a82203 Mon Sep 17 00:00:00 2001 From: Tadeusz Struk Date: Wed, 8 Dec 2021 10:27:42 -0800 Subject: nfc: fix segfault in nfc_genl_dump_devices_done When kmalloc in nfc_genl_dump_devices() fails then nfc_genl_dump_devices_done() segfaults as below KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f] CPU: 0 PID: 25 Comm: kworker/0:1 Not tainted 5.16.0-rc4-01180-g2a987e65025e-dirty #5 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-6.fc35 04/01/2014 Workqueue: events netlink_sock_destruct_work RIP: 0010:klist_iter_exit+0x26/0x80 Call Trace: class_dev_iter_exit+0x15/0x20 nfc_genl_dump_devices_done+0x3b/0x50 genl_lock_done+0x84/0xd0 netlink_sock_destruct+0x8f/0x270 __sk_destruct+0x64/0x3b0 sk_destruct+0xa8/0xd0 __sk_free+0x2e8/0x3d0 sk_free+0x51/0x90 netlink_sock_destruct_work+0x1c/0x20 process_one_work+0x411/0x710 worker_thread+0x6fd/0xa80 Link: https://syzkaller.appspot.com/bug?id=fc0fa5a53db9edd261d56e74325419faf18bd0df Reported-by: syzbot+f9f76f4a0766420b4a02@syzkaller.appspotmail.com Signed-off-by: Tadeusz Struk Reviewed-by: Krzysztof Kozlowski Link: https://lore.kernel.org/r/20211208182742.340542-1-tadeusz.struk@linaro.org Signed-off-by: Jakub Kicinski --- net/nfc/netlink.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/net/nfc/netlink.c b/net/nfc/netlink.c index 334f63c9529e..0b4fae183a4b 100644 --- a/net/nfc/netlink.c +++ b/net/nfc/netlink.c @@ -636,8 +636,10 @@ static int nfc_genl_dump_devices_done(struct netlink_callback *cb) { struct class_dev_iter *iter = (struct class_dev_iter *) cb->args[0]; - nfc_device_iter_exit(iter); - kfree(iter); + if (iter) { + nfc_device_iter_exit(iter); + kfree(iter); + } return 0; } -- cgit From 4cd8371a234d051f9c9557fcbb1f8c523b1c0d10 Mon Sep 17 00:00:00 2001 From: Krzysztof Kozlowski Date: Thu, 9 Dec 2021 09:13:07 +0100 Subject: nfc: fix potential NULL pointer deref in nfc_genl_dump_ses_done The done() netlink callback nfc_genl_dump_ses_done() should check if received argument is non-NULL, because its allocation could fail earlier in dumpit() (nfc_genl_dump_ses()). Fixes: ac22ac466a65 ("NFC: Add a GET_SE netlink API") Signed-off-by: Krzysztof Kozlowski Link: https://lore.kernel.org/r/20211209081307.57337-1-krzysztof.kozlowski@canonical.com Signed-off-by: Jakub Kicinski --- net/nfc/netlink.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/net/nfc/netlink.c b/net/nfc/netlink.c index 0b4fae183a4b..f184b0db79d4 100644 --- a/net/nfc/netlink.c +++ b/net/nfc/netlink.c @@ -1394,8 +1394,10 @@ static int nfc_genl_dump_ses_done(struct netlink_callback *cb) { struct class_dev_iter *iter = (struct class_dev_iter *) cb->args[0]; - nfc_device_iter_exit(iter); - kfree(iter); + if (iter) { + nfc_device_iter_exit(iter); + kfree(iter); + } return 0; } -- cgit From c56c96303e9289cc34716b1179597b6f470833de Mon Sep 17 00:00:00 2001 From: Jianglei Nie Date: Thu, 9 Dec 2021 14:15:11 +0800 Subject: nfp: Fix memory leak in nfp_cpp_area_cache_add() In line 800 (#1), nfp_cpp_area_alloc() allocates and initializes a CPP area structure. But in line 807 (#2), when the cache is allocated failed, this CPP area structure is not freed, which will result in memory leak. We can fix it by freeing the CPP area when the cache is allocated failed (#2). 792 int nfp_cpp_area_cache_add(struct nfp_cpp *cpp, size_t size) 793 { 794 struct nfp_cpp_area_cache *cache; 795 struct nfp_cpp_area *area; 800 area = nfp_cpp_area_alloc(cpp, NFP_CPP_ID(7, NFP_CPP_ACTION_RW, 0), 801 0, size); // #1: allocates and initializes 802 if (!area) 803 return -ENOMEM; 805 cache = kzalloc(sizeof(*cache), GFP_KERNEL); 806 if (!cache) 807 return -ENOMEM; // #2: missing free 817 return 0; 818 } Fixes: 4cb584e0ee7d ("nfp: add CPP access core") Signed-off-by: Jianglei Nie Acked-by: Simon Horman Link: https://lore.kernel.org/r/20211209061511.122535-1-niejianglei2021@163.com Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c index d7ac0307797f..34c0d2ddf9ef 100644 --- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c +++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c @@ -803,8 +803,10 @@ int nfp_cpp_area_cache_add(struct nfp_cpp *cpp, size_t size) return -ENOMEM; cache = kzalloc(sizeof(*cache), GFP_KERNEL); - if (!cache) + if (!cache) { + nfp_cpp_area_free(area); return -ENOMEM; + } cache->id = 0; cache->addr = 0; -- cgit From ae68d93354e5bf5191ee673982251864ea24dd5c Mon Sep 17 00:00:00 2001 From: Andrea Mayer Date: Wed, 8 Dec 2021 20:54:09 +0100 Subject: seg6: fix the iif in the IPv6 socket control block When an IPv4 packet is received, the ip_rcv_core(...) sets the receiving interface index into the IPv4 socket control block (v5.16-rc4, net/ipv4/ip_input.c line 510): IPCB(skb)->iif = skb->skb_iif; If that IPv4 packet is meant to be encapsulated in an outer IPv6+SRH header, the seg6_do_srh_encap(...) performs the required encapsulation. In this case, the seg6_do_srh_encap function clears the IPv6 socket control block (v5.16-rc4 net/ipv6/seg6_iptunnel.c line 163): memset(IP6CB(skb), 0, sizeof(*IP6CB(skb))); The memset(...) was introduced in commit ef489749aae5 ("ipv6: sr: clear IP6CB(skb) on SRH ip4ip6 encapsulation") a long time ago (2019-01-29). Since the IPv6 socket control block and the IPv4 socket control block share the same memory area (skb->cb), the receiving interface index info is lost (IP6CB(skb)->iif is set to zero). As a side effect, that condition triggers a NULL pointer dereference if commit 0857d6f8c759 ("ipv6: When forwarding count rx stats on the orig netdev") is applied. To fix that issue, we set the IP6CB(skb)->iif with the index of the receiving interface once again. Fixes: ef489749aae5 ("ipv6: sr: clear IP6CB(skb) on SRH ip4ip6 encapsulation") Signed-off-by: Andrea Mayer Reviewed-by: David Ahern Link: https://lore.kernel.org/r/20211208195409.12169-1-andrea.mayer@uniroma2.it Signed-off-by: Jakub Kicinski --- net/ipv6/seg6_iptunnel.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/net/ipv6/seg6_iptunnel.c b/net/ipv6/seg6_iptunnel.c index 3adc5d9211ad..d64855010948 100644 --- a/net/ipv6/seg6_iptunnel.c +++ b/net/ipv6/seg6_iptunnel.c @@ -161,6 +161,14 @@ int seg6_do_srh_encap(struct sk_buff *skb, struct ipv6_sr_hdr *osrh, int proto) hdr->hop_limit = ip6_dst_hoplimit(skb_dst(skb)); memset(IP6CB(skb), 0, sizeof(*IP6CB(skb))); + + /* the control block has been erased, so we have to set the + * iif once again. + * We read the receiving interface index directly from the + * skb->skb_iif as it is done in the IPv4 receiving path (i.e.: + * ip_rcv_core(...)). + */ + IP6CB(skb)->iif = skb->skb_iif; } hdr->nexthdr = NEXTHDR_ROUTING; -- cgit From 9acfc57fa2b8944ed079cedbf846823ea32b8a31 Mon Sep 17 00:00:00 2001 From: José Expósito Date: Wed, 8 Dec 2021 23:37:23 +0100 Subject: net: mana: Fix memory leak in mana_hwc_create_wq MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit If allocating the DMA buffer fails, mana_hwc_destroy_wq was called without previously storing the pointer to the queue. In order to avoid leaking the pointer to the queue, store it as soon as it is allocated. Addresses-Coverity-ID: 1484720 ("Resource leak") Signed-off-by: José Expósito Reviewed-by: Dexuan Cui Link: https://lore.kernel.org/r/20211208223723.18520-1-jose.exposito89@gmail.com Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/microsoft/mana/hw_channel.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/net/ethernet/microsoft/mana/hw_channel.c b/drivers/net/ethernet/microsoft/mana/hw_channel.c index 34b971ff8ef8..078d6a5a0768 100644 --- a/drivers/net/ethernet/microsoft/mana/hw_channel.c +++ b/drivers/net/ethernet/microsoft/mana/hw_channel.c @@ -480,16 +480,16 @@ static int mana_hwc_create_wq(struct hw_channel_context *hwc, if (err) goto out; - err = mana_hwc_alloc_dma_buf(hwc, q_depth, max_msg_size, - &hwc_wq->msg_buf); - if (err) - goto out; - hwc_wq->hwc = hwc; hwc_wq->gdma_wq = queue; hwc_wq->queue_depth = q_depth; hwc_wq->hwc_cq = hwc_cq; + err = mana_hwc_alloc_dma_buf(hwc, q_depth, max_msg_size, + &hwc_wq->msg_buf); + if (err) + goto out; + *hwc_wq_ptr = hwc_wq; return 0; out: -- cgit From 61c2402665f1e10c5742033fce18392e369931d7 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Thu, 9 Dec 2021 00:49:37 -0800 Subject: net/sched: fq_pie: prevent dismantle issue For some reason, fq_pie_destroy() did not copy working code from pie_destroy() and other qdiscs, thus causing elusive bug. Before calling del_timer_sync(&q->adapt_timer), we need to ensure timer will not rearm itself. rcu: INFO: rcu_preempt self-detected stall on CPU rcu: 0-....: (4416 ticks this GP) idle=60d/1/0x4000000000000000 softirq=10433/10434 fqs=2579 (t=10501 jiffies g=13085 q=3989) NMI backtrace for cpu 0 CPU: 0 PID: 13 Comm: ksoftirqd/0 Not tainted 5.16.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 nmi_cpu_backtrace.cold+0x47/0x144 lib/nmi_backtrace.c:111 nmi_trigger_cpumask_backtrace+0x1b3/0x230 lib/nmi_backtrace.c:62 trigger_single_cpu_backtrace include/linux/nmi.h:164 [inline] rcu_dump_cpu_stacks+0x25e/0x3f0 kernel/rcu/tree_stall.h:343 print_cpu_stall kernel/rcu/tree_stall.h:627 [inline] check_cpu_stall kernel/rcu/tree_stall.h:711 [inline] rcu_pending kernel/rcu/tree.c:3878 [inline] rcu_sched_clock_irq.cold+0x9d/0x746 kernel/rcu/tree.c:2597 update_process_times+0x16d/0x200 kernel/time/timer.c:1785 tick_sched_handle+0x9b/0x180 kernel/time/tick-sched.c:226 tick_sched_timer+0x1b0/0x2d0 kernel/time/tick-sched.c:1428 __run_hrtimer kernel/time/hrtimer.c:1685 [inline] __hrtimer_run_queues+0x1c0/0xe50 kernel/time/hrtimer.c:1749 hrtimer_interrupt+0x31c/0x790 kernel/time/hrtimer.c:1811 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1086 [inline] __sysvec_apic_timer_interrupt+0x146/0x530 arch/x86/kernel/apic/apic.c:1103 sysvec_apic_timer_interrupt+0x8e/0xc0 arch/x86/kernel/apic/apic.c:1097 asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:638 RIP: 0010:write_comp_data kernel/kcov.c:221 [inline] RIP: 0010:__sanitizer_cov_trace_const_cmp1+0x1d/0x80 kernel/kcov.c:273 Code: 54 c8 20 48 89 10 c3 66 0f 1f 44 00 00 53 41 89 fb 41 89 f1 bf 03 00 00 00 65 48 8b 0c 25 40 70 02 00 48 89 ce 4c 8b 54 24 08 4e f7 ff ff 84 c0 74 51 48 8b 81 88 15 00 00 44 8b 81 84 15 00 RSP: 0018:ffffc90000d27b28 EFLAGS: 00000246 RAX: 0000000000000000 RBX: ffff888064bf1bf0 RCX: ffff888011928000 RDX: ffff888011928000 RSI: ffff888011928000 RDI: 0000000000000003 RBP: ffff888064bf1c28 R08: 0000000000000000 R09: 0000000000000000 R10: ffffffff875d8295 R11: 0000000000000000 R12: 0000000000000000 R13: ffff8880783dd300 R14: 0000000000000000 R15: 0000000000000000 pie_calculate_probability+0x405/0x7c0 net/sched/sch_pie.c:418 fq_pie_timer+0x170/0x2a0 net/sched/sch_fq_pie.c:383 call_timer_fn+0x1a5/0x6b0 kernel/time/timer.c:1421 expire_timers kernel/time/timer.c:1466 [inline] __run_timers.part.0+0x675/0xa20 kernel/time/timer.c:1734 __run_timers kernel/time/timer.c:1715 [inline] run_timer_softirq+0xb3/0x1d0 kernel/time/timer.c:1747 __do_softirq+0x29b/0x9c2 kernel/softirq.c:558 run_ksoftirqd kernel/softirq.c:921 [inline] run_ksoftirqd+0x2d/0x60 kernel/softirq.c:913 smpboot_thread_fn+0x645/0x9c0 kernel/smpboot.c:164 kthread+0x405/0x4f0 kernel/kthread.c:327 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295 Fixes: ec97ecf1ebe4 ("net: sched: add Flow Queue PIE packet scheduler") Signed-off-by: Eric Dumazet Reported-by: syzbot Cc: Mohit P. Tahiliani Cc: Sachin D. Patil Cc: V. Saicharan Cc: Mohit Bhasi Cc: Leslie Monis Cc: Gautam Ramakrishnan Link: https://lore.kernel.org/r/20211209084937.3500020-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski --- net/sched/sch_fq_pie.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/sched/sch_fq_pie.c b/net/sched/sch_fq_pie.c index 830f3559f727..d6aba6edd16e 100644 --- a/net/sched/sch_fq_pie.c +++ b/net/sched/sch_fq_pie.c @@ -531,6 +531,7 @@ static void fq_pie_destroy(struct Qdisc *sch) struct fq_pie_sched_data *q = qdisc_priv(sch); tcf_block_put(q->block); + q->p_params.tupdate = 0; del_timer_sync(&q->adapt_timer); kvfree(q->flows); } -- cgit From 37ad4e2a77180841dffb1df64cdbd95541512d3d Mon Sep 17 00:00:00 2001 From: Julian Wiedmann Date: Thu, 9 Dec 2021 16:35:46 +0100 Subject: MAINTAINERS: s390/net: remove myself as maintainer I won't have access to the relevant HW and docs much longer. Signed-off-by: Julian Wiedmann Link: https://lore.kernel.org/r/20211209153546.1152921-1-jwi@linux.ibm.com Signed-off-by: Jakub Kicinski --- MAINTAINERS | 2 -- 1 file changed, 2 deletions(-) diff --git a/MAINTAINERS b/MAINTAINERS index 7e51081b6708..6dd20c31d875 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -16623,7 +16623,6 @@ W: http://www.ibm.com/developerworks/linux/linux390/ F: drivers/iommu/s390-iommu.c S390 IUCV NETWORK LAYER -M: Julian Wiedmann M: Alexandra Winter M: Wenjia Zhang L: linux-s390@vger.kernel.org @@ -16635,7 +16634,6 @@ F: include/net/iucv/ F: net/iucv/ S390 NETWORK DRIVERS -M: Julian Wiedmann M: Alexandra Winter M: Wenjia Zhang L: linux-s390@vger.kernel.org -- cgit From e8b1d7698038e76363859fb47ae0a262080646f5 Mon Sep 17 00:00:00 2001 From: José Expósito Date: Thu, 9 Dec 2021 12:05:40 +0100 Subject: net: dsa: felix: Fix memory leak in felix_setup_mmio_filtering MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Avoid a memory leak if there is not a CPU port defined. Fixes: 8d5f7954b7c8 ("net: dsa: felix: break at first CPU port during init and teardown") Addresses-Coverity-ID: 1492897 ("Resource leak") Addresses-Coverity-ID: 1492899 ("Resource leak") Signed-off-by: José Expósito Reviewed-by: Vladimir Oltean Link: https://lore.kernel.org/r/20211209110538.11585-1-jose.exposito89@gmail.com Signed-off-by: Jakub Kicinski --- drivers/net/dsa/ocelot/felix.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/net/dsa/ocelot/felix.c b/drivers/net/dsa/ocelot/felix.c index 327cc4654806..f1a05e7dc818 100644 --- a/drivers/net/dsa/ocelot/felix.c +++ b/drivers/net/dsa/ocelot/felix.c @@ -290,8 +290,11 @@ static int felix_setup_mmio_filtering(struct felix *felix) } } - if (cpu < 0) + if (cpu < 0) { + kfree(tagging_rule); + kfree(redirect_rule); return -EINVAL; + } tagging_rule->key_type = OCELOT_VCAP_KEY_ETYPE; *(__be16 *)tagging_rule->key.etype.etype.value = htons(ETH_P_1588); -- cgit From 373f121a3c3a741f90b2a81f120f37c539fa0c86 Mon Sep 17 00:00:00 2001 From: M Chetan Kumar Date: Thu, 9 Dec 2021 15:46:27 +0530 Subject: net: wwan: iosm: fixes unnecessary doorbell send In TX packet accumulation flow transport layer is giving a doorbell to device even though there is no pending control TX transfer that needs immediate attention. Introduced a new hpda_ctrl_pending variable to keep track of pending control TX transfer. If there is a pending control TX transfer which needs an immediate attention only then give a doorbell to device. Signed-off-by: M Chetan Kumar Reviewed-by: Sergey Ryazanov Signed-off-by: Jakub Kicinski --- drivers/net/wwan/iosm/iosm_ipc_imem.c | 18 +++++++++++------- 1 file changed, 11 insertions(+), 7 deletions(-) diff --git a/drivers/net/wwan/iosm/iosm_ipc_imem.c b/drivers/net/wwan/iosm/iosm_ipc_imem.c index cff3b43ca4d7..b4d47b31ba91 100644 --- a/drivers/net/wwan/iosm/iosm_ipc_imem.c +++ b/drivers/net/wwan/iosm/iosm_ipc_imem.c @@ -181,9 +181,9 @@ void ipc_imem_hrtimer_stop(struct hrtimer *hr_timer) bool ipc_imem_ul_write_td(struct iosm_imem *ipc_imem) { struct ipc_mem_channel *channel; + bool hpda_ctrl_pending = false; struct sk_buff_head *ul_list; bool hpda_pending = false; - bool forced_hpdu = false; struct ipc_pipe *pipe; int i; @@ -200,15 +200,19 @@ bool ipc_imem_ul_write_td(struct iosm_imem *ipc_imem) ul_list = &channel->ul_list; /* Fill the transfer descriptor with the uplink buffer info. */ - hpda_pending |= ipc_protocol_ul_td_send(ipc_imem->ipc_protocol, + if (!ipc_imem_check_wwan_ips(channel)) { + hpda_ctrl_pending |= + ipc_protocol_ul_td_send(ipc_imem->ipc_protocol, pipe, ul_list); - - /* forced HP update needed for non data channels */ - if (hpda_pending && !ipc_imem_check_wwan_ips(channel)) - forced_hpdu = true; + } else { + hpda_pending |= + ipc_protocol_ul_td_send(ipc_imem->ipc_protocol, + pipe, ul_list); + } } - if (forced_hpdu) { + /* forced HP update needed for non data channels */ + if (hpda_ctrl_pending) { hpda_pending = false; ipc_protocol_doorbell_trigger(ipc_imem->ipc_protocol, IPC_HP_UL_WRITE_TD); -- cgit From 07d3f2743decaeefcf076457719ae01c8b43b6d2 Mon Sep 17 00:00:00 2001 From: M Chetan Kumar Date: Thu, 9 Dec 2021 15:46:28 +0530 Subject: net: wwan: iosm: fixes net interface nonfunctional after fw flash Devlink initialization flow was overwriting the IP traffic channel configuration. This was causing wwan0 network interface to be unusable after fw flash. When device boots to fully functional mode restore the IP channel configuration. Signed-off-by: M Chetan Kumar Reviewed-by: Sergey Ryazanov Signed-off-by: Jakub Kicinski --- drivers/net/wwan/iosm/iosm_ipc_imem.c | 7 ++++++- drivers/net/wwan/iosm/iosm_ipc_imem.h | 1 + drivers/net/wwan/iosm/iosm_ipc_imem_ops.c | 1 + 3 files changed, 8 insertions(+), 1 deletion(-) diff --git a/drivers/net/wwan/iosm/iosm_ipc_imem.c b/drivers/net/wwan/iosm/iosm_ipc_imem.c index b4d47b31ba91..e2c096863488 100644 --- a/drivers/net/wwan/iosm/iosm_ipc_imem.c +++ b/drivers/net/wwan/iosm/iosm_ipc_imem.c @@ -531,6 +531,9 @@ static void ipc_imem_run_state_worker(struct work_struct *instance) return; } + if (test_and_clear_bit(IOSM_DEVLINK_INIT, &ipc_imem->flag)) + ipc_devlink_deinit(ipc_imem->ipc_devlink); + if (!ipc_imem_setup_cp_mux_cap_init(ipc_imem, &mux_cfg)) ipc_imem->mux = ipc_mux_init(&mux_cfg, ipc_imem); @@ -1171,7 +1174,7 @@ void ipc_imem_cleanup(struct iosm_imem *ipc_imem) ipc_port_deinit(ipc_imem->ipc_port); } - if (ipc_imem->ipc_devlink) + if (test_and_clear_bit(IOSM_DEVLINK_INIT, &ipc_imem->flag)) ipc_devlink_deinit(ipc_imem->ipc_devlink); ipc_imem_device_ipc_uninit(ipc_imem); @@ -1335,6 +1338,8 @@ struct iosm_imem *ipc_imem_init(struct iosm_pcie *pcie, unsigned int device_id, if (ipc_flash_link_establish(ipc_imem)) goto devlink_channel_fail; + + set_bit(IOSM_DEVLINK_INIT, &ipc_imem->flag); } return ipc_imem; devlink_channel_fail: diff --git a/drivers/net/wwan/iosm/iosm_ipc_imem.h b/drivers/net/wwan/iosm/iosm_ipc_imem.h index 6be6708b4eec..6b479fe23a42 100644 --- a/drivers/net/wwan/iosm/iosm_ipc_imem.h +++ b/drivers/net/wwan/iosm/iosm_ipc_imem.h @@ -101,6 +101,7 @@ struct ipc_chnl_cfg; #define IOSM_CHIP_INFO_SIZE_MAX 100 #define FULLY_FUNCTIONAL 0 +#define IOSM_DEVLINK_INIT 1 /* List of the supported UL/DL pipes. */ enum ipc_mem_pipes { diff --git a/drivers/net/wwan/iosm/iosm_ipc_imem_ops.c b/drivers/net/wwan/iosm/iosm_ipc_imem_ops.c index 825e8e5ffb2a..09261fbb79c1 100644 --- a/drivers/net/wwan/iosm/iosm_ipc_imem_ops.c +++ b/drivers/net/wwan/iosm/iosm_ipc_imem_ops.c @@ -450,6 +450,7 @@ void ipc_imem_sys_devlink_close(struct iosm_devlink *ipc_devlink) /* Release the pipe resources */ ipc_imem_pipe_cleanup(ipc_imem, &channel->ul_pipe); ipc_imem_pipe_cleanup(ipc_imem, &channel->dl_pipe); + ipc_imem->nr_of_channels--; } void ipc_imem_sys_devlink_notify_rx(struct iosm_devlink *ipc_devlink, -- cgit From 383451ceb07831d37dafdf011c09366d1c034df5 Mon Sep 17 00:00:00 2001 From: M Chetan Kumar Date: Thu, 9 Dec 2021 15:46:29 +0530 Subject: net: wwan: iosm: fixes unable to send AT command during mbim tx ev_cdev_write_pending flag is preventing a TX message post for AT port while MBIM transfer is ongoing. Removed the unnecessary check around control port TX transfer. Signed-off-by: M Chetan Kumar Reviewed-by: Sergey Ryazanov Signed-off-by: Jakub Kicinski --- drivers/net/wwan/iosm/iosm_ipc_imem.c | 1 - drivers/net/wwan/iosm/iosm_ipc_imem.h | 3 --- drivers/net/wwan/iosm/iosm_ipc_imem_ops.c | 6 ------ 3 files changed, 10 deletions(-) diff --git a/drivers/net/wwan/iosm/iosm_ipc_imem.c b/drivers/net/wwan/iosm/iosm_ipc_imem.c index e2c096863488..12c03dacb5dd 100644 --- a/drivers/net/wwan/iosm/iosm_ipc_imem.c +++ b/drivers/net/wwan/iosm/iosm_ipc_imem.c @@ -1270,7 +1270,6 @@ struct iosm_imem *ipc_imem_init(struct iosm_pcie *pcie, unsigned int device_id, ipc_imem->pci_device_id = device_id; - ipc_imem->ev_cdev_write_pending = false; ipc_imem->cp_version = 0; ipc_imem->device_sleep = IPC_HOST_SLEEP_ENTER_SLEEP; diff --git a/drivers/net/wwan/iosm/iosm_ipc_imem.h b/drivers/net/wwan/iosm/iosm_ipc_imem.h index 6b479fe23a42..6b8a837faef2 100644 --- a/drivers/net/wwan/iosm/iosm_ipc_imem.h +++ b/drivers/net/wwan/iosm/iosm_ipc_imem.h @@ -336,8 +336,6 @@ enum ipc_phase { * process the irq actions. * @flag: Flag to monitor the state of driver * @td_update_timer_suspended: if true then td update timer suspend - * @ev_cdev_write_pending: 0 means inform the IPC tasklet to pass - * the accumulated uplink buffers to CP. * @ev_mux_net_transmit_pending:0 means inform the IPC tasklet to pass * @reset_det_n: Reset detect flag * @pcie_wake_n: Pcie wake flag @@ -375,7 +373,6 @@ struct iosm_imem { u8 ev_irq_pending[IPC_IRQ_VECTORS]; unsigned long flag; u8 td_update_timer_suspended:1, - ev_cdev_write_pending:1, ev_mux_net_transmit_pending:1, reset_det_n:1, pcie_wake_n:1; diff --git a/drivers/net/wwan/iosm/iosm_ipc_imem_ops.c b/drivers/net/wwan/iosm/iosm_ipc_imem_ops.c index 09261fbb79c1..831cdae28e8a 100644 --- a/drivers/net/wwan/iosm/iosm_ipc_imem_ops.c +++ b/drivers/net/wwan/iosm/iosm_ipc_imem_ops.c @@ -41,7 +41,6 @@ void ipc_imem_sys_wwan_close(struct iosm_imem *ipc_imem, int if_id, static int ipc_imem_tq_cdev_write(struct iosm_imem *ipc_imem, int arg, void *msg, size_t size) { - ipc_imem->ev_cdev_write_pending = false; ipc_imem_ul_send(ipc_imem); return 0; @@ -50,11 +49,6 @@ static int ipc_imem_tq_cdev_write(struct iosm_imem *ipc_imem, int arg, /* Through tasklet to do sio write. */ static int ipc_imem_call_cdev_write(struct iosm_imem *ipc_imem) { - if (ipc_imem->ev_cdev_write_pending) - return -1; - - ipc_imem->ev_cdev_write_pending = true; - return ipc_task_queue_send_task(ipc_imem, ipc_imem_tq_cdev_write, 0, NULL, 0, false); } -- cgit From 04ec4e6250e5f58b525b08f3dca45c7d7427620e Mon Sep 17 00:00:00 2001 From: "Russell King (Oracle)" Date: Thu, 9 Dec 2021 09:26:47 +0000 Subject: net: dsa: mv88e6xxx: allow use of PHYs on CPU and DSA ports Martyn Welch reports that his CPU port is unable to link where it has been necessary to use one of the switch ports with an internal PHY for the CPU port. The reason behind this is the port control register is left forcing the link down, preventing traffic flow. This occurs because during initialisation, phylink expects the link to be down, and DSA forces the link down by synthesising a call to the DSA drivers phylink_mac_link_down() method, but we don't touch the forced-link state when we later reconfigure the port. Resolve this by also unforcing the link state when we are operating in PHY mode and the PPU is set to poll the PHY to retrieve link status information. Reported-by: Martyn Welch Tested-by: Martyn Welch Fixes: 3be98b2d5fbc ("net: dsa: Down cpu/dsa ports phylink will control") Cc: # 5.7: 2b29cb9e3f7f: net: dsa: mv88e6xxx: fix "don't use PHY_DETECT on internal PHY's" Signed-off-by: Russell King (Oracle) Link: https://lore.kernel.org/r/E1mvFhP-00F8Zb-Ul@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski --- drivers/net/dsa/mv88e6xxx/chip.c | 64 +++++++++++++++++++++------------------- 1 file changed, 34 insertions(+), 30 deletions(-) diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c index 9f675464efc3..14f87f6ac479 100644 --- a/drivers/net/dsa/mv88e6xxx/chip.c +++ b/drivers/net/dsa/mv88e6xxx/chip.c @@ -698,44 +698,48 @@ static void mv88e6xxx_mac_config(struct dsa_switch *ds, int port, { struct mv88e6xxx_chip *chip = ds->priv; struct mv88e6xxx_port *p; - int err; + int err = 0; p = &chip->ports[port]; - /* FIXME: is this the correct test? If we're in fixed mode on an - * internal port, why should we process this any different from - * PHY mode? On the other hand, the port may be automedia between - * an internal PHY and the serdes... - */ - if ((mode == MLO_AN_PHY) && mv88e6xxx_phy_is_internal(ds, port)) - return; - mv88e6xxx_reg_lock(chip); - /* In inband mode, the link may come up at any time while the link - * is not forced down. Force the link down while we reconfigure the - * interface mode. - */ - if (mode == MLO_AN_INBAND && p->interface != state->interface && - chip->info->ops->port_set_link) - chip->info->ops->port_set_link(chip, port, LINK_FORCED_DOWN); - - err = mv88e6xxx_port_config_interface(chip, port, state->interface); - if (err && err != -EOPNOTSUPP) - goto err_unlock; - err = mv88e6xxx_serdes_pcs_config(chip, port, mode, state->interface, - state->advertising); - /* FIXME: we should restart negotiation if something changed - which - * is something we get if we convert to using phylinks PCS operations. - */ - if (err > 0) - err = 0; + if (mode != MLO_AN_PHY || !mv88e6xxx_phy_is_internal(ds, port)) { + /* In inband mode, the link may come up at any time while the + * link is not forced down. Force the link down while we + * reconfigure the interface mode. + */ + if (mode == MLO_AN_INBAND && + p->interface != state->interface && + chip->info->ops->port_set_link) + chip->info->ops->port_set_link(chip, port, + LINK_FORCED_DOWN); + + err = mv88e6xxx_port_config_interface(chip, port, + state->interface); + if (err && err != -EOPNOTSUPP) + goto err_unlock; + + err = mv88e6xxx_serdes_pcs_config(chip, port, mode, + state->interface, + state->advertising); + /* FIXME: we should restart negotiation if something changed - + * which is something we get if we convert to using phylinks + * PCS operations. + */ + if (err > 0) + err = 0; + } /* Undo the forced down state above after completing configuration - * irrespective of its state on entry, which allows the link to come up. + * irrespective of its state on entry, which allows the link to come + * up in the in-band case where there is no separate SERDES. Also + * ensure that the link can come up if the PPU is in use and we are + * in PHY mode (we treat the PPU as an effective in-band mechanism.) */ - if (mode == MLO_AN_INBAND && p->interface != state->interface && - chip->info->ops->port_set_link) + if (chip->info->ops->port_set_link && + ((mode == MLO_AN_INBAND && p->interface != state->interface) || + (mode == MLO_AN_PHY && mv88e6xxx_port_ppu_updates(chip, port)))) chip->info->ops->port_set_link(chip, port, LINK_UNFORCED); p->interface = state->interface; -- cgit