summaryrefslogtreecommitdiff
path: root/net/sched/act_api.c
AgeCommit message (Collapse)Author
2022-02-23net: sched: avoid newline at end of message in NL_SET_ERR_MSG_MODWan Jiabing
Fix following coccicheck warning: ./net/sched/act_api.c:277:7-49: WARNING avoid newline at end of message in NL_SET_ERR_MSG_MOD Signed-off-by: Wan Jiabing <wanjiabing@vivo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-16net: sched: limit TC_ACT_REPEAT loopsEric Dumazet
We have been living dangerously, at the mercy of malicious users, abusing TC_ACT_REPEAT, as shown by this syzpot report [1]. Add an arbitrary limit (32) to the number of times an action can return TC_ACT_REPEAT. v2: switch the limit to 32 instead of 10. Use net_warn_ratelimited() instead of pr_err_once(). [1] (C repro available on demand) rcu: INFO: rcu_preempt self-detected stall on CPU rcu: 1-...!: (10500 ticks this GP) idle=021/1/0x4000000000000000 softirq=5592/5592 fqs=0 (t=10502 jiffies g=5305 q=190) rcu: rcu_preempt kthread timer wakeup didn't happen for 10502 jiffies! g5305 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 rcu: Possible timer handling issue on cpu=0 timer-softirq=3527 rcu: rcu_preempt kthread starved for 10505 jiffies! g5305 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0 rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior. rcu: RCU grace-period kthread stack dump: task:rcu_preempt state:I stack:29344 pid: 14 ppid: 2 flags:0x00004000 Call Trace: <TASK> context_switch kernel/sched/core.c:4986 [inline] __schedule+0xab2/0x4db0 kernel/sched/core.c:6295 schedule+0xd2/0x260 kernel/sched/core.c:6368 schedule_timeout+0x14a/0x2a0 kernel/time/timer.c:1881 rcu_gp_fqs_loop+0x186/0x810 kernel/rcu/tree.c:1963 rcu_gp_kthread+0x1de/0x320 kernel/rcu/tree.c:2136 kthread+0x2e9/0x3a0 kernel/kthread.c:377 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295 </TASK> rcu: Stack dump where RCU GP kthread last ran: Sending NMI from CPU 1 to CPUs 0: NMI backtrace for cpu 0 CPU: 0 PID: 3646 Comm: syz-executor358 Not tainted 5.17.0-rc3-syzkaller-00149-gbf8e59fd315f #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:rep_nop arch/x86/include/asm/vdso/processor.h:13 [inline] RIP: 0010:cpu_relax arch/x86/include/asm/vdso/processor.h:18 [inline] RIP: 0010:pv_wait_head_or_lock kernel/locking/qspinlock_paravirt.h:437 [inline] RIP: 0010:__pv_queued_spin_lock_slowpath+0x3b8/0xb40 kernel/locking/qspinlock.c:508 Code: 48 89 eb c6 45 01 01 41 bc 00 80 00 00 48 c1 e9 03 83 e3 07 41 be 01 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8d 2c 01 eb 0c <f3> 90 41 83 ec 01 0f 84 72 04 00 00 41 0f b6 45 00 38 d8 7f 08 84 RSP: 0018:ffffc9000283f1b0 EFLAGS: 00000206 RAX: 0000000000000003 RBX: 0000000000000000 RCX: 1ffff1100fc0071e RDX: 0000000000000001 RSI: 0000000000000201 RDI: 0000000000000000 RBP: ffff88807e0038f0 R08: 0000000000000001 R09: ffffffff8ffbf9ff R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000004c1e R13: ffffed100fc0071e R14: 0000000000000001 R15: ffff8880b9c3aa80 FS: 00005555562bf300(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ffdbfef12b8 CR3: 00000000723c2000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> pv_queued_spin_lock_slowpath arch/x86/include/asm/paravirt.h:591 [inline] queued_spin_lock_slowpath arch/x86/include/asm/qspinlock.h:51 [inline] queued_spin_lock include/asm-generic/qspinlock.h:85 [inline] do_raw_spin_lock+0x200/0x2b0 kernel/locking/spinlock_debug.c:115 spin_lock_bh include/linux/spinlock.h:354 [inline] sch_tree_lock include/net/sch_generic.h:610 [inline] sch_tree_lock include/net/sch_generic.h:605 [inline] prio_tune+0x3b9/0xb50 net/sched/sch_prio.c:211 prio_init+0x5c/0x80 net/sched/sch_prio.c:244 qdisc_create.constprop.0+0x44a/0x10f0 net/sched/sch_api.c:1253 tc_modify_qdisc+0x4c5/0x1980 net/sched/sch_api.c:1660 rtnetlink_rcv_msg+0x413/0xb80 net/core/rtnetlink.c:5594 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2494 netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline] netlink_unicast+0x539/0x7e0 net/netlink/af_netlink.c:1343 netlink_sendmsg+0x904/0xe00 net/netlink/af_netlink.c:1919 sock_sendmsg_nosec net/socket.c:705 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:725 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2413 ___sys_sendmsg+0xf3/0x170 net/socket.c:2467 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2496 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f7ee98aae99 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 41 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffdbfef12d8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007ffdbfef1300 RCX: 00007f7ee98aae99 RDX: 0000000000000000 RSI: 0000000020000000 RDI: 0000000000000003 RBP: 0000000000000000 R08: 000000000000000d R09: 000000000000000d R10: 000000000000000d R11: 0000000000000246 R12: 00007ffdbfef12f0 R13: 00000000000f4240 R14: 000000000004ca47 R15: 00007ffdbfef12e4 </TASK> INFO: NMI handler (nmi_cpu_backtrace_handler) took too long to run: 2.293 msecs NMI backtrace for cpu 1 CPU: 1 PID: 3260 Comm: kworker/1:3 Not tainted 5.17.0-rc3-syzkaller-00149-gbf8e59fd315f #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: mld mld_ifc_work Call Trace: <IRQ> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 nmi_cpu_backtrace.cold+0x47/0x144 lib/nmi_backtrace.c:111 nmi_trigger_cpumask_backtrace+0x1b3/0x230 lib/nmi_backtrace.c:62 trigger_single_cpu_backtrace include/linux/nmi.h:164 [inline] rcu_dump_cpu_stacks+0x25e/0x3f0 kernel/rcu/tree_stall.h:343 print_cpu_stall kernel/rcu/tree_stall.h:604 [inline] check_cpu_stall kernel/rcu/tree_stall.h:688 [inline] rcu_pending kernel/rcu/tree.c:3919 [inline] rcu_sched_clock_irq.cold+0x5c/0x759 kernel/rcu/tree.c:2617 update_process_times+0x16d/0x200 kernel/time/timer.c:1785 tick_sched_handle+0x9b/0x180 kernel/time/tick-sched.c:226 tick_sched_timer+0x1b0/0x2d0 kernel/time/tick-sched.c:1428 __run_hrtimer kernel/time/hrtimer.c:1685 [inline] __hrtimer_run_queues+0x1c0/0xe50 kernel/time/hrtimer.c:1749 hrtimer_interrupt+0x31c/0x790 kernel/time/hrtimer.c:1811 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1086 [inline] __sysvec_apic_timer_interrupt+0x146/0x530 arch/x86/kernel/apic/apic.c:1103 sysvec_apic_timer_interrupt+0x8e/0xc0 arch/x86/kernel/apic/apic.c:1097 </IRQ> <TASK> asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:638 RIP: 0010:__sanitizer_cov_trace_const_cmp4+0xc/0x70 kernel/kcov.c:286 Code: 00 00 00 48 89 7c 30 e8 48 89 4c 30 f0 4c 89 54 d8 20 48 89 10 5b c3 0f 1f 80 00 00 00 00 41 89 f8 bf 03 00 00 00 4c 8b 14 24 <89> f1 65 48 8b 34 25 00 70 02 00 e8 14 f9 ff ff 84 c0 74 4b 48 8b RSP: 0018:ffffc90002c5eea8 EFLAGS: 00000246 RAX: 0000000000000007 RBX: ffff88801c625800 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003 RBP: ffff8880137d3100 R08: 0000000000000000 R09: 0000000000000000 R10: ffffffff874fcd88 R11: 0000000000000000 R12: ffff88801d692dc0 R13: ffff8880137d3104 R14: 0000000000000000 R15: ffff88801d692de8 tcf_police_act+0x358/0x11d0 net/sched/act_police.c:256 tcf_action_exec net/sched/act_api.c:1049 [inline] tcf_action_exec+0x1a6/0x530 net/sched/act_api.c:1026 tcf_exts_exec include/net/pkt_cls.h:326 [inline] route4_classify+0xef0/0x1400 net/sched/cls_route.c:179 __tcf_classify net/sched/cls_api.c:1549 [inline] tcf_classify+0x3e8/0x9d0 net/sched/cls_api.c:1615 prio_classify net/sched/sch_prio.c:42 [inline] prio_enqueue+0x3a7/0x790 net/sched/sch_prio.c:75 dev_qdisc_enqueue+0x40/0x300 net/core/dev.c:3668 __dev_xmit_skb net/core/dev.c:3756 [inline] __dev_queue_xmit+0x1f61/0x3660 net/core/dev.c:4081 neigh_hh_output include/net/neighbour.h:533 [inline] neigh_output include/net/neighbour.h:547 [inline] ip_finish_output2+0x14dc/0x2170 net/ipv4/ip_output.c:228 __ip_finish_output net/ipv4/ip_output.c:306 [inline] __ip_finish_output+0x396/0x650 net/ipv4/ip_output.c:288 ip_finish_output+0x32/0x200 net/ipv4/ip_output.c:316 NF_HOOK_COND include/linux/netfilter.h:296 [inline] ip_output+0x196/0x310 net/ipv4/ip_output.c:430 dst_output include/net/dst.h:451 [inline] ip_local_out+0xaf/0x1a0 net/ipv4/ip_output.c:126 iptunnel_xmit+0x628/0xa50 net/ipv4/ip_tunnel_core.c:82 geneve_xmit_skb drivers/net/geneve.c:966 [inline] geneve_xmit+0x10c8/0x3530 drivers/net/geneve.c:1077 __netdev_start_xmit include/linux/netdevice.h:4683 [inline] netdev_start_xmit include/linux/netdevice.h:4697 [inline] xmit_one net/core/dev.c:3473 [inline] dev_hard_start_xmit+0x1eb/0x920 net/core/dev.c:3489 __dev_queue_xmit+0x2985/0x3660 net/core/dev.c:4116 neigh_hh_output include/net/neighbour.h:533 [inline] neigh_output include/net/neighbour.h:547 [inline] ip6_finish_output2+0xf7a/0x14f0 net/ipv6/ip6_output.c:126 __ip6_finish_output net/ipv6/ip6_output.c:191 [inline] __ip6_finish_output+0x61e/0xe90 net/ipv6/ip6_output.c:170 ip6_finish_output+0x32/0x200 net/ipv6/ip6_output.c:201 NF_HOOK_COND include/linux/netfilter.h:296 [inline] ip6_output+0x1e4/0x530 net/ipv6/ip6_output.c:224 dst_output include/net/dst.h:451 [inline] NF_HOOK include/linux/netfilter.h:307 [inline] NF_HOOK include/linux/netfilter.h:301 [inline] mld_sendpack+0x9a3/0xe40 net/ipv6/mcast.c:1826 mld_send_cr net/ipv6/mcast.c:2127 [inline] mld_ifc_work+0x71c/0xdc0 net/ipv6/mcast.c:2659 process_one_work+0x9ac/0x1650 kernel/workqueue.c:2307 worker_thread+0x657/0x1110 kernel/workqueue.c:2454 kthread+0x2e9/0x3a0 kernel/kthread.c:377 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295 </TASK> ---------------- Code disassembly (best guess): 0: 48 89 eb mov %rbp,%rbx 3: c6 45 01 01 movb $0x1,0x1(%rbp) 7: 41 bc 00 80 00 00 mov $0x8000,%r12d d: 48 c1 e9 03 shr $0x3,%rcx 11: 83 e3 07 and $0x7,%ebx 14: 41 be 01 00 00 00 mov $0x1,%r14d 1a: 48 b8 00 00 00 00 00 movabs $0xdffffc0000000000,%rax 21: fc ff df 24: 4c 8d 2c 01 lea (%rcx,%rax,1),%r13 28: eb 0c jmp 0x36 * 2a: f3 90 pause <-- trapping instruction 2c: 41 83 ec 01 sub $0x1,%r12d 30: 0f 84 72 04 00 00 je 0x4a8 36: 41 0f b6 45 00 movzbl 0x0(%r13),%eax 3b: 38 d8 cmp %bl,%al 3d: 7f 08 jg 0x47 3f: 84 .byte 0x84 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jiri Pirko <jiri@resnulli.us> Reported-by: syzbot <syzkaller@googlegroups.com> Link: https://lore.kernel.org/r/20220215235305.3272331-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-23flow_offload: fix suspicious RCU usage when offloading tc actionBaowen Zheng
Fix suspicious rcu_dereference_protected() usage when offloading tc action. We should hold tcfa_lock to offload tc action in action initiation. Without these changes, the following warning will be observed: WARNING: suspicious RCU usage 5.16.0-rc5-net-next-01504-g7d1f236dcffa-dirty #50 Tainted: G I ----------------------------- include/net/tc_act/tc_tunnel_key.h:33 suspicious rcu_dereference_protected() usage! 1 lock held by tc/12108: CPU: 4 PID: 12108 Comm: tc Tainted: G Hardware name: Dell Inc. PowerEdge R740/07WCGN, BIOS 1.6.11 11/20/2018 Call Trace: <TASK> dump_stack_lvl+0x49/0x5e dump_stack+0x10/0x12 lockdep_rcu_suspicious+0xed/0xf8 tcf_tunnel_key_offload_act_setup+0x1de/0x300 [act_tunnel_key] tcf_action_offload_add_ex+0xc0/0x1f0 tcf_action_init+0x26a/0x2f0 tcf_action_add+0xa9/0x1f0 tc_ctl_action+0xfb/0x170 rtnetlink_rcv_msg+0x169/0x510 ? sched_clock+0x9/0x10 ? rtnl_newlink+0x70/0x70 netlink_rcv_skb+0x55/0x100 rtnetlink_rcv+0x15/0x20 netlink_unicast+0x1a8/0x270 netlink_sendmsg+0x245/0x490 sock_sendmsg+0x65/0x70 ____sys_sendmsg+0x219/0x260 ? __import_iovec+0x2c/0x150 ___sys_sendmsg+0xb7/0x100 ? __lock_acquire+0x3d5/0x1f40 ? __this_cpu_preempt_check+0x13/0x20 ? lock_is_held_type+0xe4/0x140 ? sched_clock+0x9/0x10 ? ktime_get_coarse_real_ts64+0xbe/0xd0 ? __this_cpu_preempt_check+0x13/0x20 ? lockdep_hardirqs_on+0x7e/0x100 ? ktime_get_coarse_real_ts64+0xbe/0xd0 ? trace_hardirqs_on+0x2a/0xf0 __sys_sendmsg+0x5a/0xa0 ? syscall_trace_enter.constprop.0+0x1dd/0x220 __x64_sys_sendmsg+0x1f/0x30 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f4db7bb7a60 Fixes: 8cbfe939abe9 ("flow_offload: allow user to offload tc action to net device") Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-19flow_offload: validate flags of filter and actionsBaowen Zheng
Add process to validate flags of filter and actions when adding a tc filter. We need to prevent adding filter with flags conflicts with its actions. Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-19flow_offload: add reoffload process to update hw_countBaowen Zheng
Add reoffload process to update hw_count when driver is inserted or removed. We will delete the action if it is with skip_sw flag and not offloaded to any hardware in reoffload process. When reoffloading actions, we still offload the actions that are added independent of filters. Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-19net: sched: save full flags for tc actionBaowen Zheng
Save full action flags and return user flags when return flags to user space. Save full action flags to distinguish if the action is created independent from classifier. We made this change mainly for further patch to reoffload tc actions. Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-19flow_offload: add process to update action stats from hardwareBaowen Zheng
When collecting stats for actions update them using both hardware and software counters. Stats update process should not run in context of preempt_disable. Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-19flow_offload: add skip_hw and skip_sw to control if offload the actionBaowen Zheng
We add skip_hw and skip_sw for user to control if offload the action to hardware. We also add in_hw_count for user to indicate if the action is offloaded to any hardware. Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-19flow_offload: allow user to offload tc action to net deviceBaowen Zheng
Use flow_indr_dev_register/flow_indr_dev_setup_offload to offload tc action. We need to call tc_cleanup_flow_action to clean up tc action entry since in tc_setup_action, some actions may hold dev refcnt, especially the mirror action. Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-18net: sched: Remove Qdisc::running sequence counterAhmed S. Darwish
The Qdisc::running sequence counter has two uses: 1. Reliably reading qdisc's tc statistics while the qdisc is running (a seqcount read/retry loop at gnet_stats_add_basic()). 2. As a flag, indicating whether the qdisc in question is running (without any retry loops). For the first usage, the Qdisc::running sequence counter write section, qdisc_run_begin() => qdisc_run_end(), covers a much wider area than what is actually needed: the raw qdisc's bstats update. A u64_stats sync point was thus introduced (in previous commits) inside the bstats structure itself. A local u64_stats write section is then started and stopped for the bstats updates. Use that u64_stats sync point mechanism for the bstats read/retry loop at gnet_stats_add_basic(). For the second qdisc->running usage, a __QDISC_STATE_RUNNING bit flag, accessed with atomic bitops, is sufficient. Using a bit flag instead of a sequence counter at qdisc_run_begin/end() and qdisc_is_running() leads to the SMP barriers implicitly added through raw_read_seqcount() and write_seqcount_begin/end() getting removed. All call sites have been surveyed though, and no required ordering was identified. Now that the qdisc->running sequence counter is no longer used, remove it. Note, using u64_stats implies no sequence counter protection for 64-bit architectures. This can lead to the qdisc tc statistics "packets" vs. "bytes" values getting out of sync on rare occasions. The individual values will still be valid. Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-18net: sched: Merge Qdisc::bstats and Qdisc::cpu_bstats data typesAhmed S. Darwish
The only factor differentiating per-CPU bstats data type (struct gnet_stats_basic_cpu) from the packed non-per-CPU one (struct gnet_stats_basic_packed) was a u64_stats sync point inside the former. The two data types are now equivalent: earlier commits added a u64_stats sync point to the latter. Combine both data types into "struct gnet_stats_basic_sync". This eliminates redundancy and simplifies the bstats read/write APIs. Use u64_stats_t for bstats "packets" and "bytes" data types. On 64-bit architectures, u64_stats sync points do not use sequence counter protection. Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-18net: sched: Protect Qdisc::bstats with u64_statsAhmed S. Darwish
The not-per-CPU variant of qdisc tc (traffic control) statistics, Qdisc::gnet_stats_basic_packed bstats, is protected with Qdisc::running sequence counter. This sequence counter is used for reliably protecting bstats reads from parallel writes. Meanwhile, the seqcount's write section covers a much wider area than bstats update: qdisc_run_begin() => qdisc_run_end(). That read/write section asymmetry can lead to needless retries of the read section. To prepare for removing the Qdisc::running sequence counter altogether, introduce a u64_stats sync point inside bstats instead. Modify _bstats_update() to start/end the bstats u64_stats write section. For bisectability, and finer commits granularity, the bstats read section is still protected with a Qdisc::running read/retry loop and qdisc_run_begin/end() still starts/ends that seqcount write section. Once all call sites are modified to use _bstats_update(), the Qdisc::running seqcount will be removed and bstats read/retry loop will be modified to utilize the internal u64_stats sync point. Note, using u64_stats implies no sequence counter protection for 64-bit architectures. This can lead to the statistics "packets" vs. "bytes" values getting out of sync on rare occasions. The individual values will still be valid. [bigeasy: Minor commit message edits, init all gnet_stats_basic_packed.] Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-02net_sched: refactor TC action init APICong Wang
TC action ->init() API has 10 parameters, it becomes harder to read. Some of them are just boolean and can be replaced by flags. Similarly for the internal API tcf_action_init() and tcf_exts_validate(). This patch converts them to flags and fold them into the upper 16 bits of "flags", whose lower 16 bits are still reserved for user-space. More specifically, the following kernel flags are introduced: TCA_ACT_FLAGS_POLICE replace 'name' in a few contexts, to distinguish whether it is compatible with policer. TCA_ACT_FLAGS_BIND replaces 'bind', to indicate whether this action is bound to a filter. TCA_ACT_FLAGS_REPLACE replaces 'ovr' in most contexts, means we are replacing an existing action. TCA_ACT_FLAGS_NO_RTNL replaces 'rtnl_held' but has the opposite meaning, because we still hold RTNL in most cases. The only user-space flag TCA_ACT_FLAGS_NO_PERCPU_STATS is untouched and still stored as before. I have tested this patch with tdc and I do not see any failure related to this patch. Tested-by: Vlad Buslov <vladbu@nvidia.com> Acked-by: Jamal Hadi Salim<jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-16net/sched: Remove unnecessary if statementYajun Deng
It has been deal with the 'if (err' statement in rtnetlink_send() and rtnl_unicast(). so remove unnecessary if statement. v2: use the raw name rtnetlink_send(). Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-17net: sched: fix error return code in tcf_del_walker()Yang Yingliang
When nla_put_u32() fails, 'ret' could be 0, it should return error code in tcf_del_walker(). Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-08net: sched: fix err handler in tcf_action_init()Vlad Buslov
With recent changes that separated action module load from action initialization tcf_action_init() function error handling code was modified to manually release the loaded modules if loading/initialization of any further action in same batch failed. For the case when all modules successfully loaded and some of the actions were initialized before one of them failed in init handler. In this case for all previous actions the module will be released twice by the error handler: First time by the loop that manually calls module_put() for all ops, and second time by the action destroy code that puts the module after destroying the action. Reproduction: $ sudo tc actions add action simple sdata \"2\" index 2 $ sudo tc actions add action simple sdata \"1\" index 1 \ action simple sdata \"2\" index 2 RTNETLINK answers: File exists We have an error talking to the kernel $ sudo tc actions ls action simple total acts 1 action order 0: Simple <"2"> index 2 ref 1 bind 0 $ sudo tc actions flush action simple $ sudo tc actions ls action simple $ sudo tc actions add action simple sdata \"2\" index 2 Error: Failed to load TC action module. We have an error talking to the kernel $ lsmod | grep simple act_simple 20480 -1 Fix the issue by modifying module reference counting handling in action initialization code: - Get module reference in tcf_idr_create() and put it in tcf_idr_release() instead of taking over the reference held by the caller. - Modify users of tcf_action_init_1() to always release the module reference which they obtain before calling init function instead of assuming that created action takes over the reference. - Finally, modify tcf_action_init_1() to not release the module reference when overwriting existing action as this is no longer necessary since both upper and lower layers obtain and manage their own module references independently. Fixes: d349f9976868 ("net_sched: fix RTNL deadlock again caused by request_module()") Suggested-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-08net: sched: fix action overwrite reference countingVlad Buslov
Action init code increments reference counter when it changes an action. This is the desired behavior for cls API which needs to obtain action reference for every classifier that points to action. However, act API just needs to change the action and releases the reference before returning. This sequence breaks when the requested action doesn't exist, which causes act API init code to create new action with specified index, but action is still released before returning and is deleted (unless it was referenced concurrently by cls API). Reproduction: $ sudo tc actions ls action gact $ sudo tc actions change action gact drop index 1 $ sudo tc actions ls action gact Extend tcf_action_init() to accept 'init_res' array and initialize it with action->ops->init() result. In tcf_action_add() remove pointers to created actions from actions array before passing it to tcf_action_put_many(). Fixes: cae422f379f3 ("net: sched: use reference counting action init") Reported-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-08Revert "net: sched: bump refcount for new action in ACT replace mode"Vlad Buslov
This reverts commit 6855e8213e06efcaf7c02a15e12b1ae64b9a7149. Following commit in series fixes the issue without introducing regression in error rollback of tcf_action_destroy(). Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-30net: sched: bump refcount for new action in ACT replace modeKumar Kartikeya Dwivedi
Currently, action creation using ACT API in replace mode is buggy. When invoking for non-existent action index 42, tc action replace action bpf obj foo.o sec <xyz> index 42 kernel creates the action, fills up the netlink response, and then just deletes the action after notifying userspace. tc action show action bpf doesn't list the action. This happens due to the following sequence when ovr = 1 (replace mode) is enabled: tcf_idr_check_alloc is used to atomically check and either obtain reference for existing action at index, or reserve the index slot using a dummy entry (ERR_PTR(-EBUSY)). This is necessary as pointers to these actions will be held after dropping the idrinfo lock, so bumping the reference count is necessary as we need to insert the actions, and notify userspace by dumping their attributes. Finally, we drop the reference we took using the tcf_action_put_many call in tcf_action_add. However, for the case where a new action is created due to free index, its refcount remains one. This when paired with the put_many call leads to the kernel setting up the action, notifying userspace of its creation, and then tearing it down. For existing actions, the refcount is still held so they remain unaffected. Fortunately due to rtnl_lock serialization requirement, such an action with refcount == 1 will not be concurrently deleted by anything else, at best CLS API can move its refcount up and down by binding to it after it has been published from tcf_idr_insert_many. Since refcount is atleast one until put_many call, CLS API cannot delete it. Also __tcf_action_put release path already ensures deterministic outcome (either new action will be created or existing action will be reused in case CLS API tries to bind to action concurrently) due to idr lock serialization. We fix this by making refcount of newly created actions as 2 in ACT API replace mode. A relaxed store will suffice as visibility is ensured only after the tcf_idr_insert_many call. Note that in case of creation or overwriting using CLS API only (i.e. bind = 1), overwriting existing action object is not allowed, and any such request is silently ignored (without error). The refcount bump that occurs in tcf_idr_check_alloc call there for existing action will pair with tcf_exts_destroy call made from the owner module for the same action. In case of action creation, there is no existing action, so no tcf_exts_destroy callback happens. This means no code changes for CLS API. Fixes: cae422f379f3 ("net: sched: use reference counting action init") Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
2021-02-16net: sched: fix police ext initializationVlad Buslov
When police action is created by cls API tcf_exts_validate() first conditional that calls tcf_action_init_1() directly, the action idr is not updated according to latest changes in action API that require caller to commit newly created action to idr with tcf_idr_insert_many(). This results such action not being accessible through act API and causes crash reported by syzbot: ================================================================== BUG: KASAN: null-ptr-deref in instrument_atomic_read include/linux/instrumented.h:71 [inline] BUG: KASAN: null-ptr-deref in atomic_read include/asm-generic/atomic-instrumented.h:27 [inline] BUG: KASAN: null-ptr-deref in __tcf_idr_release net/sched/act_api.c:178 [inline] BUG: KASAN: null-ptr-deref in tcf_idrinfo_destroy+0x129/0x1d0 net/sched/act_api.c:598 Read of size 4 at addr 0000000000000010 by task kworker/u4:5/204 CPU: 0 PID: 204 Comm: kworker/u4:5 Not tainted 5.11.0-rc7-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: netns cleanup_net Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x107/0x163 lib/dump_stack.c:120 __kasan_report mm/kasan/report.c:400 [inline] kasan_report.cold+0x5f/0xd5 mm/kasan/report.c:413 check_memory_region_inline mm/kasan/generic.c:179 [inline] check_memory_region+0x13d/0x180 mm/kasan/generic.c:185 instrument_atomic_read include/linux/instrumented.h:71 [inline] atomic_read include/asm-generic/atomic-instrumented.h:27 [inline] __tcf_idr_release net/sched/act_api.c:178 [inline] tcf_idrinfo_destroy+0x129/0x1d0 net/sched/act_api.c:598 tc_action_net_exit include/net/act_api.h:151 [inline] police_exit_net+0x168/0x360 net/sched/act_police.c:390 ops_exit_list+0x10d/0x160 net/core/net_namespace.c:190 cleanup_net+0x4ea/0xb10 net/core/net_namespace.c:604 process_one_work+0x98d/0x15f0 kernel/workqueue.c:2275 worker_thread+0x64c/0x1120 kernel/workqueue.c:2421 kthread+0x3b1/0x4a0 kernel/kthread.c:292 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:296 ================================================================== Kernel panic - not syncing: panic_on_warn set ... CPU: 0 PID: 204 Comm: kworker/u4:5 Tainted: G B 5.11.0-rc7-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: netns cleanup_net Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x107/0x163 lib/dump_stack.c:120 panic+0x306/0x73d kernel/panic.c:231 end_report+0x58/0x5e mm/kasan/report.c:100 __kasan_report mm/kasan/report.c:403 [inline] kasan_report.cold+0x67/0xd5 mm/kasan/report.c:413 check_memory_region_inline mm/kasan/generic.c:179 [inline] check_memory_region+0x13d/0x180 mm/kasan/generic.c:185 instrument_atomic_read include/linux/instrumented.h:71 [inline] atomic_read include/asm-generic/atomic-instrumented.h:27 [inline] __tcf_idr_release net/sched/act_api.c:178 [inline] tcf_idrinfo_destroy+0x129/0x1d0 net/sched/act_api.c:598 tc_action_net_exit include/net/act_api.h:151 [inline] police_exit_net+0x168/0x360 net/sched/act_police.c:390 ops_exit_list+0x10d/0x160 net/core/net_namespace.c:190 cleanup_net+0x4ea/0xb10 net/core/net_namespace.c:604 process_one_work+0x98d/0x15f0 kernel/workqueue.c:2275 worker_thread+0x64c/0x1120 kernel/workqueue.c:2421 kthread+0x3b1/0x4a0 kernel/kthread.c:292 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:296 Kernel Offset: disabled Fix the issue by calling tcf_idr_insert_many() after successful action initialization. Fixes: 0fedc63fadf0 ("net_sched: commit action insertions together") Reported-by: syzbot+151e3e714d34ae4ce7e8@syzkaller.appspotmail.com Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-01-18net_sched: fix RTNL deadlock again caused by request_module()Cong Wang
tcf_action_init_1() loads tc action modules automatically with request_module() after parsing the tc action names, and it drops RTNL lock and re-holds it before and after request_module(). This causes a lot of troubles, as discovered by syzbot, because we can be in the middle of batch initializations when we create an array of tc actions. One of the problem is deadlock: CPU 0 CPU 1 rtnl_lock(); for (...) { tcf_action_init_1(); -> rtnl_unlock(); -> request_module(); rtnl_lock(); for (...) { tcf_action_init_1(); -> tcf_idr_check_alloc(); // Insert one action into idr, // but it is not committed until // tcf_idr_insert_many(), then drop // the RTNL lock in the _next_ // iteration -> rtnl_unlock(); -> rtnl_lock(); -> a_o->init(); -> tcf_idr_check_alloc(); // Now waiting for the same index // to be committed -> request_module(); -> rtnl_lock() // Now waiting for RTNL lock } rtnl_unlock(); } rtnl_unlock(); This is not easy to solve, we can move the request_module() before this loop and pre-load all the modules we need for this netlink message and then do the rest initializations. So the loop breaks down to two now: for (i = 1; i <= TCA_ACT_MAX_PRIO && tb[i]; i++) { struct tc_action_ops *a_o; a_o = tc_action_load_ops(name, tb[i]...); ops[i - 1] = a_o; } for (i = 1; i <= TCA_ACT_MAX_PRIO && tb[i]; i++) { act = tcf_action_init_1(ops[i - 1]...); } Although this looks serious, it only has been reported by syzbot, so it seems hard to trigger this by humans. And given the size of this patch, I'd suggest to make it to net-next and not to backport to stable. This patch has been tested by syzbot and tested with tdc.py by me. Fixes: 0fedc63fadf0 ("net_sched: commit action insertions together") Reported-and-tested-by: syzbot+82752bc5331601cf4899@syzkaller.appspotmail.com Reported-and-tested-by: syzbot+b3b63b6bff456bd95294@syzkaller.appspotmail.com Reported-by: syzbot+ba67b12b1ca729912834@syzkaller.appspotmail.com Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Tested-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://lore.kernel.org/r/20210117005657.14810-1-xiyou.wangcong@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-27net/sched: sch_frag: add generic packet fragment support.wenxu
Currently kernel tc subsystem can do conntrack in cat_ct. But when several fragment packets go through the act_ct, function tcf_ct_handle_fragments will defrag the packets to a big one. But the last action will redirect mirred to a device which maybe lead the reassembly big packet over the mtu of target device. This patch add support for a xmit hook to mirred, that gets executed before xmiting the packet. Then, when act_ct gets loaded, it configs that hook. The frag xmit hook maybe reused by other modules. Signed-off-by: wenxu <wenxu@ucloud.cn> Acked-by: Cong Wang <cong.wang@bytedance.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-25net: sched: alias action flags with TCA_ACT_ prefixVlad Buslov
Currently both filter and action flags use same "TCA_" prefix which makes them hard to distinguish to code and confusing for users. Create aliases for existing action flags constants with "TCA_ACT_" prefix. Signed-off-by: Vlad Buslov <vlad@buslov.dev> Link: https://lore.kernel.org/r/20201124164054.893168-1-vlad@buslov.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-16treewide: rename nla_strlcpy to nla_strscpy.Francis Laniel
Calls to nla_strlcpy are now replaced by calls to nla_strscpy which is the new name of this function. Signed-off-by: Francis Laniel <laniel_francis@privacyrequired.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-16Modify return value of nla_strlcpy to match that of strscpy.Francis Laniel
nla_strlcpy now returns -E2BIG if src was truncated when written to dst. It also returns this error value if dstsize is 0 or higher than INT_MAX. For example, if src is "foo\0" and dst is 3 bytes long, the result will be: 1. "foG" after memcpy (G means garbage). 2. "fo\0" after memset. 3. -E2BIG is returned because src was not completely written into dst. The callers of nla_strlcpy were modified to take into account this modification. Signed-off-by: Francis Laniel <laniel_francis@privacyrequired.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-10net: sched: fix misspellings using misspell-fixer toolMenglong Dong
Some typos are found out by misspell-fixer tool: $ misspell-fixer -rnv ./net/sched/ ./net/sched/act_api.c:686 ./net/sched/act_bpf.c:68 ./net/sched/cls_rsvp.h:241 ./net/sched/em_cmp.c:44 ./net/sched/sch_pie.c:408 Fix typos found by misspell-fixer. Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/r/5fa8e9d4.1c69fb81.5d889.5c64@mx.google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-05net: sched: implement action-specific terse dumpVlad Buslov
Allow user to request action terse dump with new flag value TCA_FLAG_TERSE_DUMP. Only output essential action info in terse dump (kind, stats, index and cookie, if set by the user when creating the action). This is different from filter terse dump where index is excluded (filter can be identified by its own handle). Move tcf_action_dump_terse() function to the beginning of source file in order to call it from tcf_dump_walker(). Signed-off-by: Vlad Buslov <vlad@buslov.dev> Suggested-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Link: https://lore.kernel.org/r/20201102201243.287486-1-vlad@buslov.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
Rejecting non-native endian BTF overlapped with the addition of support for it. The rest were more simple overlapping changes, except the renesas ravb binding update, which had to follow a file move as well as a YAML conversion. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-04net_sched: check error pointer in tcf_dump_walker()Cong Wang
Although we take RTNL on dump path, it is possible to skip RTNL on insertion path. So the following race condition is possible: rtnl_lock() // no rtnl lock mutex_lock(&idrinfo->lock); // insert ERR_PTR(-EBUSY) mutex_unlock(&idrinfo->lock); tc_dump_action() rtnl_unlock() So we have to skip those temporary -EBUSY entries on dump path too. Reported-and-tested-by: syzbot+b47bc4f247856fb4d9e1@syzkaller.appspotmail.com Fixes: 0fedc63fadf0 ("net_sched: commit action insertions together") Cc: Vlad Buslov <vladbu@mellanox.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-28net_sched: remove a redundant goto chain checkCong Wang
All TC actions call tcf_action_check_ctrlact() to validate goto chain, so this check in tcf_action_init_1() is actually redundant. Remove it to save troubles of leaking memory. Fixes: e49d8c22f126 ("net_sched: defer tcf_idr_insert() in tcf_action_init_1()") Reported-by: Vlad Buslov <vladbu@mellanox.com> Suggested-by: Davide Caratti <dcaratti@redhat.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-24net_sched: commit action insertions togetherCong Wang
syzbot is able to trigger a failure case inside the loop in tcf_action_init(), and when this happens we clean up with tcf_action_destroy(). But, as these actions are already inserted into the global IDR, other parallel process could free them before tcf_action_destroy(), then we will trigger a use-after-free. Fix this by deferring the insertions even later, after the loop, and committing all the insertions in a separate loop, so we will never fail in the middle of the insertions any more. One side effect is that the window between alloction and final insertion becomes larger, now it is more likely that the loop in tcf_del_walker() sees the placeholder -EBUSY pointer. So we have to check for error pointer in tcf_del_walker(). Reported-and-tested-by: syzbot+2287853d392e4b42374a@syzkaller.appspotmail.com Fixes: 0190c1d452a9 ("net: sched: atomically check-allocate action") Cc: Vlad Buslov <vladbu@mellanox.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-24net_sched: defer tcf_idr_insert() in tcf_action_init_1()Cong Wang
All TC actions call tcf_idr_insert() for new action at the end of their ->init(), so we can actually move it to a central place in tcf_action_init_1(). And once the action is inserted into the global IDR, other parallel process could free it immediately as its refcnt is still 1, so we can not fail after this, we need to move it after the goto action validation to avoid handling the failure case after insertion. This is found during code review, is not directly triggered by syzbot. And this prepares for the next patch. Cc: Vlad Buslov <vladbu@mellanox.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-08net: sched: skip an unnecessay checkTom Rix
Reviewing the error handling in tcf_action_init_1() most of the early handling uses err_out: if (cookie) { kfree(cookie->data); kfree(cookie); } before cookie could ever be set. So skip the unnecessay check. Signed-off-by: Tom Rix <trix@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-20Remove redundant skb null checkGaurav Singh
Remove the redundant null check for skb. Signed-off-by: Gaurav Singh <gaurav1086@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-19net: qos offload add flow status with dropped countPo Liu
This patch adds a drop frames counter to tc flower offloading. Reporting h/w dropped frames is necessary for some actions. Some actions like police action and the coming introduced stream gate action would produce dropped frames which is necessary for user. Status update shows how many filtered packets increasing and how many dropped in those packets. v2: Changes - Update commit comments suggest by Jiri Pirko. Signed-off-by: Po Liu <Po.Liu@nxp.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-15net: sched: implement terse dump support in actVlad Buslov
Extend tcf_action_dump() with boolean argument 'terse' that is used to request terse-mode action dump. In terse mode only essential data needed to identify particular action (action kind, cookie, etc.) and its stats is put to resulting skb and everything else is omitted. Implement tcf_exts_terse_dump() helper in cls API that is intended to be used to request terse dump of all exts (actions) attached to the filter. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-30netlink: remove type-unsafe validation_data pointerJohannes Berg
In the netlink policy, we currently have a void *validation_data that's pointing to different things: * a u32 value for bitfield32, * the netlink policy for nested/nested array * the string for NLA_REJECT Remove the pointer and place appropriate type-safe items in the union instead. While at it, completely dissolve the pointer for the bitfield32 case and just put the value there directly. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30net: sched: expose HW stats types per action used by driversJiri Pirko
It may be up to the driver (in case ANY HW stats is passed) to select which type of HW stats he is going to use. Add an infrastructure to expose this information to user. $ tc filter add dev enp3s0np1 ingress proto ip handle 1 pref 1 flower dst_ip 192.168.1.1 action drop $ tc -s filter show dev enp3s0np1 ingress filter protocol ip pref 1 flower chain 0 filter protocol ip pref 1 flower chain 0 handle 0x1 eth_type ipv4 dst_ip 192.168.1.1 in_hw in_hw_count 2 action order 1: gact action drop random type none pass val 0 index 1 ref 1 bind 1 installed 10 sec used 10 sec Action statistics: Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 used_hw_stats immediate <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30net: introduce nla_put_bitfield32() helper and use itJiri Pirko
Introduce a helper to pass value and selector to. The helper packs them into struct and puts them into netlink message. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-23net: sched: rename more stats_typesJakub Kicinski
Commit 53eca1f3479f ("net: rename flow_action_hw_stats_types* -> flow_action_hw_stats*") renamed just the flow action types and helpers. For consistency rename variables, enums, struct members and UAPI too (note that this UAPI was not in any official release, yet). Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-08sched: act: allow user to specify type of HW stats for a filterJiri Pirko
Currently, user who is adding an action expects HW to report stats, however it does not have exact expectations about the stats types. That is aligned with TCA_ACT_HW_STATS_TYPE_ANY. Allow user to specify the type of HW stats for an action and require it. Pass the information down to flow_offload layer. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-26sched: act: count in the size of action flags bitfieldJiri Pirko
The put of the flags was added by the commit referenced in fixes tag, however the size of the message was not extended accordingly. Fix this by adding size of the flags bitfield to the message size. Fixes: e38226786022 ("net: sched: update action implementations to support flags") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-26Merge branch 'core-rcu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: "The main changes in this cycle were: - Dynamic tick (nohz) updates, perhaps most notably changes to force the tick on when needed due to lengthy in-kernel execution on CPUs on which RCU is waiting. - Linux-kernel memory consistency model updates. - Replace rcu_swap_protected() with rcu_prepace_pointer(). - Torture-test updates. - Documentation updates. - Miscellaneous fixes" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (51 commits) security/safesetid: Replace rcu_swap_protected() with rcu_replace_pointer() net/sched: Replace rcu_swap_protected() with rcu_replace_pointer() net/netfilter: Replace rcu_swap_protected() with rcu_replace_pointer() net/core: Replace rcu_swap_protected() with rcu_replace_pointer() bpf/cgroup: Replace rcu_swap_protected() with rcu_replace_pointer() fs/afs: Replace rcu_swap_protected() with rcu_replace_pointer() drivers/scsi: Replace rcu_swap_protected() with rcu_replace_pointer() drm/i915: Replace rcu_swap_protected() with rcu_replace_pointer() x86/kvm/pmu: Replace rcu_swap_protected() with rcu_replace_pointer() rcu: Upgrade rcu_swap_protected() to rcu_replace_pointer() rcu: Suppress levelspread uninitialized messages rcu: Fix uninitialized variable in nocb_gp_wait() rcu: Update descriptions for rcu_future_grace_period tracepoint rcu: Update descriptions for rcu_nocb_wake tracepoint rcu: Remove obsolete descriptions for rcu_barrier tracepoint rcu: Ensure that ->rcu_urgent_qs is set before resched IPI workqueue: Convert for_each_wq to use built-in list check rcu: Several rcu_segcblist functions can be static rcu: Remove unused function hlist_bl_del_init_rcu() Documentation: Rename rcu_node_context_switch() to rcu_note_context_switch() ...
2019-11-12net/sched: actions: remove unused 'order'Davide Caratti
after commit 4097e9d250fb ("net: sched: don't use tc_action->order during action dump"), 'act->order' is initialized but then it's no more read, so we can just remove this member of struct tc_action. CC: Ivan Vecera <ivecera@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-05net_sched: add TCA_STATS_PKT64 attributeEric Dumazet
Now the kernel uses 64bit packet counters in scheduler layer, we want to export these counters to user space. Instead risking breaking user space by adding fields to struct gnet_stats_basic, add a new TCA_STATS_PKT64. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-31Merge branch 'for-mingo' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Pull RCU and LKMM changes from Paul E. McKenney: - Documentation updates. - Miscellaneous fixes. - Dynamic tick (nohz) updates, perhaps most notably changes to force the tick on when needed due to lengthy in-kernel execution on CPUs on which RCU is waiting. - Replace rcu_swap_protected() with rcu_prepace_pointer(). - Torture-test updates. - Linux-kernel memory consistency model updates. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-30net: sched: update action implementations to support flagsVlad Buslov
Extend struct tc_action with new "tcfa_flags" field. Set the field in tcf_idr_create() function and provide new helper tcf_idr_create_from_flags() that derives 'cpustats' boolean from flags value. Update individual hardware-offloaded actions init() to pass their "flags" argument to new helper in order to skip percpu stats allocation when user requested it through flags. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-30net: sched: extend TCA_ACT space with TCA_ACT_FLAGSVlad Buslov
Extend TCA_ACT space with nla_bitfield32 flags. Add TCA_ACT_FLAGS_NO_PERCPU_STATS as the only allowed flag. Parse the flags in tcf_action_init_1() and pass resulting value as additional argument to a_o->init(). Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-30net: sched: modify stats helper functions to support regular statsVlad Buslov
Modify stats update helper functions introduced in previous patches in this series to fallback to regular tc_action->tcfa_{b|q}stats if cpu stats are not allocated for the action argument. If regular non-percpu allocated counters are in use, then obtain action tcfa_lock while modifying them. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>