summaryrefslogtreecommitdiff
path: root/net/sched
AgeCommit message (Collapse)Author
2019-04-04sch_cake: Make sure we can write the IP header before changing DSCP bitsToke Høiland-Jørgensen
There is not actually any guarantee that the IP headers are valid before we access the DSCP bits of the packets. Fix this using the same approach taken in sch_dsmark. Reported-by: Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-04sch_cake: Use tc_skb_protocol() helper for getting packet protocolToke Høiland-Jørgensen
We shouldn't be using skb->protocol directly as that will miss cases with hardware-accelerated VLAN tags. Use the helper instead to get the right protocol number. Reported-by: Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-04net/sched: act_sample: fix divide by zero in the traffic pathDavide Caratti
the control path of 'sample' action does not validate the value of 'rate' provided by the user, but then it uses it as divisor in the traffic path. Validate it in tcf_sample_init(), and return -EINVAL with a proper extack message in case that value is zero, to fix a splat with the script below: # tc f a dev test0 egress matchall action sample rate 0 group 1 index 2 # tc -s a s action sample total acts 1 action order 0: sample rate 1/0 group 1 pipe index 2 ref 1 bind 1 installed 19 sec used 19 sec Action statistics: Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 # ping 192.0.2.1 -I test0 -c1 -q divide error: 0000 [#1] SMP PTI CPU: 1 PID: 6192 Comm: ping Not tainted 5.1.0-rc2.diag2+ #591 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:tcf_sample_act+0x9e/0x1e0 [act_sample] Code: 6a f1 85 c0 74 0d 80 3d 83 1a 00 00 00 0f 84 9c 00 00 00 4d 85 e4 0f 84 85 00 00 00 e8 9b d7 9c f1 44 8b 8b e0 00 00 00 31 d2 <41> f7 f1 85 d2 75 70 f6 85 83 00 00 00 10 48 8b 45 10 8b 88 08 01 RSP: 0018:ffffae320190ba30 EFLAGS: 00010246 RAX: 00000000b0677d21 RBX: ffff8af1ed9ec000 RCX: 0000000059a9fe49 RDX: 0000000000000000 RSI: 000000000c7e33b7 RDI: ffff8af23daa0af0 RBP: ffff8af1ee11b200 R08: 0000000074fcaf7e R09: 0000000000000000 R10: 0000000000000050 R11: ffffffffb3088680 R12: ffff8af232307f80 R13: 0000000000000003 R14: ffff8af1ed9ec000 R15: 0000000000000000 FS: 00007fe9c6d2f740(0000) GS:ffff8af23da80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fff6772f000 CR3: 00000000746a2004 CR4: 00000000001606e0 Call Trace: tcf_action_exec+0x7c/0x1c0 tcf_classify+0x57/0x160 __dev_queue_xmit+0x3dc/0xd10 ip_finish_output2+0x257/0x6d0 ip_output+0x75/0x280 ip_send_skb+0x15/0x40 raw_sendmsg+0xae3/0x1410 sock_sendmsg+0x36/0x40 __sys_sendto+0x10e/0x140 __x64_sys_sendto+0x24/0x30 do_syscall_64+0x60/0x210 entry_SYSCALL_64_after_hwframe+0x49/0xbe [...] Kernel panic - not syncing: Fatal exception in interrupt Add a TDC selftest to document that 'rate' is now being validated. Reported-by: Matteo Croce <mcroce@redhat.com> Fixes: 5c5670fae430 ("net/sched: Introduce sample tc action") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Acked-by: Yotam Gigi <yotam.gi@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-01net: sched: introduce and use qdisc tree flush/purge helpersPaolo Abeni
The same code to flush qdisc tree and purge the qdisc queue is duplicated in many places and in most cases it does not respect NOLOCK qdisc: the global backlog len is used and the per CPU values are ignored. This change addresses the above, factoring-out the relevant code and using the helpers introduced by the previous patch to fetch the correct backlog len. Fixes: c5ad119fb6c0 ("net: sched: pfifo_fast use skb_array") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-01net: sched: introduce and use qstats read helpersPaolo Abeni
Classful qdiscs can't access directly the child qdiscs backlog length: if such qdisc is NOLOCK, per CPU values should be accounted instead. Most qdiscs no not respect the above. As a result, qstats fetching for most classful qdisc is currently incorrect: if the child qdisc is NOLOCK, it always reports 0 len backlog. This change introduces a pair of helpers to safely fetch both backlog and qlen and use them in stats class dumping functions, fixing the above issue and cleaning a bit the code. DRR needs also to access the child qdisc queue length, so it needs custom handling. Fixes: c5ad119fb6c0 ("net: sched: pfifo_fast use skb_array") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-01net/sched: fix ->get helper of the matchall clsNicolas Dichtel
It returned always NULL, thus it was never possible to get the filter. Example: $ ip link add foo type dummy $ ip link add bar type dummy $ tc qdisc add dev foo clsact $ tc filter add dev foo protocol all pref 1 ingress handle 1234 \ matchall action mirred ingress mirror dev bar Before the patch: $ tc filter get dev foo protocol all pref 1 ingress handle 1234 matchall Error: Specified filter handle not found. We have an error talking to the kernel After: $ tc filter get dev foo protocol all pref 1 ingress handle 1234 matchall filter ingress protocol all pref 1 matchall chain 0 handle 0x4d2 not_in_hw action order 1: mirred (Ingress Mirror to device bar) pipe index 1 ref 1 bind 1 CC: Yotam Gigi <yotamg@mellanox.com> CC: Jiri Pirko <jiri@mellanox.com> Fixes: fd62d9f5c575 ("net/sched: matchall: Fix configuration race") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-26net: sched: Kconfig: update reference link for PIELeslie Monis
RFC 8033 replaces the IETF draft for PIE Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-23net: sched: fix cleanup NULL pointer exception in act_mirrJohn Hurley
A new mirred action is created by the tcf_mirred_init function. This contains a list head struct which is inserted into a global list on successful creation of a new action. However, after a creation, it is still possible to error out and call the tcf_idr_release function. This, in turn, calls the act_mirr cleanup function via __tcf_idr_release and __tcf_action_put. This cleanup function tries to delete the list entry which is as yet uninitialised, leading to a NULL pointer exception. Fix this by initialising the list entry on creation of a new action. Bug report: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 PGD 8000000840c73067 P4D 8000000840c73067 PUD 858dcc067 PMD 0 Oops: 0002 [#1] SMP PTI CPU: 32 PID: 5636 Comm: handler194 Tainted: G OE 5.0.0+ #186 Hardware name: Dell Inc. PowerEdge R730/0599V5, BIOS 1.3.6 06/03/2015 RIP: 0010:tcf_mirred_release+0x42/0xa7 [act_mirred] Code: f0 90 39 c0 e8 52 04 57 c8 48 c7 c7 b8 80 39 c0 e8 94 fa d4 c7 48 8b 93 d0 00 00 00 48 8b 83 d8 00 00 00 48 c7 c7 f0 90 39 c0 <48> 89 42 08 48 89 10 48 b8 00 01 00 00 00 00 ad de 48 89 83 d0 00 RSP: 0018:ffffac4aa059f688 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffff9dcd1b214d00 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff9dcd1fa165f8 RDI: ffffffffc03990f0 RBP: ffff9dccf9c7af80 R08: 0000000000000a3b R09: 0000000000000000 R10: ffff9dccfa11f420 R11: 0000000000000000 R12: 0000000000000001 R13: ffff9dcd16b433c0 R14: ffff9dcd1b214d80 R15: 0000000000000000 FS: 00007f441bfff700(0000) GS:ffff9dcd1fa00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000008 CR3: 0000000839e64004 CR4: 00000000001606e0 Call Trace: tcf_action_cleanup+0x59/0xca __tcf_action_put+0x54/0x6b __tcf_idr_release.cold.33+0x9/0x12 tcf_mirred_init.cold.20+0x22e/0x3b0 [act_mirred] tcf_action_init_1+0x3d0/0x4c0 tcf_action_init+0x9c/0x130 tcf_exts_validate+0xab/0xc0 fl_change+0x1ca/0x982 [cls_flower] tc_new_tfilter+0x647/0x8d0 ? load_balance+0x14b/0x9e0 rtnetlink_rcv_msg+0xe3/0x370 ? __switch_to_asm+0x40/0x70 ? __switch_to_asm+0x34/0x70 ? _cond_resched+0x15/0x30 ? __kmalloc_node_track_caller+0x1d4/0x2b0 ? rtnl_calcit.isra.31+0xf0/0xf0 netlink_rcv_skb+0x49/0x110 netlink_unicast+0x16f/0x210 netlink_sendmsg+0x1df/0x390 sock_sendmsg+0x36/0x40 ___sys_sendmsg+0x27b/0x2c0 ? futex_wake+0x80/0x140 ? do_futex+0x2b9/0xac0 ? ep_scan_ready_list.constprop.22+0x1f2/0x210 ? ep_poll+0x7a/0x430 __sys_sendmsg+0x47/0x80 do_syscall_64+0x55/0x100 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 4e232818bd32 ("net: sched: act_mirred: remove dependency on rtnl lock") Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21net/sched: let actions use RCU to access 'goto_chain'Davide Caratti
use RCU when accessing the action chain, to avoid use after free in the traffic path when 'goto chain' is replaced on existing TC actions (see script below). Since the control action is read in the traffic path without holding the action spinlock, we need to explicitly ensure that a->goto_chain is not NULL before dereferencing (i.e it's not sufficient to rely on the value of TC_ACT_GOTO_CHAIN bits). Not doing so caused NULL dereferences in tcf_action_goto_chain_exec() when the following script: # tc chain add dev dd0 chain 42 ingress protocol ip flower \ > ip_proto udp action pass index 4 # tc filter add dev dd0 ingress protocol ip flower \ > ip_proto udp action csum udp goto chain 42 index 66 # tc chain del dev dd0 chain 42 ingress (start UDP traffic towards dd0) # tc action replace action csum udp pass index 66 was run repeatedly for several hours. Suggested-by: Cong Wang <xiyou.wangcong@gmail.com> Suggested-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21net/sched: act_vlan: validate the control action inside init()Davide Caratti
the following script: # tc qdisc add dev crash0 clsact # tc filter add dev crash0 egress matchall \ > action vlan pop pass index 90 # tc actions replace action vlan \ > pop goto chain 42 index 90 cookie c1a0c1a0 # tc actions show action vlan had the following output: Error: Failed to init TC action chain. We have an error talking to the kernel total acts 1 action order 0: vlan pop goto chain 42 index 90 ref 2 bind 1 cookie c1a0c1a0 Then, the first packet transmitted by crash0 made the kernel crash: BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 #PF error: [normal kernel read fault] PGD 800000007974f067 P4D 800000007974f067 PUD 79638067 PMD 0 Oops: 0000 [#1] SMP PTI CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.0.0-rc4.gotochain_crash+ #536 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:tcf_action_exec+0xb8/0x100 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 RSP: 0018:ffff982dfdb83be0 EFLAGS: 00010246 RAX: 000000002000002a RBX: ffff982dfc55db00 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff982df97099c0 RDI: ffff982dfc55db00 RBP: ffff982dfdb83c80 R08: ffff982df983fec8 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff982df5aacd00 R13: ffff982df5aacd08 R14: 0000000000000001 R15: ffff982df97099c0 FS: 0000000000000000(0000) GS:ffff982dfdb80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000000796d0005 CR4: 00000000001606e0 Call Trace: <IRQ> tcf_classify+0x58/0x120 __dev_queue_xmit+0x40a/0x890 ? ip6_finish_output2+0x369/0x590 ip6_finish_output2+0x369/0x590 ? ip6_output+0x68/0x110 ip6_output+0x68/0x110 ? nf_hook.constprop.35+0x79/0xc0 mld_sendpack+0x16f/0x220 mld_ifc_timer_expire+0x195/0x2c0 ? igmp6_timer_handler+0x70/0x70 call_timer_fn+0x2b/0x130 run_timer_softirq+0x3e8/0x440 ? enqueue_hrtimer+0x39/0x90 __do_softirq+0xe3/0x2f5 irq_exit+0xf0/0x100 smp_apic_timer_interrupt+0x6c/0x130 apic_timer_interrupt+0xf/0x20 </IRQ> RIP: 0010:native_safe_halt+0x2/0x10 Code: 7b ff ff ff 7f f3 c3 65 48 8b 04 25 00 5c 01 00 f0 80 48 02 20 48 8b 00 a8 08 74 8b eb c1 90 90 90 90 90 90 90 90 90 90 fb f4 <c3> 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90 RSP: 0018:ffffa4714038feb8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13 RAX: ffffffff840184f0 RBX: 0000000000000003 RCX: 0000000000000000 RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000001e57d3f387 RBP: 0000000000000003 R08: 001125d9ca39e1eb R09: 0000000000000000 R10: 000000000000027d R11: 000000000009f400 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 ? __sched_text_end+0x1/0x1 default_idle+0x1c/0x140 do_idle+0x1c4/0x280 cpu_startup_entry+0x19/0x20 start_secondary+0x1a7/0x200 secondary_startup_64+0xa4/0xb0 Modules linked in: act_vlan veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 snd_hda_codec_generic mbcache crct10dif_pclmul jbd2 snd_hda_intel crc32_pclmul snd_hda_codec ghash_clmulni_intel snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel crypto_simd cryptd glue_helper joydev snd_timer virtio_balloon snd pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect sysimgblt virtio_net fb_sys_fops virtio_blk ttm net_failover virtio_console failover ata_piix drm libata crc32c_intel virtio_pci serio_raw virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod CR2: 0000000000000000 Validating the control action within tcf_vlan_init() proved to fix the above issue. A TDC selftest is added to verify the correct behavior. Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain") Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21net/sched: act_tunnel_key: validate the control action inside init()Davide Caratti
the following script: # tc qdisc add dev crash0 clsact # tc filter add dev crash0 egress matchall \ > action tunnel_key set src_ip 10.10.10.1 dst_ip 20.20.2 dst_port 3128 \ > nocsum id 1 pass index 90 # tc actions replace action tunnel_key \ > set src_ip 10.10.10.1 dst_ip 20.20.2 dst_port 3128 nocsum id 1 \ > goto chain 42 index 90 cookie c1a0c1a0 # tc actions show action tunnel_key had the following output: Error: Failed to init TC action chain. We have an error talking to the kernel total acts 1 action order 0: tunnel_key set src_ip 10.10.10.1 dst_ip 20.20.2.0 key_id 1 dst_port 3128 nocsum goto chain 42 index 90 ref 2 bind 1 cookie c1a0c1a0 then, the first packet transmitted by crash0 made the kernel crash: BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 #PF error: [normal kernel read fault] PGD 800000002aba4067 P4D 800000002aba4067 PUD 795f9067 PMD 0 Oops: 0000 [#1] SMP PTI CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.0.0-rc4.gotochain_crash+ #536 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:tcf_action_exec+0xb8/0x100 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 RSP: 0018:ffff9346bdb83be0 EFLAGS: 00010246 RAX: 000000002000002a RBX: ffff9346bb795c00 RCX: 0000000000000002 RDX: 0000000000000000 RSI: ffff93466c881700 RDI: 0000000000000246 RBP: ffff9346bdb83c80 R08: ffff9346b3e1e0c8 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff9346b978f000 R13: ffff9346b978f008 R14: 0000000000000001 R15: ffff93466dceeb40 FS: 0000000000000000(0000) GS:ffff9346bdb80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000007a6c2002 CR4: 00000000001606e0 Call Trace: <IRQ> tcf_classify+0x58/0x120 __dev_queue_xmit+0x40a/0x890 ? ip6_finish_output2+0x369/0x590 ip6_finish_output2+0x369/0x590 ? ip6_output+0x68/0x110 ip6_output+0x68/0x110 ? nf_hook.constprop.35+0x79/0xc0 mld_sendpack+0x16f/0x220 mld_ifc_timer_expire+0x195/0x2c0 ? igmp6_timer_handler+0x70/0x70 call_timer_fn+0x2b/0x130 run_timer_softirq+0x3e8/0x440 ? tick_sched_timer+0x37/0x70 __do_softirq+0xe3/0x2f5 irq_exit+0xf0/0x100 smp_apic_timer_interrupt+0x6c/0x130 apic_timer_interrupt+0xf/0x20 </IRQ> RIP: 0010:native_safe_halt+0x2/0x10 Code: 55 ff ff ff 7f f3 c3 65 48 8b 04 25 00 5c 01 00 f0 80 48 02 20 48 8b 00 a8 08 74 8b eb c1 90 90 90 90 90 90 90 90 90 90 fb f4 <c3> 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90 RSP: 0018:ffffa48a8038feb8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13 RAX: ffffffffaa8184f0 RBX: 0000000000000003 RCX: 0000000000000000 RDX: 0000000000000001 RSI: 0000000000000087 RDI: 0000000000000003 RBP: 0000000000000003 R08: 0011251c6fcfac49 R09: ffff9346b995be00 R10: ffffa48a805e7ce8 R11: 00000000024c38dd R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 ? __sched_text_end+0x1/0x1 default_idle+0x1c/0x140 do_idle+0x1c4/0x280 cpu_startup_entry+0x19/0x20 start_secondary+0x1a7/0x200 secondary_startup_64+0xa4/0xb0 Modules linked in: act_tunnel_key veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul crc32_pclmul snd_hda_codec_generic ghash_clmulni_intel mbcache snd_hda_intel jbd2 snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel crypto_simd cryptd glue_helper joydev snd_timer snd pcspkr virtio_balloon soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect virtio_net sysimgblt fb_sys_fops ttm net_failover virtio_console virtio_blk failover drm serio_raw crc32c_intel ata_piix virtio_pci floppy virtio_ring libata virtio dm_mirror dm_region_hash dm_log dm_mod CR2: 0000000000000000 Validating the control action within tcf_tunnel_key_init() proved to fix the above issue. A TDC selftest is added to verify the correct behavior. Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain") Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21net/sched: act_skbmod: validate the control action inside init()Davide Caratti
the following script: # tc qdisc add dev crash0 clsact # tc filter add dev crash0 egress matchall \ > action skbmod set smac 00:c1:a0:c1:a0:00 pass index 90 # tc actions replace action skbmod \ > set smac 00:c1:a0:c1:a0:00 goto chain 42 index 90 cookie c1a0c1a0 # tc actions show action skbmod had the following output: src MAC address <00:c1:a0:c1:a0:00> src MAC address <00:c1:a0:c1:a0:00> Error: Failed to init TC action chain. We have an error talking to the kernel total acts 1 action order 0: skbmod goto chain 42 set smac 00:c1:a0:c1:a0:00 index 90 ref 2 bind 1 cookie c1a0c1a0 Then, the first packet transmitted by crash0 made the kernel crash: BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 #PF error: [normal kernel read fault] PGD 800000002d5c7067 P4D 800000002d5c7067 PUD 77e16067 PMD 0 Oops: 0000 [#1] SMP PTI CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.0.0-rc4.gotochain_crash+ #536 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:tcf_action_exec+0xb8/0x100 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 RSP: 0018:ffff8987ffd83be0 EFLAGS: 00010246 RAX: 000000002000002a RBX: ffff8987aeb68800 RCX: ffff8987fa263640 RDX: 0000000000000000 RSI: ffff8987f51c8802 RDI: 00000000000000a0 RBP: ffff8987ffd83c80 R08: ffff8987f939bac8 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff8987f5c77d00 R13: ffff8987f5c77d08 R14: 0000000000000001 R15: ffff8987f0c29f00 FS: 0000000000000000(0000) GS:ffff8987ffd80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000007832c004 CR4: 00000000001606e0 Call Trace: <IRQ> tcf_classify+0x58/0x120 __dev_queue_xmit+0x40a/0x890 ? ip6_finish_output2+0x369/0x590 ip6_finish_output2+0x369/0x590 ? ip6_output+0x68/0x110 ip6_output+0x68/0x110 ? nf_hook.constprop.35+0x79/0xc0 mld_sendpack+0x16f/0x220 mld_ifc_timer_expire+0x195/0x2c0 ? igmp6_timer_handler+0x70/0x70 call_timer_fn+0x2b/0x130 run_timer_softirq+0x3e8/0x440 ? tick_sched_timer+0x37/0x70 __do_softirq+0xe3/0x2f5 irq_exit+0xf0/0x100 smp_apic_timer_interrupt+0x6c/0x130 apic_timer_interrupt+0xf/0x20 </IRQ> RIP: 0010:native_safe_halt+0x2/0x10 Code: 56 ff ff ff 7f f3 c3 65 48 8b 04 25 00 5c 01 00 f0 80 48 02 20 48 8b 00 a8 08 74 8b eb c1 90 90 90 90 90 90 90 90 90 90 fb f4 <c3> 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90 RSP: 0018:ffffa2a1c038feb8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13 RAX: ffffffffa94184f0 RBX: 0000000000000003 RCX: 0000000000000001 RDX: 0000000000000001 RSI: 0000000000000087 RDI: 0000000000000003 RBP: 0000000000000003 R08: 001123cfc2ba71ac R09: 0000000000000000 R10: 0000000000000000 R11: 00000000000f4240 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 ? __sched_text_end+0x1/0x1 default_idle+0x1c/0x140 do_idle+0x1c4/0x280 cpu_startup_entry+0x19/0x20 start_secondary+0x1a7/0x200 secondary_startup_64+0xa4/0xb0 Modules linked in: act_skbmod veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel mbcache jbd2 snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device aesni_intel crypto_simd cryptd glue_helper snd_pcm joydev pcspkr virtio_balloon snd_timer snd i2c_piix4 soundcore nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect virtio_net sysimgblt fb_sys_fops net_failover virtio_console ttm virtio_blk failover drm crc32c_intel serio_raw ata_piix virtio_pci libata virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod CR2: 0000000000000000 Validating the control action within tcf_skbmod_init() proved to fix the above issue. A TDC selftest is added to verify the correct behavior. Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain") Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21net/sched: act_skbedit: validate the control action inside init()Davide Caratti
the following script: # tc qdisc add dev crash0 clsact # tc filter add dev crash0 egress matchall \ > action skbedit ptype host pass index 90 # tc actions replace action skbedit \ > ptype host goto chain 42 index 90 cookie c1a0c1a0 # tc actions show action skbedit had the following output: Error: Failed to init TC action chain. We have an error talking to the kernel total acts 1 action order 0: skbedit ptype host goto chain 42 index 90 ref 2 bind 1 cookie c1a0c1a0 Then, the first packet transmitted by crash0 made the kernel crash: BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 #PF error: [normal kernel read fault] PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI CPU: 3 PID: 3467 Comm: kworker/3:3 Not tainted 5.0.0-rc4.gotochain_crash+ #536 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Workqueue: ipv6_addrconf addrconf_dad_work RIP: 0010:tcf_action_exec+0xb8/0x100 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 RSP: 0018:ffffb50a81e1fad0 EFLAGS: 00010246 RAX: 000000002000002a RBX: ffff9aa47ba4ea00 RCX: 0000000000000001 RDX: 0000000000000000 RSI: ffff9aa469eeb3c0 RDI: ffff9aa47ba4ea00 RBP: ffffb50a81e1fb70 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: ffff9aa47bce0638 R12: ffff9aa4793b0c00 R13: ffff9aa4793b0c08 R14: 0000000000000001 R15: ffff9aa469eeb3c0 FS: 0000000000000000(0000) GS:ffff9aa474780000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000007360e005 CR4: 00000000001606e0 Call Trace: tcf_classify+0x58/0x120 __dev_queue_xmit+0x40a/0x890 ? ndisc_next_option+0x50/0x50 ? ___neigh_create+0x4d5/0x680 ? ip6_finish_output2+0x1b5/0x590 ip6_finish_output2+0x1b5/0x590 ? ip6_output+0x68/0x110 ip6_output+0x68/0x110 ? nf_hook.constprop.28+0x79/0xc0 ndisc_send_skb+0x248/0x2e0 ndisc_send_ns+0xf8/0x200 ? addrconf_dad_work+0x389/0x4b0 addrconf_dad_work+0x389/0x4b0 ? __switch_to_asm+0x34/0x70 ? process_one_work+0x195/0x380 ? addrconf_dad_completed+0x370/0x370 process_one_work+0x195/0x380 worker_thread+0x30/0x390 ? process_one_work+0x380/0x380 kthread+0x113/0x130 ? kthread_park+0x90/0x90 ret_from_fork+0x35/0x40 Modules linked in: act_skbedit veth ip6table_filter ip6_tables iptable_filter binfmt_misc crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ext4 snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep mbcache snd_hda_core jbd2 snd_seq snd_seq_device snd_pcm aesni_intel crypto_simd cryptd snd_timer glue_helper snd joydev soundcore pcspkr virtio_balloon i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm virtio_net net_failover drm failover virtio_blk virtio_console ata_piix virtio_pci crc32c_intel serio_raw libata virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod CR2: 0000000000000000 Validating the control action within tcf_skbedit_init() proved to fix the above issue. A TDC selftest is added to verify the correct behavior. Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain") Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21net/sched: act_simple: validate the control action inside init()Davide Caratti
the following script: # tc qdisc add dev crash0 clsact # tc filter add dev crash0 egress matchall \ > action simple sdata hello pass index 90 # tc actions replace action simple \ > sdata world goto chain 42 index 90 cookie c1a0c1a0 # tc action show action simple had the following output: Error: Failed to init TC action chain. We have an error talking to the kernel total acts 1 action order 0: Simple <world> index 90 ref 2 bind 1 cookie c1a0c1a0 Then, the first packet transmitted by crash0 made the kernel crash: BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 #PF error: [normal kernel read fault] PGD 800000006a6fb067 P4D 800000006a6fb067 PUD 6aed6067 PMD 0 Oops: 0000 [#1] SMP PTI CPU: 2 PID: 3241 Comm: kworker/2:0 Not tainted 5.0.0-rc4.gotochain_crash+ #536 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Workqueue: ipv6_addrconf addrconf_dad_work RIP: 0010:tcf_action_exec+0xb8/0x100 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 RSP: 0018:ffffbe6781763ad0 EFLAGS: 00010246 RAX: 000000002000002a RBX: ffff9e59bdb80e00 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff9e59b4716738 RDI: ffff9e59ab12d140 RBP: ffffbe6781763b70 R08: 0000000000000234 R09: 0000000000aaaaaa R10: 0000000000000000 R11: ffff9e59b247cd50 R12: ffff9e59b112f100 R13: ffff9e59b112f108 R14: 0000000000000001 R15: ffff9e59ab12d0c0 FS: 0000000000000000(0000) GS:ffff9e59b4700000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000006af92004 CR4: 00000000001606e0 Call Trace: tcf_classify+0x58/0x120 __dev_queue_xmit+0x40a/0x890 ? ndisc_next_option+0x50/0x50 ? ___neigh_create+0x4d5/0x680 ? ip6_finish_output2+0x1b5/0x590 ip6_finish_output2+0x1b5/0x590 ? ip6_output+0x68/0x110 ip6_output+0x68/0x110 ? nf_hook.constprop.28+0x79/0xc0 ndisc_send_skb+0x248/0x2e0 ndisc_send_ns+0xf8/0x200 ? addrconf_dad_work+0x389/0x4b0 addrconf_dad_work+0x389/0x4b0 ? __switch_to_asm+0x34/0x70 ? process_one_work+0x195/0x380 ? addrconf_dad_completed+0x370/0x370 process_one_work+0x195/0x380 worker_thread+0x30/0x390 ? process_one_work+0x380/0x380 kthread+0x113/0x130 ? kthread_park+0x90/0x90 ret_from_fork+0x35/0x40 Modules linked in: act_simple veth ip6table_filter ip6_tables iptable_filter binfmt_misc crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ext4 snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep mbcache snd_hda_core jbd2 snd_seq snd_seq_device snd_pcm aesni_intel crypto_simd cryptd snd_timer glue_helper snd joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops virtio_net ttm net_failover virtio_console virtio_blk failover drm crc32c_intel serio_raw floppy ata_piix libata virtio_pci virtio_ring virtio dm_mirror dm_region_hash dm_log dm_mod CR2: 0000000000000000 Validating the control action within tcf_simple_init() proved to fix the above issue. A TDC selftest is added to verify the correct behavior. Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain") Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21net/sched: act_sample: validate the control action inside init()Davide Caratti
the following script: # tc qdisc add dev crash0 clsact # tc filter add dev crash0 egress matchall \ > action sample rate 1024 group 4 pass index 90 # tc actions replace action sample \ > rate 1024 group 4 goto chain 42 index 90 cookie c1a0c1a0 # tc actions show action sample had the following output: Error: Failed to init TC action chain. We have an error talking to the kernel total acts 1 action order 0: sample rate 1/1024 group 4 goto chain 42 index 90 ref 2 bind 1 cookie c1a0c1a0 Then, the first packet transmitted by crash0 made the kernel crash: BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 #PF error: [normal kernel read fault] PGD 8000000079966067 P4D 8000000079966067 PUD 7987b067 PMD 0 Oops: 0000 [#1] SMP PTI CPU: 0 PID: 5 Comm: kworker/0:0 Not tainted 5.0.0-rc4.gotochain_crash+ #536 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Workqueue: ipv6_addrconf addrconf_dad_work RIP: 0010:tcf_action_exec+0xb8/0x100 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 RSP: 0018:ffffbee60033fad0 EFLAGS: 00010246 RAX: 000000002000002a RBX: ffff99d7ae6e3b00 RCX: 00000000e555df9b RDX: 0000000000000000 RSI: 00000000b0352718 RDI: ffff99d7fda1fcf0 RBP: ffffbee60033fb70 R08: 0000000070731ab1 R09: 0000000000000400 R10: 0000000000000000 R11: ffff99d7ac733838 R12: ffff99d7f3c2be00 R13: ffff99d7f3c2be08 R14: 0000000000000001 R15: ffff99d7f3c2b600 FS: 0000000000000000(0000) GS:ffff99d7fda00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000000797de006 CR4: 00000000001606f0 Call Trace: tcf_classify+0x58/0x120 __dev_queue_xmit+0x40a/0x890 ? ndisc_next_option+0x50/0x50 ? ___neigh_create+0x4d5/0x680 ? ip6_finish_output2+0x1b5/0x590 ip6_finish_output2+0x1b5/0x590 ? ip6_output+0x68/0x110 ip6_output+0x68/0x110 ? nf_hook.constprop.28+0x79/0xc0 ndisc_send_skb+0x248/0x2e0 ndisc_send_ns+0xf8/0x200 ? addrconf_dad_work+0x389/0x4b0 addrconf_dad_work+0x389/0x4b0 ? __switch_to_asm+0x34/0x70 ? process_one_work+0x195/0x380 ? addrconf_dad_completed+0x370/0x370 process_one_work+0x195/0x380 worker_thread+0x30/0x390 ? process_one_work+0x380/0x380 kthread+0x113/0x130 ? kthread_park+0x90/0x90 ret_from_fork+0x35/0x40 Modules linked in: act_sample psample veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel mbcache jbd2 snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device aesni_intel crypto_simd snd_pcm cryptd glue_helper snd_timer joydev snd pcspkr virtio_balloon i2c_piix4 soundcore nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect virtio_net sysimgblt fb_sys_fops net_failover ttm failover virtio_blk virtio_console drm ata_piix serio_raw crc32c_intel libata virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod CR2: 0000000000000000 Validating the control action within tcf_sample_init() proved to fix the above issue. A TDC selftest is added to verify the correct behavior. Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain") Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21net/sched: act_police: validate the control action inside init()Davide Caratti
the following script: # tc qdisc add dev crash0 clsact # tc filter add dev crash0 egress matchall \ > action police rate 3mbit burst 250k pass index 90 # tc actions replace action police \ > rate 3mbit burst 250k goto chain 42 index 90 cookie c1a0c1a0 # tc actions show action police rate 3mbit burst had the following output: Error: Failed to init TC action chain. We have an error talking to the kernel total acts 1 action order 0: police 0x5a rate 3Mbit burst 250Kb mtu 2Kb action goto chain 42 overhead 0b ref 2 bind 1 cookie c1a0c1a0 Then, when crash0 starts transmitting more than 3Mbit/s, the following kernel crash is observed: BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 #PF error: [normal kernel read fault] PGD 800000007a779067 P4D 800000007a779067 PUD 2ad96067 PMD 0 Oops: 0000 [#1] SMP PTI CPU: 3 PID: 5032 Comm: netperf Not tainted 5.0.0-rc4.gotochain_crash+ #533 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:tcf_action_exec+0xb8/0x100 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 RSP: 0018:ffffb0e04064fa60 EFLAGS: 00010246 RAX: 000000002000002a RBX: ffff93bb3322cce0 RCX: 0000000000000005 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff93bb3322cce0 RBP: ffffb0e04064fb00 R08: 0000000000000022 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000001 R12: ffff93bb3beed300 R13: ffff93bb3beed308 R14: 0000000000000001 R15: ffff93bb3b64d000 FS: 00007f0bc6be5740(0000) GS:ffff93bb3db80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000000746a8001 CR4: 00000000001606e0 Call Trace: tcf_classify+0x58/0x120 __dev_queue_xmit+0x40a/0x890 ? ipt_do_table+0x31c/0x420 [ip_tables] ? ip_finish_output2+0x16f/0x430 ip_finish_output2+0x16f/0x430 ? ip_output+0x69/0xe0 ip_output+0x69/0xe0 ? ip_forward_options+0x1a0/0x1a0 __tcp_transmit_skb+0x563/0xa40 tcp_write_xmit+0x243/0xfa0 __tcp_push_pending_frames+0x32/0xf0 tcp_sendmsg_locked+0x404/0xd30 tcp_sendmsg+0x27/0x40 sock_sendmsg+0x36/0x40 __sys_sendto+0x10e/0x140 ? __sys_connect+0x87/0xf0 ? syscall_trace_enter+0x1df/0x2e0 ? __audit_syscall_exit+0x216/0x260 __x64_sys_sendto+0x24/0x30 do_syscall_64+0x5b/0x180 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f0bc5ffbafd Code: 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 8b 05 ae c4 2c 00 85 c0 75 2d 45 31 c9 45 31 c0 4c 63 d1 48 63 ff b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 63 63 2c 00 f7 d8 64 89 02 48 RSP: 002b:00007fffef94b7f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 0000000000004000 RCX: 00007f0bc5ffbafd RDX: 0000000000004000 RSI: 00000000017e5420 RDI: 0000000000000004 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000004 R13: 00000000017e51d0 R14: 0000000000000010 R15: 0000000000000006 Modules linked in: act_police veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 snd_hda_codec_generic mbcache crct10dif_pclmul jbd2 crc32_pclmul ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel crypto_simd cryptd glue_helper snd_timer snd joydev pcspkr virtio_balloon soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm virtio_blk virtio_net virtio_console net_failover failover crc32c_intel ata_piix libata serio_raw virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod CR2: 0000000000000000 Validating the control action within tcf_police_init() proved to fix the above issue. A TDC selftest is added to verify the correct behavior. Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain") Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21net/sched: act_pedit: validate the control action inside init()Davide Caratti
the following script: # tc filter add dev crash0 egress matchall \ > action pedit ex munge ip ttl set 10 pass index 90 # tc actions replace action pedit \ > ex munge ip ttl set 10 goto chain 42 index 90 cookie c1a0c1a0 # tc actions show action pedit had the following output: Error: Failed to init TC action chain. We have an error talking to the kernel total acts 1 action order 0: pedit action goto chain 42 keys 1 index 90 ref 2 bind 1 key #0 at ipv4+8: val 0a000000 mask 00ffffff cookie c1a0c1a0 Then, the first packet transmitted by crash0 made the kernel crash: BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 #PF error: [normal kernel read fault] PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.0.0-rc4.gotochain_crash+ #533 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:tcf_action_exec+0xb8/0x100 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 RSP: 0018:ffff94a73db03be0 EFLAGS: 00010246 RAX: 000000002000002a RBX: ffff94a6ee4c0700 RCX: 000000000000000a RDX: 0000000000000000 RSI: ffff94a6ed22c800 RDI: 0000000000000000 RBP: ffff94a73db03c80 R08: ffff94a7386fa4c8 R09: ffff94a73229ea20 R10: 0000000000000000 R11: 0000000000000000 R12: ffff94a6ed22cb00 R13: ffff94a6ed22cb08 R14: 0000000000000001 R15: ffff94a6ed22c800 FS: 0000000000000000(0000) GS:ffff94a73db00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000007120e002 CR4: 00000000001606e0 Call Trace: <IRQ> tcf_classify+0x58/0x120 __dev_queue_xmit+0x40a/0x890 ? ip6_finish_output2+0x369/0x590 ip6_finish_output2+0x369/0x590 ? ip6_output+0x68/0x110 ip6_output+0x68/0x110 ? nf_hook.constprop.35+0x79/0xc0 mld_sendpack+0x16f/0x220 mld_ifc_timer_expire+0x195/0x2c0 ? igmp6_timer_handler+0x70/0x70 call_timer_fn+0x2b/0x130 run_timer_softirq+0x3e8/0x440 ? tick_sched_timer+0x37/0x70 __do_softirq+0xe3/0x2f5 irq_exit+0xf0/0x100 smp_apic_timer_interrupt+0x6c/0x130 apic_timer_interrupt+0xf/0x20 </IRQ> RIP: 0010:native_safe_halt+0x2/0x10 Code: 4e ff ff ff 7f f3 c3 65 48 8b 04 25 00 5c 01 00 f0 80 48 02 20 48 8b 00 a8 08 74 8b eb c1 90 90 90 90 90 90 90 90 90 90 fb f4 <c3> 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90 RSP: 0018:ffffab1740387eb8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13 RAX: ffffffffb18184f0 RBX: 0000000000000002 RCX: 0000000000000001 RDX: 0000000000000001 RSI: 0000000000000087 RDI: 0000000000000002 RBP: 0000000000000002 R08: 000f168fa695f9a9 R09: 0000000000000020 R10: 0000000000000004 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 ? __sched_text_end+0x1/0x1 default_idle+0x1c/0x140 do_idle+0x1c4/0x280 cpu_startup_entry+0x19/0x20 start_secondary+0x1a7/0x200 secondary_startup_64+0xa4/0xb0 Modules linked in: act_pedit veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 mbcache jbd2 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep aesni_intel snd_hda_core crypto_simd snd_seq cryptd glue_helper snd_seq_device snd_pcm joydev snd_timer pcspkr virtio_balloon snd soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs qxl ata_generic pata_acpi drm_kms_helper virtio_net net_failover syscopyarea sysfillrect sysimgblt failover virtio_blk fb_sys_fops virtio_console ttm drm crc32c_intel serio_raw ata_piix virtio_pci libata virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod CR2: 0000000000000000 Validating the control action within tcf_pedit_init() proved to fix the above issue. A TDC selftest is added to verify the correct behavior. Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain") Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21net/sched: act_nat: validate the control action inside init()Davide Caratti
the following script: # tc qdisc add dev crash0 clsact # tc filter add dev crash0 egress matchall \ > action nat ingress 1.18.1.1 1.18.2.2 pass index 90 # tc actions replace action nat \ > ingress 1.18.1.1 1.18.2.2 goto chain 42 index 90 cookie c1a0c1a0 # tc actions show action nat had the following output: Error: Failed to init TC action chain. We have an error talking to the kernel total acts 1 action order 0: nat ingress 1.18.1.1/32 1.18.2.2 goto chain 42 index 90 ref 2 bind 1 cookie c1a0c1a0 Then, the first packet transmitted by crash0 made the kernel crash: BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 #PF error: [normal kernel read fault] PGD 800000002d180067 P4D 800000002d180067 PUD 7cb8b067 PMD 0 Oops: 0000 [#1] SMP PTI CPU: 3 PID: 164 Comm: kworker/3:1 Not tainted 5.0.0-rc4.gotochain_crash+ #533 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Workqueue: ipv6_addrconf addrconf_dad_work RIP: 0010:tcf_action_exec+0xb8/0x100 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 RSP: 0018:ffffae4500e2fad0 EFLAGS: 00010246 RAX: 000000002000002a RBX: ffff9fa52e28c800 RCX: 0000000001011201 RDX: 0000000000000000 RSI: 0000000000000056 RDI: ffff9fa52ca12800 RBP: ffffae4500e2fb70 R08: 0000000000000022 R09: 000000000000000e R10: 00000000ffffffff R11: 0000000001011201 R12: ffff9fa52cbc9c00 R13: ffff9fa52cbc9c08 R14: 0000000000000001 R15: ffff9fa52ca12780 FS: 0000000000000000(0000) GS:ffff9fa57db80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000000073f8c004 CR4: 00000000001606e0 Call Trace: tcf_classify+0x58/0x120 __dev_queue_xmit+0x40a/0x890 ? ndisc_next_option+0x50/0x50 ? ___neigh_create+0x4d5/0x680 ? ip6_finish_output2+0x1b5/0x590 ip6_finish_output2+0x1b5/0x590 ? ip6_output+0x68/0x110 ip6_output+0x68/0x110 ? nf_hook.constprop.28+0x79/0xc0 ndisc_send_skb+0x248/0x2e0 ndisc_send_ns+0xf8/0x200 ? addrconf_dad_work+0x389/0x4b0 addrconf_dad_work+0x389/0x4b0 ? __switch_to_asm+0x34/0x70 ? process_one_work+0x195/0x380 ? addrconf_dad_completed+0x370/0x370 process_one_work+0x195/0x380 worker_thread+0x30/0x390 ? process_one_work+0x380/0x380 kthread+0x113/0x130 ? kthread_park+0x90/0x90 ret_from_fork+0x35/0x40 Modules linked in: act_nat veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel mbcache jbd2 snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel crypto_simd cryptd glue_helper snd_timer snd joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs qxl ata_generic pata_acpi drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm virtio_net virtio_blk net_failover failover virtio_console drm crc32c_intel floppy ata_piix libata virtio_pci virtio_ring virtio serio_raw dm_mirror dm_region_hash dm_log dm_mod CR2: 0000000000000000 Validating the control action within tcf_nat_init() proved to fix the above issue. A TDC selftest is added to verify the correct behavior. Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain") Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21net/sched: act_connmark: validate the control action inside init()Davide Caratti
the following script: # tc qdisc add dev crash0 clsact # tc filter add dev crash0 egress matchall \ > action connmark pass index 90 # tc actions replace action connmark \ > goto chain 42 index 90 cookie c1a0c1a0 # tc actions show action connmark had the following output: Error: Failed to init TC action chain. We have an error talking to the kernel total acts 1 action order 0: connmark zone 0 goto chain 42 index 90 ref 2 bind 1 cookie c1a0c1a0 Then, the first packet transmitted by crash0 made the kernel crash: BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 #PF error: [normal kernel read fault] PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI CPU: 0 PID: 302 Comm: kworker/0:2 Not tainted 5.0.0-rc4.gotochain_crash+ #533 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Workqueue: ipv6_addrconf addrconf_dad_work RIP: 0010:tcf_action_exec+0xb8/0x100 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 RSP: 0018:ffff9bea406c3ad0 EFLAGS: 00010246 RAX: 000000002000002a RBX: ffff8c5dfc009f00 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff9bea406c3a80 RDI: ffff8c5dfb9d6ec0 RBP: ffff9bea406c3b70 R08: ffff8c5dfda222a0 R09: ffffffff90933c3c R10: 0000000000000000 R11: 0000000092793f7d R12: ffff8c5df48b3c00 R13: ffff8c5df48b3c08 R14: 0000000000000001 R15: ffff8c5dfb9d6e40 FS: 0000000000000000(0000) GS:ffff8c5dfda00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000000062e0e006 CR4: 00000000001606f0 Call Trace: tcf_classify+0x58/0x120 __dev_queue_xmit+0x40a/0x890 ? ndisc_next_option+0x50/0x50 ? ___neigh_create+0x4d5/0x680 ? ip6_finish_output2+0x1b5/0x590 ip6_finish_output2+0x1b5/0x590 ? ip6_output+0x68/0x110 ip6_output+0x68/0x110 ? nf_hook.constprop.28+0x79/0xc0 ndisc_send_skb+0x248/0x2e0 ndisc_send_ns+0xf8/0x200 ? addrconf_dad_work+0x389/0x4b0 addrconf_dad_work+0x389/0x4b0 ? __switch_to_asm+0x34/0x70 ? process_one_work+0x195/0x380 ? addrconf_dad_completed+0x370/0x370 process_one_work+0x195/0x380 worker_thread+0x30/0x390 ? process_one_work+0x380/0x380 kthread+0x113/0x130 ? kthread_park+0x90/0x90 ret_from_fork+0x35/0x40 Modules linked in: act_connmark nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul mbcache crc32_pclmul jbd2 snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel snd_timer crypto_simd cryptd snd glue_helper joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper virtio_net net_failover syscopyarea virtio_blk failover virtio_console sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel serio_raw libata virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod CR2: 0000000000000000 Validating the control action within tcf_connmark_init() proved to fix the above issue. A TDC selftest is added to verify the correct behavior. Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain") Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21net/sched: act_mirred: validate the control action inside init()Davide Caratti
the following script: # tc qdisc add dev crash0 clsact # tc filter add dev crash0 egress matchall \ > action mirred ingress mirror dev lo pass # tc actions replace action mirred \ > ingress mirror dev lo goto chain 42 index 90 cookie c1a0c1a0 # tc actions show action mirred had the following output: Error: Failed to init TC action chain. We have an error talking to the kernel total acts 1 action order 0: mirred (Ingress Mirror to device lo) goto chain 42 index 90 ref 2 bind 1 cookie c1a0c1a0 Then, the first packet transmitted by crash0 made the kernel crash: Mirror/redirect action on BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 #PF error: [normal kernel read fault] PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI CPU: 3 PID: 47 Comm: kworker/3:1 Not tainted 5.0.0-rc4.gotochain_crash+ #533 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Workqueue: ipv6_addrconf addrconf_dad_work RIP: 0010:tcf_action_exec+0xb8/0x100 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 RSP: 0018:ffffa772404b7ad0 EFLAGS: 00010246 RAX: 000000002000002a RBX: ffff9c5afc3f4300 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff9c5afdba9380 RDI: 0000000000029380 RBP: ffffa772404b7b70 R08: ffff9c5af7010028 R09: ffff9c5af7010029 R10: 0000000000000000 R11: ffff9c5af94c6a38 R12: ffff9c5af7953000 R13: ffff9c5af7953008 R14: 0000000000000001 R15: ffff9c5af7953d00 FS: 0000000000000000(0000) GS:ffff9c5afdb80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000007c514004 CR4: 00000000001606e0 Call Trace: tcf_classify+0x58/0x120 __dev_queue_xmit+0x40a/0x890 ? ndisc_next_option+0x50/0x50 ? ___neigh_create+0x4d5/0x680 ? ip6_finish_output2+0x1b5/0x590 ip6_finish_output2+0x1b5/0x590 ? ip6_output+0x68/0x110 ip6_output+0x68/0x110 ? nf_hook.constprop.28+0x79/0xc0 ndisc_send_skb+0x248/0x2e0 ndisc_send_ns+0xf8/0x200 ? addrconf_dad_work+0x389/0x4b0 addrconf_dad_work+0x389/0x4b0 ? __switch_to_asm+0x34/0x70 ? process_one_work+0x195/0x380 ? addrconf_dad_completed+0x370/0x370 process_one_work+0x195/0x380 worker_thread+0x30/0x390 ? process_one_work+0x380/0x380 kthread+0x113/0x130 ? kthread_park+0x90/0x90 ret_from_fork+0x35/0x40 Modules linked in: act_mirred veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul snd_hda_codec_generic crc32_pclmul snd_hda_intel snd_hda_codec mbcache ghash_clmulni_intel jbd2 snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel snd_timer snd crypto_simd cryptd glue_helper soundcore virtio_balloon joydev pcspkr i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops virtio_net ttm virtio_blk net_failover virtio_console failover drm ata_piix crc32c_intel virtio_pci serio_raw libata virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod CR2: 0000000000000000 Validating the control action within tcf_mirred_init() proved to fix the above issue. For the same reason, postpone the assignment of tcfa_action and tcfm_eaction to avoid partial reconfiguration of a mirred rule when it's replaced by another one that mirrors to a device that does not exist. A TDC selftest is added to verify the correct behavior. Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain") Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21net/sched: act_ife: validate the control action inside init()Davide Caratti
the following script: # tc qdisc add dev crash0 clsact # tc filter add dev crash0 egress matchall \ > action ife encode allow mark pass index 90 # tc actions replace action ife \ > encode allow mark goto chain 42 index 90 cookie c1a0c1a0 # tc action show action ife had the following output: IFE type 0xED3E IFE type 0xED3E Error: Failed to init TC action chain. We have an error talking to the kernel total acts 1 action order 0: ife encode action goto chain 42 type 0XED3E allow mark index 90 ref 2 bind 1 cookie c1a0c1a0 Then, the first packet transmitted by crash0 made the kernel crash: BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 #PF error: [normal kernel read fault] PGD 800000007b4e7067 P4D 800000007b4e7067 PUD 7b4e6067 PMD 0 Oops: 0000 [#1] SMP PTI CPU: 2 PID: 164 Comm: kworker/2:1 Not tainted 5.0.0-rc4.gotochain_crash+ #533 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Workqueue: ipv6_addrconf addrconf_dad_work RIP: 0010:tcf_action_exec+0xb8/0x100 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 RSP: 0018:ffffa6a7c0553ad0 EFLAGS: 00010246 RAX: 000000002000002a RBX: ffff9796ee1bbd00 RCX: 0000000000000001 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffffa6a7c0553b70 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: ffff9797385bb038 R12: ffff9796ead9d700 R13: ffff9796ead9d708 R14: 0000000000000001 R15: ffff9796ead9d800 FS: 0000000000000000(0000) GS:ffff97973db00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000007c41e006 CR4: 00000000001606e0 Call Trace: tcf_classify+0x58/0x120 __dev_queue_xmit+0x40a/0x890 ? ndisc_next_option+0x50/0x50 ? ___neigh_create+0x4d5/0x680 ? ip6_finish_output2+0x1b5/0x590 ip6_finish_output2+0x1b5/0x590 ? ip6_output+0x68/0x110 ip6_output+0x68/0x110 ? nf_hook.constprop.28+0x79/0xc0 ndisc_send_skb+0x248/0x2e0 ndisc_send_ns+0xf8/0x200 ? addrconf_dad_work+0x389/0x4b0 addrconf_dad_work+0x389/0x4b0 ? __switch_to_asm+0x34/0x70 ? process_one_work+0x195/0x380 ? addrconf_dad_completed+0x370/0x370 process_one_work+0x195/0x380 worker_thread+0x30/0x390 ? process_one_work+0x380/0x380 kthread+0x113/0x130 ? kthread_park+0x90/0x90 ret_from_fork+0x35/0x40 Modules linked in: act_gact act_meta_mark act_ife dummy veth ip6table_filter ip6_tables iptable_filter binfmt_misc snd_hda_codec_generic ext4 snd_hda_intel snd_hda_codec crct10dif_pclmul mbcache crc32_pclmul jbd2 snd_hwdep snd_hda_core ghash_clmulni_intel snd_seq snd_seq_device snd_pcm snd_timer aesni_intel crypto_simd snd cryptd glue_helper virtio_balloon joydev pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl virtio_net drm_kms_helper virtio_blk net_failover syscopyarea failover sysfillrect virtio_console sysimgblt fb_sys_fops ttm drm crc32c_intel serio_raw ata_piix virtio_pci virtio_ring libata virtio floppy dm_mirror dm_region_hash dm_log dm_mod [last unloaded: act_ife] CR2: 0000000000000000 Validating the control action within tcf_ife_init() proved to fix the above issue. A TDC selftest is added to verify the correct behavior. Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain") Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21net/sched: act_gact: validate the control action inside init()Davide Caratti
the following script: # tc qdisc add dev crash0 clsact # tc filter add dev crash0 egress matchall \ > action gact pass index 90 # tc actions replace action gact \ > goto chain 42 index 90 cookie c1a0c1a0 # tc actions show action gact had the following output: Error: Failed to init TC action chain. We have an error talking to the kernel total acts 1 action order 0: gact action goto chain 42 random type none pass val 0 index 90 ref 2 bind 1 cookie c1a0c1a0 Then, the first packet transmitted by crash0 made the kernel crash: BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 #PF error: [normal kernel read fault] PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.0.0-rc4.gotochain_crash+ #533 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:tcf_action_exec+0xb8/0x100 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 RSP: 0018:ffff8c2434703be0 EFLAGS: 00010246 RAX: 000000002000002a RBX: ffff8c23ed6d7e00 RCX: 000000000000005a RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8c23ed6d7e00 RBP: ffff8c2434703c80 R08: ffff8c243b639ac8 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff8c2429e68b00 R13: ffff8c2429e68b08 R14: 0000000000000001 R15: ffff8c2429c5a480 FS: 0000000000000000(0000) GS:ffff8c2434700000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000002dc0e005 CR4: 00000000001606e0 Call Trace: <IRQ> tcf_classify+0x58/0x120 __dev_queue_xmit+0x40a/0x890 ? ip6_finish_output2+0x369/0x590 ip6_finish_output2+0x369/0x590 ? ip6_output+0x68/0x110 ip6_output+0x68/0x110 ? nf_hook.constprop.35+0x79/0xc0 mld_sendpack+0x16f/0x220 mld_ifc_timer_expire+0x195/0x2c0 ? igmp6_timer_handler+0x70/0x70 call_timer_fn+0x2b/0x130 run_timer_softirq+0x3e8/0x440 ? tick_sched_timer+0x37/0x70 __do_softirq+0xe3/0x2f5 irq_exit+0xf0/0x100 smp_apic_timer_interrupt+0x6c/0x130 apic_timer_interrupt+0xf/0x20 </IRQ> RIP: 0010:native_safe_halt+0x2/0x10 Code: 74 ff ff ff 7f f3 c3 65 48 8b 04 25 00 5c 01 00 f0 80 48 02 20 48 8b 00 a8 08 74 8b eb c1 90 90 90 90 90 90 90 90 90 90 fb f4 <c3> 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90 RSP: 0018:ffff9c8640387eb8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13 RAX: ffffffff8b2184f0 RBX: 0000000000000002 RCX: 0000000000000001 RDX: 0000000000000001 RSI: 0000000000000087 RDI: 0000000000000002 RBP: 0000000000000002 R08: 000eb57882b36cc3 R09: 0000000000000020 R10: 0000000000000004 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 ? __sched_text_end+0x1/0x1 default_idle+0x1c/0x140 do_idle+0x1c4/0x280 cpu_startup_entry+0x19/0x20 start_secondary+0x1a7/0x200 secondary_startup_64+0xa4/0xb0 Modules linked in: act_gact act_bpf veth ip6table_filter ip6_tables iptable_filter binfmt_misc crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec_generic ext4 snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core mbcache jbd2 snd_seq snd_seq_device snd_pcm aesni_intel crypto_simd cryptd glue_helper virtio_balloon joydev pcspkr snd_timer snd i2c_piix4 soundcore nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper syscopyarea virtio_net sysfillrect net_failover virtio_blk sysimgblt fb_sys_fops virtio_console ttm failover drm crc32c_intel serio_raw ata_piix libata floppy virtio_pci virtio_ring virtio dm_mirror dm_region_hash dm_log dm_mod [last unloaded: act_bpf] CR2: 0000000000000000 Validating the control action within tcf_gact_init() proved to fix the above issue. A TDC selftest is added to verify the correct behavior. Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain") Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21net/sched: act_csum: validate the control action inside init()Davide Caratti
the following script: # tc qdisc add dev crash0 clsact # tc filter add dev crash0 egress matchall action csum icmp pass index 90 # tc actions replace action csum icmp goto chain 42 index 90 \ > cookie c1a0c1a0 # tc actions show action csum had the following output: Error: Failed to init TC action chain. We have an error talking to the kernel total acts 1 action order 0: csum (icmp) action goto chain 42 index 90 ref 2 bind 1 cookie c1a0c1a0 Then, the first packet transmitted by crash0 made the kernel crash: BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 #PF error: [normal kernel read fault] PGD 8000000074692067 P4D 8000000074692067 PUD 2e210067 PMD 0 Oops: 0000 [#1] SMP PTI CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.0.0-rc4.gotochain_crash+ #533 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:tcf_action_exec+0xb8/0x100 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 RSP: 0018:ffff93153da03be0 EFLAGS: 00010246 RAX: 000000002000002a RBX: ffff9314ee40f700 RCX: 0000000000003a00 RDX: 0000000000000000 RSI: ffff931537c87828 RDI: ffff931537c87818 RBP: ffff93153da03c80 R08: 00000000527cffff R09: 0000000000000003 R10: 000000000000003f R11: 0000000000000028 R12: ffff9314edf68400 R13: ffff9314edf68408 R14: 0000000000000001 R15: ffff9314ed67b600 FS: 0000000000000000(0000) GS:ffff93153da00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000000073e32003 CR4: 00000000001606f0 Call Trace: <IRQ> tcf_classify+0x58/0x120 __dev_queue_xmit+0x40a/0x890 ? ip6_finish_output2+0x369/0x590 ip6_finish_output2+0x369/0x590 ? ip6_output+0x68/0x110 ip6_output+0x68/0x110 ? nf_hook.constprop.35+0x79/0xc0 mld_sendpack+0x16f/0x220 mld_ifc_timer_expire+0x195/0x2c0 ? igmp6_timer_handler+0x70/0x70 call_timer_fn+0x2b/0x130 run_timer_softirq+0x3e8/0x440 ? tick_sched_timer+0x37/0x70 __do_softirq+0xe3/0x2f5 irq_exit+0xf0/0x100 smp_apic_timer_interrupt+0x6c/0x130 apic_timer_interrupt+0xf/0x20 </IRQ> RIP: 0010:native_safe_halt+0x2/0x10 Code: 66 ff ff ff 7f f3 c3 65 48 8b 04 25 00 5c 01 00 f0 80 48 02 20 48 8b 00 a8 08 74 8b eb c1 90 90 90 90 90 90 90 90 90 90 fb f4 <c3> 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90 RSP: 0018:ffffffff9a803e98 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13 RAX: ffffffff99e184f0 RBX: 0000000000000000 RCX: 0000000000000001 RDX: 0000000000000001 RSI: 0000000000000087 RDI: 0000000000000000 RBP: 0000000000000000 R08: 000eb5c4572376b3 R09: 0000000000000000 R10: ffffa53e806a3ca0 R11: 00000000000f4240 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 ? __sched_text_end+0x1/0x1 default_idle+0x1c/0x140 do_idle+0x1c4/0x280 cpu_startup_entry+0x19/0x20 start_kernel+0x49e/0x4be secondary_startup_64+0xa4/0xb0 Modules linked in: act_csum veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul crc32_pclmul snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel mbcache snd_hda_codec jbd2 snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel crypto_simd cryptd snd_timer glue_helper snd joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect virtio_net sysimgblt net_failover fb_sys_fops virtio_console virtio_blk ttm failover drm ata_piix crc32c_intel floppy virtio_pci serio_raw libata virtio_ring virtio dm_mirror dm_region_hash dm_log dm_mod CR2: 0000000000000000 Validating the control action within tcf_csum_init() proved to fix the above issue. A TDC selftest is added to verify the correct behavior. Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain") Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21net/sched: act_bpf: validate the control action inside init()Davide Caratti
the following script: # tc filter add dev crash0 egress matchall \ > action bpf bytecode '1,6 0 0 4294967295' pass index 90 # tc actions replace action bpf \ > bytecode '1,6 0 0 4294967295' goto chain 42 index 90 cookie c1a0c1a0 # tc action show action bpf had the following output: Error: Failed to init TC action chain. We have an error talking to the kernel total acts 1 action order 0: bpf bytecode '1,6 0 0 4294967295' default-action goto chain 42 index 90 ref 2 bind 1 cookie c1a0c1a0 Then, the first packet transmitted by crash0 made the kernel crash: RIP: 0010:tcf_action_exec+0xb8/0x100 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 RSP: 0018:ffffb3a0803dfa90 EFLAGS: 00010246 RAX: 000000002000002a RBX: ffff942b347ada00 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffb3a08034d038 RDI: ffff942b347ada00 RBP: ffffb3a0803dfb30 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: ffffb3a0803dfb0c R12: ffff942b3b682b00 R13: ffff942b3b682b08 R14: 0000000000000001 R15: ffff942b3b682f00 FS: 00007f6160a72740(0000) GS:ffff942b3da80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000000795a4002 CR4: 00000000001606e0 Call Trace: tcf_classify+0x58/0x120 __dev_queue_xmit+0x40a/0x890 ? ip_finish_output2+0x16f/0x430 ip_finish_output2+0x16f/0x430 ? ip_output+0x69/0xe0 ip_output+0x69/0xe0 ? ip_forward_options+0x1a0/0x1a0 ip_send_skb+0x15/0x40 raw_sendmsg+0x8e1/0xbd0 ? sched_clock+0x5/0x10 ? sched_clock_cpu+0xc/0xa0 ? try_to_wake_up+0x54/0x480 ? ldsem_down_read+0x3f/0x280 ? _cond_resched+0x15/0x40 ? down_read+0xe/0x30 ? copy_termios+0x1e/0x70 ? tty_mode_ioctl+0x1b6/0x4c0 ? sock_sendmsg+0x36/0x40 sock_sendmsg+0x36/0x40 __sys_sendto+0x10e/0x140 ? do_vfs_ioctl+0xa4/0x640 ? handle_mm_fault+0xdc/0x210 ? syscall_trace_enter+0x1df/0x2e0 ? __audit_syscall_exit+0x216/0x260 __x64_sys_sendto+0x24/0x30 do_syscall_64+0x5b/0x180 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f615f7e3c03 Code: 48 8b 0d 90 62 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 9d c3 2c 00 00 75 13 49 89 ca b8 2c 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 34 c3 48 83 ec 08 e8 4b cc 00 00 48 89 04 24 RSP: 002b:00007ffee5d8cc28 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 000055a4f28f1700 RCX: 00007f615f7e3c03 RDX: 0000000000000040 RSI: 000055a4f28f1700 RDI: 0000000000000003 RBP: 00007ffee5d8e340 R08: 000055a4f28ee510 R09: 0000000000000010 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040 R13: 000055a4f28f16c0 R14: 000055a4f28ef69c R15: 0000000000000080 Modules linked in: act_bpf veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 mbcache crct10dif_pclmul jbd2 crc32_pclmul snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel crypto_simd cryptd glue_helper pcspkr joydev virtio_balloon snd_timer snd i2c_piix4 soundcore nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper virtio_blk virtio_net virtio_console net_failover failover syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel ata_piix serio_raw libata virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod CR2: 0000000000000000 Validating the control action within tcf_bpf_init() proved to fix the above issue. A TDC selftest is added to verify the correct behavior. Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain") Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21net/sched: prepare TC actions to properly validate the control actionDavide Caratti
- pass a pointer to struct tcf_proto in each actions's init() handler, to allow validating the control action, checking whether the chain exists and (eventually) refcounting it. - remove code that validates the control action after a successful call to the action's init() handler, and replace it with a test that forbids addition of actions having 'goto_chain' and NULL goto_chain pointer at the same time. - add tcf_action_check_ctrlact(), that will validate the control action and eventually allocate the action 'goto_chain' within the init() handler. - add tcf_action_set_ctrlact(), that will assign the control action and swap the current 'goto_chain' pointer with the new given one. This disallows 'goto_chain' on actions that don't initialize it properly in their init() handler, i.e. calling tcf_action_check_ctrlact() after successful IDR reservation and then calling tcf_action_set_ctrlact() to assign 'goto_chain' and 'tcf_action' consistently. By doing this, the kernel does not leak anymore refcounts when a valid 'goto chain' handle is replaced in TC actions, causing kmemleak splats like the following one: # tc chain add dev dd0 chain 42 ingress protocol ip flower \ > ip_proto tcp action drop # tc chain add dev dd0 chain 43 ingress protocol ip flower \ > ip_proto udp action drop # tc filter add dev dd0 ingress matchall \ > action gact goto chain 42 index 66 # tc filter replace dev dd0 ingress matchall \ > action gact goto chain 43 index 66 # echo scan >/sys/kernel/debug/kmemleak <...> unreferenced object 0xffff93c0ee09f000 (size 1024): comm "tc", pid 2565, jiffies 4295339808 (age 65.426s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 08 00 06 00 00 00 00 00 00 00 00 00 ................ backtrace: [<000000009b63f92d>] tc_ctl_chain+0x3d2/0x4c0 [<00000000683a8d72>] rtnetlink_rcv_msg+0x263/0x2d0 [<00000000ddd88f8e>] netlink_rcv_skb+0x4a/0x110 [<000000006126a348>] netlink_unicast+0x1a0/0x250 [<00000000b3340877>] netlink_sendmsg+0x2c1/0x3c0 [<00000000a25a2171>] sock_sendmsg+0x36/0x40 [<00000000f19ee1ec>] ___sys_sendmsg+0x280/0x2f0 [<00000000d0422042>] __sys_sendmsg+0x5e/0xa0 [<000000007a6c61f9>] do_syscall_64+0x5b/0x180 [<00000000ccd07542>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [<0000000013eaa334>] 0xffffffffffffffff Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain") Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-15sch_cake: Interpret fwmark parameter as a bitmaskToke Høiland-Jørgensen
We initially interpreted the fwmark parameter as a flag that simply turned on the feature, using the whole skb->mark field as the index into the CAKE tin_order array. However, it is quite common for different applications to use different parts of the mask field for their own purposes, each using a different mask. Support this use of subsets of the mark by interpreting the TCA_CAKE_FWMARK parameter as a bitmask to apply to the fwmark field when reading it. The result will be right-shifted by the number of unset lower bits of the mask before looking up the tin. In the original commit message we also failed to credit Felix Resch with originally suggesting the fwmark feature back in 2017; so the Suggested-By in this commit covers the whole fwmark feature. Fixes: 0b5c7efdfc6e ("sch_cake: Permit use of connmarks as tin classifiers") Suggested-by: Felix Resch <fuller@beif.de> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-13net_sched: return correct value for *notify* functionsZhike Wang
It is confusing to directly use return value of netlink_send()/ netlink_unicast() as the return value of *notify*, as it may be not error at all. Example: in tc_del_tfilter(), after calling tfilter_del_notify(), it will goto errout if (err). However, the netlink_send()/netlink_unicast() will return positive value even for successful case. So it may not call tcf_chain_tp_remove() and so on to clean up the resource, as a result, resource is leaked. It may be easier to only check the return value of tfilter_del_nofiy(), but it is more clean to correct all related functions. Co-developed-by: Zengmo Gao <gaozengmo@jd.com> Signed-off-by: Zhike Wang <wangzhike@jd.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-08net: sched: fix potential use-after-free in __tcf_chain_put()Vlad Buslov
When used with unlocked classifier that have filters attached to actions with goto chain, __tcf_chain_put() for last non action reference can race with calls to same function from action cleanup code that releases last action reference. In this case action cleanup handler could free the chain if it executes after all references to chain were released, but before all concurrent users finished using it. Modify __tcf_chain_put() to only access tcf_chain fields when holding block->lock. Remove local variables that were used to cache some tcf_chain fields and are no longer needed because their values can now be obtained directly from chain under block->lock protection. Fixes: 726d061286ce ("net: sched: prevent insertion of new classifiers during chain flush") Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-06net: sched: flower: insert new filter to idr after setting its maskVlad Buslov
When adding new filter to flower classifier, fl_change() inserts it to handle_idr before initializing filter extensions and assigning it a mask. Normally this ordering doesn't matter because all flower classifier ops callbacks assume rtnl lock protection. However, when filter has an action that doesn't have its kernel module loaded, rtnl lock is released before call to request_module(). During this time the filter can be accessed bu concurrent task before its initialization is completed, which can lead to a crash. Example case of NULL pointer dereference in concurrent dump: Task 1 Task 2 tc_new_tfilter() fl_change() idr_alloc_u32(fnew) fl_set_parms() tcf_exts_validate() tcf_action_init() tcf_action_init_1() rtnl_unlock() request_module() ... rtnl_lock() tc_dump_tfilter() tcf_chain_dump() fl_walk() idr_get_next_ul() tcf_node_dump() tcf_fill_node() fl_dump() mask = &f->mask->key; <- NULL ptr rtnl_lock() Extension initialization and mask assignment don't depend on fnew->handle that is allocated by idr_alloc_u32(). Move idr allocation code after action creation and mask assignment in fl_change() to prevent concurrent access to not fully initialized filter when rtnl lock is released to load action module. Fixes: 01683a146999 ("net: sched: refactor flower walk to iterate over idr") Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-05net/sched: act_tunnel_key: Fix double free dst_cachewenxu
dst_cache_destroy will be called in dst_release dst_release-->dst_destroy_rcu-->dst_destroy-->metadata_dst_free -->dst_cache_destroy It should not call dst_cache_destroy before dst_release Fixes: 41411e2fd6b8 ("net/sched: act_tunnel_key: Add dst_cache support") Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2019-03-04net/sched: avoid unused-label warningArnd Bergmann
The label is only used from inside the #ifdef and should be hidden the same way, to avoid this warning: net/sched/act_tunnel_key.c: In function 'tunnel_key_init': net/sched/act_tunnel_key.c:389:1: error: label 'release_tun_meta' defined but not used [-Werror=unused-label] release_tun_meta: Fixes: 41411e2fd6b8 ("net/sched: act_tunnel_key: Add dst_cache support") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-03sch_cake: Simplify logic in cake_select_tin()Toke Høiland-Jørgensen
With more modes added the logic in cake_select_tin() was getting a bit hairy, and it turns out we can actually simplify it quite a bit. This also allows us to get rid of one of the two diffserv parsing functions, which has the added benefit that already-zeroed DSCP fields won't get re-written. Suggested-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-03sch_cake: Permit use of connmarks as tin classifiersKevin Darbyshire-Bryant
Add flag 'FWMARK' to enable use of firewall connmarks as tin selector. The connmark (skbuff->mark) needs to be in the range 1->tin_cnt ie. for diffserv3 the mark needs to be 1->3. Background Typically CAKE uses DSCP as the basis for tin selection. DSCP values are relatively easily changed as part of the egress path, usually with iptables & the mangle table, ingress is more challenging. CAKE is often used on the WAN interface of a residential gateway where passthrough of DSCP from the ISP is either missing or set to unhelpful values thus use of ingress DSCP values for tin selection isn't helpful in that environment. An approach to solving the ingress tin selection problem is to use CAKE's understanding of tc filters. Naive tc filters could match on source/destination port numbers and force tin selection that way, but multiple filters don't scale particularly well as each filter must be traversed whether it matches or not. e.g. a simple example to map 3 firewall marks to tins: MAJOR=$( tc qdisc show dev $DEV | head -1 | awk '{print $3}' ) tc filter add dev $DEV parent $MAJOR protocol all handle 0x01 fw action skbedit priority ${MAJOR}1 tc filter add dev $DEV parent $MAJOR protocol all handle 0x02 fw action skbedit priority ${MAJOR}2 tc filter add dev $DEV parent $MAJOR protocol all handle 0x03 fw action skbedit priority ${MAJOR}3 Another option is to use eBPF cls_act with tc filters e.g. MAJOR=$( tc qdisc show dev $DEV | head -1 | awk '{print $3}' ) tc filter add dev $DEV parent $MAJOR bpf da obj my-bpf-fwmark-to-class.o This has the disadvantages of a) needing someone to write & maintain the bpf program, b) a bpf toolchain to compile it and c) needing to hardcode the major number in the bpf program so it matches the cake instance (or forcing the cake instance to a particular major number) since the major number cannot be passed to the bpf program via tc command line. As already hinted at by the previous examples, it would be helpful to associate tins with something that survives the Internet path and ideally allows tin selection on both egress and ingress. Netfilter's conntrack permits setting an identifying mark on a connection which can also be restored to an ingress packet with tc action connmark e.g. tc filter add dev eth0 parent ffff: protocol all prio 10 u32 \ match u32 0 0 flowid 1:1 action connmark action mirred egress redirect dev ifb1 Since tc's connmark action has restored any connmark into skb->mark, any of the previous solutions are based upon it and in one form or another copy that mark to the skb->priority field where again CAKE picks this up. This change cuts out at least one of the (less intuitive & non-scalable) middlemen and permit direct access to skb->mark. Signed-off-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-03sch_cake: Make the dual modes fairerGeorge Amanakis
CAKE host fairness does not work well with TCP flows in dual-srchost and dual-dsthost setup. The reason is that ACKs generated by TCP flows are classified as sparse flows, and affect flow isolation from other hosts. Fix this by calculating host_load based only on the bulk flows a host generates. In a hash collision the host_bulk_flow_count values must be decremented on the old hosts and incremented on the new ones *if* the queue is in the bulk set. Reported-by: Pete Heist <peteheist@gmail.com> Signed-off-by: George Amanakis <gamanakis@gmail.com> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-02net: sched: put back q.qlen into a single locationEric Dumazet
In the series fc8b81a5981f ("Merge branch 'lockless-qdisc-series'") John made the assumption that the data path had no need to read the qdisc qlen (number of packets in the qdisc). It is true when pfifo_fast is used as the root qdisc, or as direct MQ/MQPRIO children. But pfifo_fast can be used as leaf in class full qdiscs, and existing logic needs to access the child qlen in an efficient way. HTB breaks badly, since it uses cl->leaf.q->q.qlen in : htb_activate() -> WARN_ON() htb_dequeue_tree() to decide if a class can be htb_deactivated when it has no more packets. HFSC, DRR, CBQ, QFQ have similar issues, and some calls to qdisc_tree_reduce_backlog() also read q.qlen directly. Using qdisc_qlen_sum() (which iterates over all possible cpus) in the data path is a non starter. It seems we have to put back qlen in a central location, at least for stable kernels. For all qdisc but pfifo_fast, qlen is guarded by the qdisc lock, so the existing q.qlen{++|--} are correct. For 'lockless' qdisc (pfifo_fast so far), we need to use atomic_{inc|dec}() because the spinlock might be not held (for example from pfifo_fast_enqueue() and pfifo_fast_dequeue()) This patch adds atomic_qlen (in the same location than qlen) and renames the following helpers, since we want to express they can be used without qdisc lock, and that qlen is no longer percpu. - qdisc_qstats_cpu_qlen_dec -> qdisc_qstats_atomic_qlen_dec() - qdisc_qstats_cpu_qlen_inc -> qdisc_qstats_atomic_qlen_inc() Later (net-next) we might revert this patch by tracking all these qlen uses and replace them by a more efficient method (not having to access a precise qlen, but an empty/non_empty status that might be less expensive to maintain/track). Another possibility is to have a legacy pfifo_fast version that would be used when used a a child qdisc, since the parent qdisc needs a spinlock anyway. But then, future lockless qdiscs would also have the same problem. Fixes: 7e66016f2c65 ("net: sched: helpers to sum qlen and qlen for per cpu logic") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2019-02-28net: sched: pie: avoid slow division in drop probability decayLeslie Monis
As per RFC 8033, it is sufficient for the drop probability decay factor to have a value of (1 - 1/64) instead of 98%. This avoids the need to do slow division. Suggested-by: David Laight <David.Laight@aculab.com> Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-28net: netem: fix skb length BUG_ON in __skb_to_sgvecSheng Lan
It can be reproduced by following steps: 1. virtio_net NIC is configured with gso/tso on 2. configure nginx as http server with an index file bigger than 1M bytes 3. use tc netem to produce duplicate packets and delay: tc qdisc add dev eth0 root netem delay 100ms 10ms 30% duplicate 90% 4. continually curl the nginx http server to get index file on client 5. BUG_ON is seen quickly [10258690.371129] kernel BUG at net/core/skbuff.c:4028! [10258690.371748] invalid opcode: 0000 [#1] SMP PTI [10258690.372094] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G W 5.0.0-rc6 #2 [10258690.372094] RSP: 0018:ffffa05797b43da0 EFLAGS: 00010202 [10258690.372094] RBP: 00000000000005ea R08: 0000000000000000 R09: 00000000000005ea [10258690.372094] R10: ffffa0579334d800 R11: 00000000000002c0 R12: 0000000000000002 [10258690.372094] R13: 0000000000000000 R14: ffffa05793122900 R15: ffffa0578f7cb028 [10258690.372094] FS: 0000000000000000(0000) GS:ffffa05797b40000(0000) knlGS:0000000000000000 [10258690.372094] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [10258690.372094] CR2: 00007f1a6dc00868 CR3: 000000001000e000 CR4: 00000000000006e0 [10258690.372094] Call Trace: [10258690.372094] <IRQ> [10258690.372094] skb_to_sgvec+0x11/0x40 [10258690.372094] start_xmit+0x38c/0x520 [virtio_net] [10258690.372094] dev_hard_start_xmit+0x9b/0x200 [10258690.372094] sch_direct_xmit+0xff/0x260 [10258690.372094] __qdisc_run+0x15e/0x4e0 [10258690.372094] net_tx_action+0x137/0x210 [10258690.372094] __do_softirq+0xd6/0x2a9 [10258690.372094] irq_exit+0xde/0xf0 [10258690.372094] smp_apic_timer_interrupt+0x74/0x140 [10258690.372094] apic_timer_interrupt+0xf/0x20 [10258690.372094] </IRQ> In __skb_to_sgvec(), the skb->len is not equal to the sum of the skb's linear data size and nonlinear data size, thus BUG_ON triggered. Because the skb is cloned and a part of nonlinear data is split off. Duplicate packet is cloned in netem_enqueue() and may be delayed some time in qdisc. When qdisc len reached the limit and returns NET_XMIT_DROP, the skb will be retransmit later in write queue. the skb will be fragmented by tso_fragment(), the limit size that depends on cwnd and mss decrease, the skb's nonlinear data will be split off. The length of the skb cloned by netem will not be updated. When we use virtio_net NIC and invoke skb_to_sgvec(), the BUG_ON trigger. To fix it, netem returns NET_XMIT_SUCCESS to upper stack when it clones a duplicate packet. Fixes: 35d889d1 ("sch_netem: fix skb leak in netem_enqueue()") Signed-off-by: Sheng Lan <lansheng@huawei.com> Reported-by: Qin Ji <jiqin.ji@huawei.com> Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-27net: sched: act_csum: Fix csum calc for tagged packetsEli Britstein
The csum calculation is different for IPv4/6. For VLAN packets, tc_skb_protocol returns the VLAN protocol rather than the packet's one (e.g. IPv4/6), so csum is not calculated. Furthermore, VLAN may not be stripped so csum is not calculated in this case too. Calculate the csum for those cases. Fixes: d8b9605d2697 ("net: sched: fix skb->protocol use in case of accelerated vlan path") Signed-off-by: Eli Britstein <elibr@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-27net: sched: act_tunnel_key: fix metadata handlingVlad Buslov
Tunnel key action params->tcft_enc_metadata is only set when action is TCA_TUNNEL_KEY_ACT_SET. However, metadata pointer is incorrectly dereferenced during tunnel key init and release without verifying that action is if correct type, which causes NULL pointer dereference. Metadata tunnel dst_cache is also leaked on action overwrite. Fix metadata handling: - Verify that metadata pointer is not NULL before dereferencing it in tunnel_key_init error handling code. - Move dst_cache destroy code into tunnel_key_release_params() function that is called in both action overwrite and release cases (fixes resource leak) and verifies that actions has correct type before dereferencing metadata pointer (fixes NULL pointer dereference). Oops with KASAN enabled during tdc tests execution: [ 261.080482] ================================================================== [ 261.088049] BUG: KASAN: null-ptr-deref in dst_cache_destroy+0x21/0xa0 [ 261.094613] Read of size 8 at addr 00000000000000b0 by task tc/2976 [ 261.102524] CPU: 14 PID: 2976 Comm: tc Not tainted 5.0.0-rc7+ #157 [ 261.108844] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017 [ 261.116726] Call Trace: [ 261.119234] dump_stack+0x9a/0xeb [ 261.122625] ? dst_cache_destroy+0x21/0xa0 [ 261.126818] ? dst_cache_destroy+0x21/0xa0 [ 261.131004] kasan_report+0x176/0x192 [ 261.134752] ? idr_get_next+0xd0/0x120 [ 261.138578] ? dst_cache_destroy+0x21/0xa0 [ 261.142768] dst_cache_destroy+0x21/0xa0 [ 261.146799] tunnel_key_release+0x3a/0x50 [act_tunnel_key] [ 261.152392] tcf_action_cleanup+0x2c/0xc0 [ 261.156490] tcf_generic_walker+0x4c2/0x5c0 [ 261.160794] ? tcf_action_dump_1+0x390/0x390 [ 261.165163] ? tunnel_key_walker+0x5/0x1a0 [act_tunnel_key] [ 261.170865] ? tunnel_key_walker+0xe9/0x1a0 [act_tunnel_key] [ 261.176641] tca_action_gd+0x600/0xa40 [ 261.180482] ? tca_get_fill.constprop.17+0x200/0x200 [ 261.185548] ? __lock_acquire+0x588/0x1d20 [ 261.189741] ? __lock_acquire+0x588/0x1d20 [ 261.193922] ? mark_held_locks+0x90/0x90 [ 261.197944] ? mark_held_locks+0x90/0x90 [ 261.202018] ? __nla_parse+0xfe/0x190 [ 261.205774] tc_ctl_action+0x218/0x230 [ 261.209614] ? tcf_action_add+0x230/0x230 [ 261.213726] rtnetlink_rcv_msg+0x3a5/0x600 [ 261.217910] ? lock_downgrade+0x2d0/0x2d0 [ 261.222006] ? validate_linkmsg+0x400/0x400 [ 261.226278] ? find_held_lock+0x6d/0xd0 [ 261.230200] ? match_held_lock+0x1b/0x210 [ 261.234296] ? validate_linkmsg+0x400/0x400 [ 261.238567] netlink_rcv_skb+0xc7/0x1f0 [ 261.242489] ? netlink_ack+0x470/0x470 [ 261.246319] ? netlink_deliver_tap+0x1f3/0x5a0 [ 261.250874] netlink_unicast+0x2ae/0x350 [ 261.254884] ? netlink_attachskb+0x340/0x340 [ 261.261647] ? _copy_from_iter_full+0xdd/0x380 [ 261.268576] ? __virt_addr_valid+0xb6/0xf0 [ 261.275227] ? __check_object_size+0x159/0x240 [ 261.282184] netlink_sendmsg+0x4d3/0x630 [ 261.288572] ? netlink_unicast+0x350/0x350 [ 261.295132] ? netlink_unicast+0x350/0x350 [ 261.301608] sock_sendmsg+0x6d/0x80 [ 261.307467] ___sys_sendmsg+0x48e/0x540 [ 261.313633] ? copy_msghdr_from_user+0x210/0x210 [ 261.320545] ? save_stack+0x89/0xb0 [ 261.326289] ? __lock_acquire+0x588/0x1d20 [ 261.332605] ? entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 261.340063] ? mark_held_locks+0x90/0x90 [ 261.346162] ? do_filp_open+0x138/0x1d0 [ 261.352108] ? may_open_dev+0x50/0x50 [ 261.357897] ? match_held_lock+0x1b/0x210 [ 261.364016] ? __fget_light+0xa6/0xe0 [ 261.369840] ? __sys_sendmsg+0xd2/0x150 [ 261.375814] __sys_sendmsg+0xd2/0x150 [ 261.381610] ? __ia32_sys_shutdown+0x30/0x30 [ 261.388026] ? lock_downgrade+0x2d0/0x2d0 [ 261.394182] ? mark_held_locks+0x1c/0x90 [ 261.400230] ? do_syscall_64+0x1e/0x280 [ 261.406172] do_syscall_64+0x78/0x280 [ 261.411932] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 261.419103] RIP: 0033:0x7f28e91a8b87 [ 261.424791] Code: 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 80 00 00 00 00 8b 05 6a 2b 2c 00 48 63 d2 48 63 ff 85 c0 75 18 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 59 f3 c3 0f 1f 80 00 00 00 00 53 48 89 f3 48 [ 261.448226] RSP: 002b:00007ffdc5c4e2d8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 261.458183] RAX: ffffffffffffffda RBX: 000000005c73c202 RCX: 00007f28e91a8b87 [ 261.467728] RDX: 0000000000000000 RSI: 00007ffdc5c4e340 RDI: 0000000000000003 [ 261.477342] RBP: 0000000000000000 R08: 0000000000000001 R09: 000000000000000c [ 261.486970] R10: 000000000000000c R11: 0000000000000246 R12: 0000000000000001 [ 261.496599] R13: 000000000067b4e0 R14: 00007ffdc5c5248c R15: 00007ffdc5c52480 [ 261.506281] ================================================================== [ 261.516076] Disabling lock debugging due to kernel taint [ 261.523979] BUG: unable to handle kernel NULL pointer dereference at 00000000000000b0 [ 261.534413] #PF error: [normal kernel read fault] [ 261.541730] PGD 8000000317400067 P4D 8000000317400067 PUD 316878067 PMD 0 [ 261.551294] Oops: 0000 [#1] SMP KASAN PTI [ 261.557985] CPU: 14 PID: 2976 Comm: tc Tainted: G B 5.0.0-rc7+ #157 [ 261.568306] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017 [ 261.578874] RIP: 0010:dst_cache_destroy+0x21/0xa0 [ 261.586413] Code: f4 ff ff ff eb f6 0f 1f 00 0f 1f 44 00 00 41 56 41 55 49 c7 c6 60 fe 35 af 41 54 55 49 89 fc 53 bd ff ff ff ff e8 ef 98 73 ff <49> 83 3c 24 00 75 35 eb 6c 4c 63 ed e8 de 98 73 ff 4a 8d 3c ed 40 [ 261.611247] RSP: 0018:ffff888316447160 EFLAGS: 00010282 [ 261.619564] RAX: 0000000000000000 RBX: ffff88835b3e2f00 RCX: ffffffffad1c5071 [ 261.629862] RDX: 0000000000000003 RSI: dffffc0000000000 RDI: 0000000000000297 [ 261.640149] RBP: 00000000ffffffff R08: fffffbfff5dd4e89 R09: fffffbfff5dd4e89 [ 261.650467] R10: 0000000000000001 R11: fffffbfff5dd4e88 R12: 00000000000000b0 [ 261.660785] R13: ffff8883267a10c0 R14: ffffffffaf35fe60 R15: 0000000000000001 [ 261.671110] FS: 00007f28ea3e6400(0000) GS:ffff888364200000(0000) knlGS:0000000000000000 [ 261.682447] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 261.691491] CR2: 00000000000000b0 CR3: 00000003178ae004 CR4: 00000000001606e0 [ 261.701283] Call Trace: [ 261.706374] tunnel_key_release+0x3a/0x50 [act_tunnel_key] [ 261.714522] tcf_action_cleanup+0x2c/0xc0 [ 261.721208] tcf_generic_walker+0x4c2/0x5c0 [ 261.728074] ? tcf_action_dump_1+0x390/0x390 [ 261.734996] ? tunnel_key_walker+0x5/0x1a0 [act_tunnel_key] [ 261.743247] ? tunnel_key_walker+0xe9/0x1a0 [act_tunnel_key] [ 261.751557] tca_action_gd+0x600/0xa40 [ 261.757991] ? tca_get_fill.constprop.17+0x200/0x200 [ 261.765644] ? __lock_acquire+0x588/0x1d20 [ 261.772461] ? __lock_acquire+0x588/0x1d20 [ 261.779266] ? mark_held_locks+0x90/0x90 [ 261.785880] ? mark_held_locks+0x90/0x90 [ 261.792470] ? __nla_parse+0xfe/0x190 [ 261.798738] tc_ctl_action+0x218/0x230 [ 261.805145] ? tcf_action_add+0x230/0x230 [ 261.811760] rtnetlink_rcv_msg+0x3a5/0x600 [ 261.818564] ? lock_downgrade+0x2d0/0x2d0 [ 261.825433] ? validate_linkmsg+0x400/0x400 [ 261.832256] ? find_held_lock+0x6d/0xd0 [ 261.838624] ? match_held_lock+0x1b/0x210 [ 261.845142] ? validate_linkmsg+0x400/0x400 [ 261.851729] netlink_rcv_skb+0xc7/0x1f0 [ 261.857976] ? netlink_ack+0x470/0x470 [ 261.864132] ? netlink_deliver_tap+0x1f3/0x5a0 [ 261.870969] netlink_unicast+0x2ae/0x350 [ 261.877294] ? netlink_attachskb+0x340/0x340 [ 261.883962] ? _copy_from_iter_full+0xdd/0x380 [ 261.890750] ? __virt_addr_valid+0xb6/0xf0 [ 261.897188] ? __check_object_size+0x159/0x240 [ 261.903928] netlink_sendmsg+0x4d3/0x630 [ 261.910112] ? netlink_unicast+0x350/0x350 [ 261.916410] ? netlink_unicast+0x350/0x350 [ 261.922656] sock_sendmsg+0x6d/0x80 [ 261.928257] ___sys_sendmsg+0x48e/0x540 [ 261.934183] ? copy_msghdr_from_user+0x210/0x210 [ 261.940865] ? save_stack+0x89/0xb0 [ 261.946355] ? __lock_acquire+0x588/0x1d20 [ 261.952358] ? entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 261.959468] ? mark_held_locks+0x90/0x90 [ 261.965248] ? do_filp_open+0x138/0x1d0 [ 261.970910] ? may_open_dev+0x50/0x50 [ 261.976386] ? match_held_lock+0x1b/0x210 [ 261.982210] ? __fget_light+0xa6/0xe0 [ 261.987648] ? __sys_sendmsg+0xd2/0x150 [ 261.993263] __sys_sendmsg+0xd2/0x150 [ 261.998613] ? __ia32_sys_shutdown+0x30/0x30 [ 262.004555] ? lock_downgrade+0x2d0/0x2d0 [ 262.010236] ? mark_held_locks+0x1c/0x90 [ 262.015758] ? do_syscall_64+0x1e/0x280 [ 262.021234] do_syscall_64+0x78/0x280 [ 262.026500] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 262.033207] RIP: 0033:0x7f28e91a8b87 [ 262.038421] Code: 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 80 00 00 00 00 8b 05 6a 2b 2c 00 48 63 d2 48 63 ff 85 c0 75 18 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 59 f3 c3 0f 1f 80 00 00 00 00 53 48 89 f3 48 [ 262.060708] RSP: 002b:00007ffdc5c4e2d8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 262.070112] RAX: ffffffffffffffda RBX: 000000005c73c202 RCX: 00007f28e91a8b87 [ 262.079087] RDX: 0000000000000000 RSI: 00007ffdc5c4e340 RDI: 0000000000000003 [ 262.088122] RBP: 0000000000000000 R08: 0000000000000001 R09: 000000000000000c [ 262.097157] R10: 000000000000000c R11: 0000000000000246 R12: 0000000000000001 [ 262.106207] R13: 000000000067b4e0 R14: 00007ffdc5c5248c R15: 00007ffdc5c52480 [ 262.115271] Modules linked in: act_tunnel_key act_skbmod act_simple act_connmark nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 act_csum libcrc32c act_meta_skbtcindex act_meta_skbprio act_meta_mark act_ife ife act_police act_sample psample act_gact veth nfsv3 nfs_acl nfs lockd grace fscache bridge stp llc intel_rapl sb_edac mlx5_ib x86_pkg_temp_thermal sunrpc intel_powerclamp coretemp ib_uverbs kvm_intel ib_core kvm irqbypass mlx5_core crct10dif_pclmul crc32_pclmul crc32c_intel igb ghash_clmulni_intel intel_cstate mlxfw iTCO_wdt devlink intel_uncore iTCO_vendor_support ipmi_ssif ptp mei_me intel_rapl_perf ioatdma joydev pps_core ses mei i2c_i801 pcspkr enclosure lpc_ich dca wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter pcc_cpufreq ast i2c_algo_bit drm_kms_helper ttm drm mpt3sas raid_class scsi_transport_sas [ 262.204393] CR2: 00000000000000b0 [ 262.210390] ---[ end trace 2e41d786f2c7901a ]--- [ 262.226790] RIP: 0010:dst_cache_destroy+0x21/0xa0 [ 262.234083] Code: f4 ff ff ff eb f6 0f 1f 00 0f 1f 44 00 00 41 56 41 55 49 c7 c6 60 fe 35 af 41 54 55 49 89 fc 53 bd ff ff ff ff e8 ef 98 73 ff <49> 83 3c 24 00 75 35 eb 6c 4c 63 ed e8 de 98 73 ff 4a 8d 3c ed 40 [ 262.258311] RSP: 0018:ffff888316447160 EFLAGS: 00010282 [ 262.266304] RAX: 0000000000000000 RBX: ffff88835b3e2f00 RCX: ffffffffad1c5071 [ 262.276251] RDX: 0000000000000003 RSI: dffffc0000000000 RDI: 0000000000000297 [ 262.286208] RBP: 00000000ffffffff R08: fffffbfff5dd4e89 R09: fffffbfff5dd4e89 [ 262.296183] R10: 0000000000000001 R11: fffffbfff5dd4e88 R12: 00000000000000b0 [ 262.306157] R13: ffff8883267a10c0 R14: ffffffffaf35fe60 R15: 0000000000000001 [ 262.316139] FS: 00007f28ea3e6400(0000) GS:ffff888364200000(0000) knlGS:0000000000000000 [ 262.327146] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 262.335815] CR2: 00000000000000b0 CR3: 00000003178ae004 CR4: 00000000001606e0 Fixes: 41411e2fd6b8 ("net/sched: act_tunnel_key: Add dst_cache support") Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-27Revert "net: sched: fw: don't set arg->stop in fw_walk() when empty"Vlad Buslov
This reverts commit 31a998487641 ("net: sched: fw: don't set arg->stop in fw_walk() when empty") Cls API function tcf_proto_is_empty() was changed in commit 6676d5e416ee ("net: sched: set dedicated tcf_walker flag when tp is empty") to no longer depend on arg->stop to determine that classifier instance is empty. Instead, it adds dedicated arg->nonempty field, which makes the fix in fw classifier no longer necessary. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-26net: sched: pie: fix 64-bit divisionLeslie Monis
Use div_u64() to resolve build failures on 32-bit platforms. Fixes: 3f7ae5f3dc52 ("net: sched: pie: add more cases to auto-tune alpha and beta") Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Reported-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-26net: Use RCU_POINTER_INITIALIZER() to init static variableLi RongQing
This pointer is RCU protected, so proper primitives should be used. Signed-off-by: Zhang Yu <zhangyu31@baidu.com> Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-26net: sched: fix typo in walker_check_empty()Vlad Buslov
Function walker_check_empty() incorrectly verifies that tp pointer is not NULL, instead of actual filter pointer. Fix conditional to check the right pointer. Adjust filter pointer naming accordingly to other cls API functions. Fixes: 6676d5e416ee ("net: sched: set dedicated tcf_walker flag when tp is empty") Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reported-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-26net: sched: pie: fix mistake in reference linkLeslie Monis
Fix the incorrect reference link to RFC 8033 Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-25net: sched: pie: update referencesMohit P. Tahiliani
RFC 8033 replaces the IETF draft for PIE Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in> Signed-off-by: Dhaval Khandla <dhavaljkhandla26@gmail.com> Signed-off-by: Hrishikesh Hiraskar <hrishihiraskar@gmail.com> Signed-off-by: Manish Kumar B <bmanish15597@gmail.com> Signed-off-by: Sachin D. Patil <sdp.sachin@gmail.com> Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Acked-by: Dave Taht <dave.taht@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-25net: sched: pie: add derandomization mechanismMohit P. Tahiliani
Random dropping of packets to achieve latency control may introduce outlier situations where packets are dropped too close to each other or too far from each other. This can cause the real drop percentage to temporarily deviate from the intended drop probability. In certain scenarios, such as a small number of simultaneous TCP flows, these deviations can cause significant deviations in link utilization and queuing latency. RFC 8033 suggests using a derandomization mechanism to avoid these deviations. Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in> Signed-off-by: Dhaval Khandla <dhavaljkhandla26@gmail.com> Signed-off-by: Hrishikesh Hiraskar <hrishihiraskar@gmail.com> Signed-off-by: Manish Kumar B <bmanish15597@gmail.com> Signed-off-by: Sachin D. Patil <sdp.sachin@gmail.com> Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Acked-by: Dave Taht <dave.taht@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-25net: sched: pie: add more cases to auto-tune alpha and betaMohit P. Tahiliani
The current implementation scales the local alpha and beta variables in the calculate_probability function by the same amount for all values of drop probability below 1%. RFC 8033 suggests using additional cases for auto-tuning alpha and beta when the drop probability is less than 1%. In order to add more auto-tuning cases, MAX_PROB must be scaled by u64 instead of u32 to prevent underflow when scaling the local alpha and beta variables in the calculate_probability function. Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in> Signed-off-by: Dhaval Khandla <dhavaljkhandla26@gmail.com> Signed-off-by: Hrishikesh Hiraskar <hrishihiraskar@gmail.com> Signed-off-by: Manish Kumar B <bmanish15597@gmail.com> Signed-off-by: Sachin D. Patil <sdp.sachin@gmail.com> Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Acked-by: Dave Taht <dave.taht@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-25net: sched: pie: change initial value of pie_vars->burst_timeMohit P. Tahiliani
RFC 8033 suggests an initial value of 150 milliseconds for the maximum time allowed for a burst of packets. Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in> Signed-off-by: Dhaval Khandla <dhavaljkhandla26@gmail.com> Signed-off-by: Hrishikesh Hiraskar <hrishihiraskar@gmail.com> Signed-off-by: Manish Kumar B <bmanish15597@gmail.com> Signed-off-by: Sachin D. Patil <sdp.sachin@gmail.com> Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Acked-by: Dave Taht <dave.taht@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>