summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2017-08-30sch_netem: avoid null pointer deref on init failureNikolay Aleksandrov
netem can fail in ->init due to missing options (either not supplied by user-space or used as a default qdisc) causing a timer->base null pointer deref in its ->destroy() and ->reset() callbacks. Reproduce: $ sysctl net.core.default_qdisc=netem $ ip l set ethX up Crash log: [ 1814.846943] BUG: unable to handle kernel NULL pointer dereference at (null) [ 1814.847181] IP: hrtimer_active+0x17/0x8a [ 1814.847270] PGD 59c34067 [ 1814.847271] P4D 59c34067 [ 1814.847337] PUD 37374067 [ 1814.847403] PMD 0 [ 1814.847468] [ 1814.847582] Oops: 0000 [#1] SMP [ 1814.847655] Modules linked in: sch_netem(O) sch_fq_codel(O) [ 1814.847761] CPU: 3 PID: 1573 Comm: ip Tainted: G O 4.13.0-rc6+ #62 [ 1814.847884] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014 [ 1814.848043] task: ffff88003723a700 task.stack: ffff88005adc8000 [ 1814.848235] RIP: 0010:hrtimer_active+0x17/0x8a [ 1814.848407] RSP: 0018:ffff88005adcb590 EFLAGS: 00010246 [ 1814.848590] RAX: 0000000000000000 RBX: ffff880058e359d8 RCX: 0000000000000000 [ 1814.848793] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880058e359d8 [ 1814.848998] RBP: ffff88005adcb5b0 R08: 00000000014080c0 R09: 00000000ffffffff [ 1814.849204] R10: ffff88005adcb660 R11: 0000000000000020 R12: 0000000000000000 [ 1814.849410] R13: ffff880058e359d8 R14: 00000000ffffffff R15: 0000000000000001 [ 1814.849616] FS: 00007f733bbca740(0000) GS:ffff88005d980000(0000) knlGS:0000000000000000 [ 1814.849919] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1814.850107] CR2: 0000000000000000 CR3: 0000000059f0d000 CR4: 00000000000406e0 [ 1814.850313] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1814.850518] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 1814.850723] Call Trace: [ 1814.850875] hrtimer_try_to_cancel+0x1a/0x93 [ 1814.851047] hrtimer_cancel+0x15/0x20 [ 1814.851211] qdisc_watchdog_cancel+0x12/0x14 [ 1814.851383] netem_reset+0xe6/0xed [sch_netem] [ 1814.851561] qdisc_destroy+0x8b/0xe5 [ 1814.851723] qdisc_create_dflt+0x86/0x94 [ 1814.851890] ? dev_activate+0x129/0x129 [ 1814.852057] attach_one_default_qdisc+0x36/0x63 [ 1814.852232] netdev_for_each_tx_queue+0x3d/0x48 [ 1814.852406] dev_activate+0x4b/0x129 [ 1814.852569] __dev_open+0xe7/0x104 [ 1814.852730] __dev_change_flags+0xc6/0x15c [ 1814.852899] dev_change_flags+0x25/0x59 [ 1814.853064] do_setlink+0x30c/0xb3f [ 1814.853228] ? check_chain_key+0xb0/0xfd [ 1814.853396] ? check_chain_key+0xb0/0xfd [ 1814.853565] rtnl_newlink+0x3a4/0x729 [ 1814.853728] ? rtnl_newlink+0x117/0x729 [ 1814.853905] ? ns_capable_common+0xd/0xb1 [ 1814.854072] ? ns_capable+0x13/0x15 [ 1814.854234] rtnetlink_rcv_msg+0x188/0x197 [ 1814.854404] ? rcu_read_unlock+0x3e/0x5f [ 1814.854572] ? rtnl_newlink+0x729/0x729 [ 1814.854737] netlink_rcv_skb+0x6c/0xce [ 1814.854902] rtnetlink_rcv+0x23/0x2a [ 1814.855064] netlink_unicast+0x103/0x181 [ 1814.855230] netlink_sendmsg+0x326/0x337 [ 1814.855398] sock_sendmsg_nosec+0x14/0x3f [ 1814.855584] sock_sendmsg+0x29/0x2e [ 1814.855747] ___sys_sendmsg+0x209/0x28b [ 1814.855912] ? do_raw_spin_unlock+0xcd/0xf8 [ 1814.856082] ? _raw_spin_unlock+0x27/0x31 [ 1814.856251] ? __handle_mm_fault+0x651/0xdb1 [ 1814.856421] ? check_chain_key+0xb0/0xfd [ 1814.856592] __sys_sendmsg+0x45/0x63 [ 1814.856755] ? __sys_sendmsg+0x45/0x63 [ 1814.856923] SyS_sendmsg+0x19/0x1b [ 1814.857083] entry_SYSCALL_64_fastpath+0x23/0xc2 [ 1814.857256] RIP: 0033:0x7f733b2dd690 [ 1814.857419] RSP: 002b:00007ffe1d3387d8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 1814.858238] RAX: ffffffffffffffda RBX: ffffffff810d278c RCX: 00007f733b2dd690 [ 1814.858445] RDX: 0000000000000000 RSI: 00007ffe1d338820 RDI: 0000000000000003 [ 1814.858651] RBP: ffff88005adcbf98 R08: 0000000000000001 R09: 0000000000000003 [ 1814.858856] R10: 00007ffe1d3385a0 R11: 0000000000000246 R12: 0000000000000002 [ 1814.859060] R13: 000000000066f1a0 R14: 00007ffe1d3408d0 R15: 0000000000000000 [ 1814.859267] ? trace_hardirqs_off_caller+0xa7/0xcf [ 1814.859446] Code: 10 55 48 89 c7 48 89 e5 e8 45 a1 fb ff 31 c0 5d c3 31 c0 c3 66 66 66 66 90 55 48 89 e5 41 56 41 55 41 54 53 49 89 fd 49 8b 45 30 <4c> 8b 20 41 8b 5c 24 38 31 c9 31 d2 48 c7 c7 50 8e 1d 82 41 89 [ 1814.860022] RIP: hrtimer_active+0x17/0x8a RSP: ffff88005adcb590 [ 1814.860214] CR2: 0000000000000000 Fixes: 87b60cfacf9f ("net_sched: fix error recovery at qdisc creation") Fixes: 0fbbeb1ba43b ("[PKT_SCHED]: Fix missing qdisc_destroy() in qdisc_create_dflt()") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-30sch_fq_codel: avoid double free on init failureNikolay Aleksandrov
It is very unlikely to happen but the backlogs memory allocation could fail and will free q->flows, but then ->destroy() will free q->flows too. For correctness remove the first free and let ->destroy clean up. Fixes: 87b60cfacf9f ("net_sched: fix error recovery at qdisc creation") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-30sch_cbq: fix null pointer dereferences on init failureNikolay Aleksandrov
CBQ can fail on ->init by wrong nl attributes or simply for missing any, f.e. if it's set as a default qdisc then TCA_OPTIONS (opt) will be NULL when it is activated. The first thing init does is parse opt but it will dereference a null pointer if used as a default qdisc, also since init failure at default qdisc invokes ->reset() which cancels all timers then we'll also dereference two more null pointers (timer->base) as they were never initialized. To reproduce: $ sysctl net.core.default_qdisc=cbq $ ip l set ethX up Crash log of the first null ptr deref: [44727.907454] BUG: unable to handle kernel NULL pointer dereference at (null) [44727.907600] IP: cbq_init+0x27/0x205 [44727.907676] PGD 59ff4067 [44727.907677] P4D 59ff4067 [44727.907742] PUD 59c70067 [44727.907807] PMD 0 [44727.907873] [44727.907982] Oops: 0000 [#1] SMP [44727.908054] Modules linked in: [44727.908126] CPU: 1 PID: 21312 Comm: ip Not tainted 4.13.0-rc6+ #60 [44727.908235] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014 [44727.908477] task: ffff88005ad42700 task.stack: ffff880037214000 [44727.908672] RIP: 0010:cbq_init+0x27/0x205 [44727.908838] RSP: 0018:ffff8800372175f0 EFLAGS: 00010286 [44727.909018] RAX: ffffffff816c3852 RBX: ffff880058c53800 RCX: 0000000000000000 [44727.909222] RDX: 0000000000000004 RSI: 0000000000000000 RDI: ffff8800372175f8 [44727.909427] RBP: ffff880037217650 R08: ffffffff81b0f380 R09: 0000000000000000 [44727.909631] R10: ffff880037217660 R11: 0000000000000020 R12: ffffffff822a44c0 [44727.909835] R13: ffff880058b92000 R14: 00000000ffffffff R15: 0000000000000001 [44727.910040] FS: 00007ff8bc583740(0000) GS:ffff88005d880000(0000) knlGS:0000000000000000 [44727.910339] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [44727.910525] CR2: 0000000000000000 CR3: 00000000371e5000 CR4: 00000000000406e0 [44727.910731] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [44727.910936] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [44727.911141] Call Trace: [44727.911291] ? lockdep_init_map+0xb6/0x1ba [44727.911461] ? qdisc_alloc+0x14e/0x187 [44727.911626] qdisc_create_dflt+0x7a/0x94 [44727.911794] ? dev_activate+0x129/0x129 [44727.911959] attach_one_default_qdisc+0x36/0x63 [44727.912132] netdev_for_each_tx_queue+0x3d/0x48 [44727.912305] dev_activate+0x4b/0x129 [44727.912468] __dev_open+0xe7/0x104 [44727.912631] __dev_change_flags+0xc6/0x15c [44727.912799] dev_change_flags+0x25/0x59 [44727.912966] do_setlink+0x30c/0xb3f [44727.913129] ? check_chain_key+0xb0/0xfd [44727.913294] ? check_chain_key+0xb0/0xfd [44727.913463] rtnl_newlink+0x3a4/0x729 [44727.913626] ? rtnl_newlink+0x117/0x729 [44727.913801] ? ns_capable_common+0xd/0xb1 [44727.913968] ? ns_capable+0x13/0x15 [44727.914131] rtnetlink_rcv_msg+0x188/0x197 [44727.914300] ? rcu_read_unlock+0x3e/0x5f [44727.914465] ? rtnl_newlink+0x729/0x729 [44727.914630] netlink_rcv_skb+0x6c/0xce [44727.914796] rtnetlink_rcv+0x23/0x2a [44727.914956] netlink_unicast+0x103/0x181 [44727.915122] netlink_sendmsg+0x326/0x337 [44727.915291] sock_sendmsg_nosec+0x14/0x3f [44727.915459] sock_sendmsg+0x29/0x2e [44727.915619] ___sys_sendmsg+0x209/0x28b [44727.915784] ? do_raw_spin_unlock+0xcd/0xf8 [44727.915954] ? _raw_spin_unlock+0x27/0x31 [44727.916121] ? __handle_mm_fault+0x651/0xdb1 [44727.916290] ? check_chain_key+0xb0/0xfd [44727.916461] __sys_sendmsg+0x45/0x63 [44727.916626] ? __sys_sendmsg+0x45/0x63 [44727.916792] SyS_sendmsg+0x19/0x1b [44727.916950] entry_SYSCALL_64_fastpath+0x23/0xc2 [44727.917125] RIP: 0033:0x7ff8bbc96690 [44727.917286] RSP: 002b:00007ffc360991e8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [44727.917579] RAX: ffffffffffffffda RBX: ffffffff810d278c RCX: 00007ff8bbc96690 [44727.917783] RDX: 0000000000000000 RSI: 00007ffc36099230 RDI: 0000000000000003 [44727.917987] RBP: ffff880037217f98 R08: 0000000000000001 R09: 0000000000000003 [44727.918190] R10: 00007ffc36098fb0 R11: 0000000000000246 R12: 0000000000000006 [44727.918393] R13: 000000000066f1a0 R14: 00007ffc360a12e0 R15: 0000000000000000 [44727.918597] ? trace_hardirqs_off_caller+0xa7/0xcf [44727.918774] Code: 41 5f 5d c3 66 66 66 66 90 55 48 8d 56 04 45 31 c9 49 c7 c0 80 f3 b0 81 48 89 e5 41 55 41 54 53 48 89 fb 48 8d 7d a8 48 83 ec 48 <0f> b7 0e be 07 00 00 00 83 e9 04 e8 e6 f7 d8 ff 85 c0 0f 88 bb [44727.919332] RIP: cbq_init+0x27/0x205 RSP: ffff8800372175f0 [44727.919516] CR2: 0000000000000000 Fixes: 0fbbeb1ba43b ("[PKT_SCHED]: Fix missing qdisc_destroy() in qdisc_create_dflt()") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-30sch_hfsc: fix null pointer deref and double free on init failureNikolay Aleksandrov
Depending on where ->init fails we can get a null pointer deref due to uninitialized hires timer (watchdog) or a double free of the qdisc hash because it is already freed by ->destroy(). Fixes: 8d5537387505 ("net/sched/hfsc: allocate tcf block for hfsc root class") Fixes: 87b60cfacf9f ("net_sched: fix error recovery at qdisc creation") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-30sch_hhf: fix null pointer dereference on init failureNikolay Aleksandrov
If sch_hhf fails in its ->init() function (either due to wrong user-space arguments as below or memory alloc failure of hh_flows) it will do a null pointer deref of q->hh_flows in its ->destroy() function. To reproduce the crash: $ tc qdisc add dev eth0 root hhf quantum 2000000 non_hh_weight 10000000 Crash log: [ 690.654882] BUG: unable to handle kernel NULL pointer dereference at (null) [ 690.655565] IP: hhf_destroy+0x48/0xbc [ 690.655944] PGD 37345067 [ 690.655948] P4D 37345067 [ 690.656252] PUD 58402067 [ 690.656554] PMD 0 [ 690.656857] [ 690.657362] Oops: 0000 [#1] SMP [ 690.657696] Modules linked in: [ 690.658032] CPU: 3 PID: 920 Comm: tc Not tainted 4.13.0-rc6+ #57 [ 690.658525] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014 [ 690.659255] task: ffff880058578000 task.stack: ffff88005acbc000 [ 690.659747] RIP: 0010:hhf_destroy+0x48/0xbc [ 690.660146] RSP: 0018:ffff88005acbf9e0 EFLAGS: 00010246 [ 690.660601] RAX: 0000000000000000 RBX: 0000000000000020 RCX: 0000000000000000 [ 690.661155] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffffff821f63f0 [ 690.661710] RBP: ffff88005acbfa08 R08: ffffffff81b10a90 R09: 0000000000000000 [ 690.662267] R10: 00000000f42b7019 R11: ffff880058578000 R12: 00000000ffffffea [ 690.662820] R13: ffff8800372f6400 R14: 0000000000000000 R15: 0000000000000000 [ 690.663769] FS: 00007f8ae5e8b740(0000) GS:ffff88005d980000(0000) knlGS:0000000000000000 [ 690.667069] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 690.667965] CR2: 0000000000000000 CR3: 0000000058523000 CR4: 00000000000406e0 [ 690.668918] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 690.669945] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 690.671003] Call Trace: [ 690.671743] qdisc_create+0x377/0x3fd [ 690.672534] tc_modify_qdisc+0x4d2/0x4fd [ 690.673324] rtnetlink_rcv_msg+0x188/0x197 [ 690.674204] ? rcu_read_unlock+0x3e/0x5f [ 690.675091] ? rtnl_newlink+0x729/0x729 [ 690.675877] netlink_rcv_skb+0x6c/0xce [ 690.676648] rtnetlink_rcv+0x23/0x2a [ 690.677405] netlink_unicast+0x103/0x181 [ 690.678179] netlink_sendmsg+0x326/0x337 [ 690.678958] sock_sendmsg_nosec+0x14/0x3f [ 690.679743] sock_sendmsg+0x29/0x2e [ 690.680506] ___sys_sendmsg+0x209/0x28b [ 690.681283] ? __handle_mm_fault+0xc7d/0xdb1 [ 690.681915] ? check_chain_key+0xb0/0xfd [ 690.682449] __sys_sendmsg+0x45/0x63 [ 690.682954] ? __sys_sendmsg+0x45/0x63 [ 690.683471] SyS_sendmsg+0x19/0x1b [ 690.683974] entry_SYSCALL_64_fastpath+0x23/0xc2 [ 690.684516] RIP: 0033:0x7f8ae529d690 [ 690.685016] RSP: 002b:00007fff26d2d6b8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 690.685931] RAX: ffffffffffffffda RBX: ffffffff810d278c RCX: 00007f8ae529d690 [ 690.686573] RDX: 0000000000000000 RSI: 00007fff26d2d700 RDI: 0000000000000003 [ 690.687047] RBP: ffff88005acbff98 R08: 0000000000000001 R09: 0000000000000000 [ 690.687519] R10: 00007fff26d2d480 R11: 0000000000000246 R12: 0000000000000002 [ 690.687996] R13: 0000000001258070 R14: 0000000000000001 R15: 0000000000000000 [ 690.688475] ? trace_hardirqs_off_caller+0xa7/0xcf [ 690.688887] Code: 00 00 e8 2a 02 ae ff 49 8b bc 1d 60 02 00 00 48 83 c3 08 e8 19 02 ae ff 48 83 fb 20 75 dc 45 31 f6 4d 89 f7 4d 03 bd 20 02 00 00 <49> 8b 07 49 39 c7 75 24 49 83 c6 10 49 81 fe 00 40 00 00 75 e1 [ 690.690200] RIP: hhf_destroy+0x48/0xbc RSP: ffff88005acbf9e0 [ 690.690636] CR2: 0000000000000000 Fixes: 87b60cfacf9f ("net_sched: fix error recovery at qdisc creation") Fixes: 10239edf86f1 ("net-qdisc-hhf: Heavy-Hitter Filter (HHF) qdisc") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-30sch_multiq: fix double free on init failureNikolay Aleksandrov
The below commit added a call to ->destroy() on init failure, but multiq still frees ->queues on error in init, but ->queues is also freed by ->destroy() thus we get double free and corrupted memory. Very easy to reproduce (eth0 not multiqueue): $ tc qdisc add dev eth0 root multiq RTNETLINK answers: Operation not supported $ ip l add dumdum type dummy (crash) Trace log: [ 3929.467747] general protection fault: 0000 [#1] SMP [ 3929.468083] Modules linked in: [ 3929.468302] CPU: 3 PID: 967 Comm: ip Not tainted 4.13.0-rc6+ #56 [ 3929.468625] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014 [ 3929.469124] task: ffff88003716a700 task.stack: ffff88005872c000 [ 3929.469449] RIP: 0010:__kmalloc_track_caller+0x117/0x1be [ 3929.469746] RSP: 0018:ffff88005872f6a0 EFLAGS: 00010246 [ 3929.470042] RAX: 00000000000002de RBX: 0000000058a59000 RCX: 00000000000002df [ 3929.470406] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff821f7020 [ 3929.470770] RBP: ffff88005872f6e8 R08: 000000000001f010 R09: 0000000000000000 [ 3929.471133] R10: ffff88005872f730 R11: 0000000000008cdd R12: ff006d75646d7564 [ 3929.471496] R13: 00000000014000c0 R14: ffff88005b403c00 R15: ffff88005b403c00 [ 3929.471869] FS: 00007f0b70480740(0000) GS:ffff88005d980000(0000) knlGS:0000000000000000 [ 3929.472286] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 3929.472677] CR2: 00007ffcee4f3000 CR3: 0000000059d45000 CR4: 00000000000406e0 [ 3929.473209] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 3929.474109] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 3929.474873] Call Trace: [ 3929.475337] ? kstrdup_const+0x23/0x25 [ 3929.475863] kstrdup+0x2e/0x4b [ 3929.476338] kstrdup_const+0x23/0x25 [ 3929.478084] __kernfs_new_node+0x28/0xbc [ 3929.478478] kernfs_new_node+0x35/0x55 [ 3929.478929] kernfs_create_link+0x23/0x76 [ 3929.479478] sysfs_do_create_link_sd.isra.2+0x85/0xd7 [ 3929.480096] sysfs_create_link+0x33/0x35 [ 3929.480649] device_add+0x200/0x589 [ 3929.481184] netdev_register_kobject+0x7c/0x12f [ 3929.481711] register_netdevice+0x373/0x471 [ 3929.482174] rtnl_newlink+0x614/0x729 [ 3929.482610] ? rtnl_newlink+0x17f/0x729 [ 3929.483080] rtnetlink_rcv_msg+0x188/0x197 [ 3929.483533] ? rcu_read_unlock+0x3e/0x5f [ 3929.483984] ? rtnl_newlink+0x729/0x729 [ 3929.484420] netlink_rcv_skb+0x6c/0xce [ 3929.484858] rtnetlink_rcv+0x23/0x2a [ 3929.485291] netlink_unicast+0x103/0x181 [ 3929.485735] netlink_sendmsg+0x326/0x337 [ 3929.486181] sock_sendmsg_nosec+0x14/0x3f [ 3929.486614] sock_sendmsg+0x29/0x2e [ 3929.486973] ___sys_sendmsg+0x209/0x28b [ 3929.487340] ? do_raw_spin_unlock+0xcd/0xf8 [ 3929.487719] ? _raw_spin_unlock+0x27/0x31 [ 3929.488092] ? __handle_mm_fault+0x651/0xdb1 [ 3929.488471] ? check_chain_key+0xb0/0xfd [ 3929.488847] __sys_sendmsg+0x45/0x63 [ 3929.489206] ? __sys_sendmsg+0x45/0x63 [ 3929.489576] SyS_sendmsg+0x19/0x1b [ 3929.489901] entry_SYSCALL_64_fastpath+0x23/0xc2 [ 3929.490172] RIP: 0033:0x7f0b6fb93690 [ 3929.490423] RSP: 002b:00007ffcee4ed588 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 3929.490881] RAX: ffffffffffffffda RBX: ffffffff810d278c RCX: 00007f0b6fb93690 [ 3929.491198] RDX: 0000000000000000 RSI: 00007ffcee4ed5d0 RDI: 0000000000000003 [ 3929.491521] RBP: ffff88005872ff98 R08: 0000000000000001 R09: 0000000000000000 [ 3929.491801] R10: 00007ffcee4ed350 R11: 0000000000000246 R12: 0000000000000002 [ 3929.492075] R13: 000000000066f1a0 R14: 00007ffcee4f5680 R15: 0000000000000000 [ 3929.492352] ? trace_hardirqs_off_caller+0xa7/0xcf [ 3929.492590] Code: 8b 45 c0 48 8b 45 b8 74 17 48 8b 4d c8 83 ca ff 44 89 ee 4c 89 f7 e8 83 ca ff ff 49 89 c4 eb 49 49 63 56 20 48 8d 48 01 4d 8b 06 <49> 8b 1c 14 48 89 c2 4c 89 e0 65 49 0f c7 08 0f 94 c0 83 f0 01 [ 3929.493335] RIP: __kmalloc_track_caller+0x117/0x1be RSP: ffff88005872f6a0 Fixes: 87b60cfacf9f ("net_sched: fix error recovery at qdisc creation") Fixes: f07d1501292b ("multiq: Further multiqueue cleanup") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-30sch_htb: fix crash on init failureNikolay Aleksandrov
The commit below added a call to the ->destroy() callback for all qdiscs which failed in their ->init(), but some were not prepared for such change and can't handle partially initialized qdisc. HTB is one of them and if any error occurs before the qdisc watchdog timer and qdisc work are initialized then we can hit either a null ptr deref (timer->base) when canceling in ->destroy or lockdep error info about trying to register a non-static key and a stack dump. So to fix these two move the watchdog timer and workqueue init before anything that can err out. To reproduce userspace needs to send broken htb qdisc create request, tested with a modified tc (q_htb.c). Trace log: [ 2710.897602] BUG: unable to handle kernel NULL pointer dereference at (null) [ 2710.897977] IP: hrtimer_active+0x17/0x8a [ 2710.898174] PGD 58fab067 [ 2710.898175] P4D 58fab067 [ 2710.898353] PUD 586c0067 [ 2710.898531] PMD 0 [ 2710.898710] [ 2710.899045] Oops: 0000 [#1] SMP [ 2710.899232] Modules linked in: [ 2710.899419] CPU: 1 PID: 950 Comm: tc Not tainted 4.13.0-rc6+ #54 [ 2710.899646] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014 [ 2710.900035] task: ffff880059ed2700 task.stack: ffff88005ad4c000 [ 2710.900262] RIP: 0010:hrtimer_active+0x17/0x8a [ 2710.900467] RSP: 0018:ffff88005ad4f960 EFLAGS: 00010246 [ 2710.900684] RAX: 0000000000000000 RBX: ffff88003701e298 RCX: 0000000000000000 [ 2710.900933] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88003701e298 [ 2710.901177] RBP: ffff88005ad4f980 R08: 0000000000000001 R09: 0000000000000001 [ 2710.901419] R10: ffff88005ad4f800 R11: 0000000000000400 R12: 0000000000000000 [ 2710.901663] R13: ffff88003701e298 R14: ffffffff822a4540 R15: ffff88005ad4fac0 [ 2710.901907] FS: 00007f2f5e90f740(0000) GS:ffff88005d880000(0000) knlGS:0000000000000000 [ 2710.902277] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 2710.902500] CR2: 0000000000000000 CR3: 0000000058ca3000 CR4: 00000000000406e0 [ 2710.902744] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 2710.902977] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 2710.903180] Call Trace: [ 2710.903332] hrtimer_try_to_cancel+0x1a/0x93 [ 2710.903504] hrtimer_cancel+0x15/0x20 [ 2710.903667] qdisc_watchdog_cancel+0x12/0x14 [ 2710.903866] htb_destroy+0x2e/0xf7 [ 2710.904097] qdisc_create+0x377/0x3fd [ 2710.904330] tc_modify_qdisc+0x4d2/0x4fd [ 2710.904511] rtnetlink_rcv_msg+0x188/0x197 [ 2710.904682] ? rcu_read_unlock+0x3e/0x5f [ 2710.904849] ? rtnl_newlink+0x729/0x729 [ 2710.905017] netlink_rcv_skb+0x6c/0xce [ 2710.905183] rtnetlink_rcv+0x23/0x2a [ 2710.905345] netlink_unicast+0x103/0x181 [ 2710.905511] netlink_sendmsg+0x326/0x337 [ 2710.905679] sock_sendmsg_nosec+0x14/0x3f [ 2710.905847] sock_sendmsg+0x29/0x2e [ 2710.906010] ___sys_sendmsg+0x209/0x28b [ 2710.906176] ? do_raw_spin_unlock+0xcd/0xf8 [ 2710.906346] ? _raw_spin_unlock+0x27/0x31 [ 2710.906514] ? __handle_mm_fault+0x651/0xdb1 [ 2710.906685] ? check_chain_key+0xb0/0xfd [ 2710.906855] __sys_sendmsg+0x45/0x63 [ 2710.907018] ? __sys_sendmsg+0x45/0x63 [ 2710.907185] SyS_sendmsg+0x19/0x1b [ 2710.907344] entry_SYSCALL_64_fastpath+0x23/0xc2 Note that probably this bug goes further back because the default qdisc handling always calls ->destroy on init failure too. Fixes: 87b60cfacf9f ("net_sched: fix error recovery at qdisc creation") Fixes: 0fbbeb1ba43b ("[PKT_SCHED]: Fix missing qdisc_destroy() in qdisc_create_dflt()") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-29packet: Don't write vnet header beyond end of bufferBenjamin Poirier
... which may happen with certain values of tp_reserve and maclen. Fixes: 58d19b19cd99 ("packet: vnet_hdr support for tpacket_rcv") Signed-off-by: Benjamin Poirier <bpoirier@suse.com> Cc: Willem de Bruijn <willemb@google.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-29tipc: permit bond slave as bearerParthasarathy Bhuvaragan
For a bond slave device as a tipc bearer, the dev represents the bond interface and orig_dev represents the slave in tipc_l2_rcv_msg(). Since we decode the tipc_ptr from bonding device (dev), we fail to find the bearer and thus tipc links are not established. In this commit, we register the tipc protocol callback per device and look for tipc bearer from both the devices. Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-29ipv6: do not set sk_destruct in IPV6_ADDRFORM sockoptXin Long
ChunYu found a kernel warn_on during syzkaller fuzzing: [40226.038539] WARNING: CPU: 5 PID: 23720 at net/ipv4/af_inet.c:152 inet_sock_destruct+0x78d/0x9a0 [40226.144849] Call Trace: [40226.147590] <IRQ> [40226.149859] dump_stack+0xe2/0x186 [40226.176546] __warn+0x1a4/0x1e0 [40226.180066] warn_slowpath_null+0x31/0x40 [40226.184555] inet_sock_destruct+0x78d/0x9a0 [40226.246355] __sk_destruct+0xfa/0x8c0 [40226.290612] rcu_process_callbacks+0xaa0/0x18a0 [40226.336816] __do_softirq+0x241/0x75e [40226.367758] irq_exit+0x1f6/0x220 [40226.371458] smp_apic_timer_interrupt+0x7b/0xa0 [40226.376507] apic_timer_interrupt+0x93/0xa0 The warn_on happned when sk->sk_rmem_alloc wasn't 0 in inet_sock_destruct. As after commit f970bd9e3a06 ("udp: implement memory accounting helpers"), udp has changed to use udp_destruct_sock as sk_destruct where it would udp_rmem_release all rmem. But IPV6_ADDRFORM sockopt sets sk_destruct with inet_sock_destruct after changing family to PF_INET. If rmem is not 0 at that time, and there is no place to release rmem before calling inet_sock_destruct, the warn_on will be triggered. This patch is to fix it by not setting sk_destruct in IPV6_ADDRFORM sockopt any more. As IPV6_ADDRFORM sockopt only works for tcp and udp. TCP sock has already set it's sk_destruct with inet_sock_destruct and UDP has set with udp_destruct_sock since they're created. Fixes: f970bd9e3a06 ("udp: implement memory accounting helpers") Reported-by: ChunYu Wang <chunwang@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-29Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2017-08-29 1) Fix dst_entry refcount imbalance when using socket policies. From Lorenzo Colitti. 2) Fix locking when adding the ESP trailers. 3) Fix tailroom calculation for the ESP trailer by using skb_tailroom instead of skb_availroom. 4) Fix some info leaks in xfrm_user. From Mathias Krause. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28net: dsa: Don't dereference dst->cpu_dp->netdevFlorian Fainelli
If we do not have a master network device attached dst->cpu_dp will be NULL and accessing cpu_dp->netdev will create a trace similar to the one below. The correct check is on dst->cpu_dp period. [ 1.004650] DSA: switch 0 0 parsed [ 1.008078] Unable to handle kernel NULL pointer dereference at virtual address 00000010 [ 1.016195] pgd = c0003000 [ 1.018918] [00000010] *pgd=80000000004003, *pmd=00000000 [ 1.024349] Internal error: Oops: 206 [#1] SMP ARM [ 1.029157] Modules linked in: [ 1.032228] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.13.0-rc6-00071-g45b45afab9bd-dirty #7 [ 1.040772] Hardware name: Broadcom STB (Flattened Device Tree) [ 1.046704] task: ee08f840 task.stack: ee090000 [ 1.051258] PC is at dsa_register_switch+0x5e0/0x9dc [ 1.056234] LR is at dsa_register_switch+0x5d0/0x9dc [ 1.061211] pc : [<c08fb28c>] lr : [<c08fb27c>] psr: 60000213 [ 1.067491] sp : ee091d88 ip : 00000000 fp : 0000000c [ 1.072728] r10: 00000000 r9 : 00000001 r8 : ee208010 [ 1.077965] r7 : ee2b57b0 r6 : ee2b5780 r5 : 00000000 r4 : ee208e0c [ 1.084506] r3 : 00000000 r2 : 00040d00 r1 : 2d1b2000 r0 : 00000016 [ 1.091050] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user [ 1.098199] Control: 32c5387d Table: 00003000 DAC: fffffffd [ 1.103957] Process swapper/0 (pid: 1, stack limit = 0xee090210) Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: 6d3c8c0dd88a ("net: dsa: Remove master_netdev and use dst->cpu_dp->netdev") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28bridge: check for null fdb->dst before notifying switchdev driversRoopa Prabhu
current switchdev drivers dont seem to support offloading fdb entries pointing to the bridge device which have fdb->dst not set to any port. This patch adds a NULL fdb->dst check in the switchdev notifier code. This patch fixes the below NULL ptr dereference: $bridge fdb add 00:02:00:00:00:33 dev br0 self [ 69.953374] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 [ 69.954044] IP: br_switchdev_fdb_notify+0x29/0x80 [ 69.954044] PGD 66527067 [ 69.954044] P4D 66527067 [ 69.954044] PUD 7899c067 [ 69.954044] PMD 0 [ 69.954044] [ 69.954044] Oops: 0000 [#1] SMP [ 69.954044] Modules linked in: [ 69.954044] CPU: 1 PID: 3074 Comm: bridge Not tainted 4.13.0-rc6+ #1 [ 69.954044] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5.1-0-g8936dbb-20141113_115728-nilsson.home.kraxel.org 04/01/2014 [ 69.954044] task: ffff88007b827140 task.stack: ffffc90001564000 [ 69.954044] RIP: 0010:br_switchdev_fdb_notify+0x29/0x80 [ 69.954044] RSP: 0018:ffffc90001567918 EFLAGS: 00010246 [ 69.954044] RAX: 0000000000000000 RBX: ffff8800795e0880 RCX: 00000000000000c0 [ 69.954044] RDX: ffffc90001567920 RSI: 000000000000001c RDI: ffff8800795d0600 [ 69.954044] RBP: ffffc90001567938 R08: ffff8800795d0600 R09: 0000000000000000 [ 69.954044] R10: ffffc90001567a88 R11: ffff88007b849400 R12: ffff8800795e0880 [ 69.954044] R13: ffff8800795d0600 R14: ffffffff81ef8880 R15: 000000000000001c [ 69.954044] FS: 00007f93d3085700(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000 [ 69.954044] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 69.954044] CR2: 0000000000000008 CR3: 0000000066551000 CR4: 00000000000006e0 [ 69.954044] Call Trace: [ 69.954044] fdb_notify+0x3f/0xf0 [ 69.954044] __br_fdb_add.isra.12+0x1a7/0x370 [ 69.954044] br_fdb_add+0x178/0x280 [ 69.954044] rtnl_fdb_add+0x10a/0x200 [ 69.954044] rtnetlink_rcv_msg+0x1b4/0x240 [ 69.954044] ? skb_free_head+0x21/0x40 [ 69.954044] ? rtnl_calcit.isra.18+0xf0/0xf0 [ 69.954044] netlink_rcv_skb+0xed/0x120 [ 69.954044] rtnetlink_rcv+0x15/0x20 [ 69.954044] netlink_unicast+0x180/0x200 [ 69.954044] netlink_sendmsg+0x291/0x370 [ 69.954044] ___sys_sendmsg+0x180/0x2e0 [ 69.954044] ? filemap_map_pages+0x2db/0x370 [ 69.954044] ? do_wp_page+0x11d/0x420 [ 69.954044] ? __handle_mm_fault+0x794/0xd80 [ 69.954044] ? vma_link+0xcb/0xd0 [ 69.954044] __sys_sendmsg+0x4c/0x90 [ 69.954044] SyS_sendmsg+0x12/0x20 [ 69.954044] do_syscall_64+0x63/0xe0 [ 69.954044] entry_SYSCALL64_slow_path+0x25/0x25 [ 69.954044] RIP: 0033:0x7f93d2bad690 [ 69.954044] RSP: 002b:00007ffc7217a638 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 69.954044] RAX: ffffffffffffffda RBX: 00007ffc72182eac RCX: 00007f93d2bad690 [ 69.954044] RDX: 0000000000000000 RSI: 00007ffc7217a670 RDI: 0000000000000003 [ 69.954044] RBP: 0000000059a1f7f8 R08: 0000000000000006 R09: 000000000000000a [ 69.954044] R10: 00007ffc7217a400 R11: 0000000000000246 R12: 00007ffc7217a670 [ 69.954044] R13: 00007ffc72182a98 R14: 00000000006114c0 R15: 00007ffc72182aa0 [ 69.954044] Code: 1f 00 66 66 66 66 90 55 48 89 e5 48 83 ec 20 f6 47 20 04 74 0a 83 fe 1c 74 09 83 fe 1d 74 2c c9 66 90 c3 48 8b 47 10 48 8d 55 e8 <48> 8b 70 08 0f b7 47 1e 48 83 c7 18 48 89 7d f0 bf 03 00 00 00 [ 69.954044] RIP: br_switchdev_fdb_notify+0x29/0x80 RSP: ffffc90001567918 [ 69.954044] CR2: 0000000000000008 [ 69.954044] ---[ end trace 03e9eec4a82c238b ]--- Fixes: 6b26b51b1d13 ("net: bridge: Add support for notifying devices about FDB add/del") Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28ipv6: set dst.obsolete when a cached route has expiredXin Long
Now it doesn't check for the cached route expiration in ipv6's dst_ops->check(), because it trusts dst_gc that would clean the cached route up when it's expired. The problem is in dst_gc, it would clean the cached route only when it's refcount is 1. If some other module (like xfrm) keeps holding it and the module only release it when dst_ops->check() fails. But without checking for the cached route expiration, .check() may always return true. Meanwhile, without releasing the cached route, dst_gc couldn't del it. It will cause this cached route never to expire. This patch is to set dst.obsolete with DST_OBSOLETE_KILL in .gc when it's expired, and check obsolete != DST_OBSOLETE_FORCE_CHK in .check. Note that this is even needed when ipv6 dst_gc timer is removed one day. It would set dst.obsolete in .redirect and .update_pmtu instead, and check for cached route expiration when getting it, just like what ipv4 route does. Reported-by: Jianlin Shi <jishi@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28ipv6: fix sparse warning on rt6i_nodeWei Wang
Commit c5cff8561d2d adds rcu grace period before freeing fib6_node. This generates a new sparse warning on rt->rt6i_node related code: net/ipv6/route.c:1394:30: error: incompatible types in comparison expression (different address spaces) ./include/net/ip6_fib.h:187:14: error: incompatible types in comparison expression (different address spaces) This commit adds "__rcu" tag for rt6i_node and makes sure corresponding rcu API is used for it. After this fix, sparse no longer generates the above warning. Fixes: c5cff8561d2d ("ipv6: add rcu grace period before freeing fib6_node") Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28l2tp: hold tunnel used while creating sessions with netlinkGuillaume Nault
Use l2tp_tunnel_get() to retrieve tunnel, so that it can't go away on us. Otherwise l2tp_tunnel_destruct() might release the last reference count concurrently, thus freeing the tunnel while we're using it. Fixes: 309795f4bec2 ("l2tp: Add netlink control API for L2TP") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28l2tp: hold tunnel while handling genl TUNNEL_GET commandsGuillaume Nault
Use l2tp_tunnel_get() instead of l2tp_tunnel_find() so that we get a reference on the tunnel, preventing l2tp_tunnel_destruct() from freeing it from under us. Also move l2tp_tunnel_get() below nlmsg_new() so that we only take the reference when needed. Fixes: 309795f4bec2 ("l2tp: Add netlink control API for L2TP") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28l2tp: hold tunnel while handling genl tunnel updatesGuillaume Nault
We need to make sure the tunnel is not going to be destroyed by l2tp_tunnel_destruct() concurrently. Fixes: 309795f4bec2 ("l2tp: Add netlink control API for L2TP") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28l2tp: hold tunnel while processing genl delete commandGuillaume Nault
l2tp_nl_cmd_tunnel_delete() needs to take a reference on the tunnel, to prevent it from being concurrently freed by l2tp_tunnel_destruct(). Fixes: 309795f4bec2 ("l2tp: Add netlink control API for L2TP") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28l2tp: hold tunnel while looking up sessions in l2tp_netlinkGuillaume Nault
l2tp_tunnel_find() doesn't take a reference on the returned tunnel. Therefore, it's unsafe to use it because the returned tunnel can go away on us anytime. Fix this by defining l2tp_tunnel_get(), which works like l2tp_tunnel_find(), but takes a reference on the returned tunnel. Caller then has to drop this reference using l2tp_tunnel_dec_refcount(). As l2tp_tunnel_dec_refcount() needs to be moved to l2tp_core.h, let's simplify the patch and not move the L2TP_REFCNT_DEBUG part. This code has been broken (not even compiling) in May 2012 by commit a4ca44fa578c ("net: l2tp: Standardize logging styles") and fixed more than two years later by commit 29abe2fda54f ("l2tp: fix missing line continuation"). So it doesn't appear to be used by anyone. Same thing for l2tp_tunnel_free(); instead of moving it to l2tp_core.h, let's just simplify things and call kfree_rcu() directly in l2tp_tunnel_dec_refcount(). Extra assertions and debugging code provided by l2tp_tunnel_free() didn't help catching any of the reference counting and socket handling issues found while working on this series. Fixes: 309795f4bec2 ("l2tp: Add netlink control API for L2TP") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28l2tp: initialise session's refcount before making it reachableGuillaume Nault
Sessions must be fully initialised before calling l2tp_session_add_to_tunnel(). Otherwise, there's a short time frame where partially initialised sessions can be accessed by external users. Fixes: dbdbc73b4478 ("l2tp: fix duplicate session creation") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28net: missing call of trace_napi_poll in busy_poll_stopJesper Dangaard Brouer
Noticed that busy_poll_stop() also invoke the drivers napi->poll() function pointer, but didn't have an associated call to trace_napi_poll() like all other call sites. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28xfrm_user: fix info leak in build_aevent()Mathias Krause
The memory reserved to dump the ID of the xfrm state includes a padding byte in struct xfrm_usersa_id added by the compiler for alignment. To prevent the heap info leak, memset(0) the sa_id before filling it. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Fixes: d51d081d6504 ("[IPSEC]: Sync series - user") Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-08-28xfrm_user: fix info leak in build_expire()Mathias Krause
The memory reserved to dump the expired xfrm state includes padding bytes in struct xfrm_user_expire added by the compiler for alignment. To prevent the heap info leak, memset(0) the remainder of the struct. Initializing the whole structure isn't needed as copy_to_user_state() already takes care of clearing the padding bytes within the 'state' member. Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-08-28xfrm_user: fix info leak in xfrm_notify_sa()Mathias Krause
The memory reserved to dump the ID of the xfrm state includes a padding byte in struct xfrm_usersa_id added by the compiler for alignment. To prevent the heap info leak, memset(0) the whole struct before filling it. Cc: Herbert Xu <herbert@gondor.apana.org.au> Fixes: 0603eac0d6b7 ("[IPSEC]: Add XFRMA_SA/XFRMA_POLICY for delete notification") Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-08-28xfrm_user: fix info leak in copy_user_offload()Mathias Krause
The memory reserved to dump the xfrm offload state includes padding bytes of struct xfrm_user_offload added by the compiler for alignment. Add an explicit memset(0) before filling the buffer to avoid the heap info leak. Cc: Steffen Klassert <steffen.klassert@secunet.com> Fixes: d77e38e612a0 ("xfrm: Add an IPsec hardware offloading API") Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-08-25udp6: set rx_dst_cookie on rx_dst updatesPaolo Abeni
Currently, in the udp6 code, the dst cookie is not initialized/updated concurrently with the RX dst used by early demux. As a result, the dst_check() in the early_demux path always fails, the rx dst cache is always invalidated, and we can't really leverage significant gain from the demux lookup. Fix it adding udp6 specific variant of sk_rx_dst_set() and use it to set the dst cookie when the dst entry is really changed. The issue is there since the introduction of early demux for ipv6. Fixes: 5425077d73e0 ("net: ipv6: Add early demux handler for UDP unicast") Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-25tcp: fix refcnt leak with ebpf congestion controlSabrina Dubroca
There are a few bugs around refcnt handling in the new BPF congestion control setsockopt: - The new ca is assigned to icsk->icsk_ca_ops even in the case where we cannot get a reference on it. This would lead to a use after free, since that ca is going away soon. - Changing the congestion control case doesn't release the refcnt on the previous ca. - In the reinit case, we first leak a reference on the old ca, then we call tcp_reinit_congestion_control on the ca that we have just assigned, leading to deinitializing the wrong ca (->release of the new ca on the old ca's data) and releasing the refcount on the ca that we actually want to use. This is visible by building (for example) BIC as a module and setting net.ipv4.tcp_congestion_control=bic, and using tcp_cong_kern.c from samples/bpf. This patch fixes the refcount issues, and moves reinit back into tcp core to avoid passing a ca pointer back to BPF. Fixes: 91b5b21c7c16 ("bpf: Add support for changing congestion control") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-25ipv6: Fix may be used uninitialized warning in rt6_checkSteffen Klassert
rt_cookie might be used uninitialized, fix this by initializing it. Fixes: c5cff8561d2d ("ipv6: add rcu grace period before freeing fib6_node") Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-25esp: Fix skb tailroom calculationSteffen Klassert
We use skb_availroom to calculate the skb tailroom for the ESP trailer. skb_availroom calculates the tailroom and subtracts this value by reserved_tailroom. However reserved_tailroom is a union with the skb mark. This means that we subtract the tailroom by the skb mark if set. Fix this by using skb_tailroom instead. Fixes: cac2661c53f3 ("esp4: Avoid skb_cow_data whenever possible") Fixes: 03e2a30f6a27 ("esp6: Avoid skb_cow_data whenever possible") Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-08-25esp: Fix locking on page fragment allocationSteffen Klassert
We allocate the page fragment for the ESP trailer inside a spinlock, but consume it outside of the lock. This is racy as some other cou could get the same page fragment then. Fix this by consuming the page fragment inside the lock too. Fixes: cac2661c53f3 ("esp4: Avoid skb_cow_data whenever possible") Fixes: 03e2a30f6a27 ("esp6: Avoid skb_cow_data whenever possible") Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-08-24tipc: context imbalance at node read unlockParthasarathy Bhuvaragan
If we fail to find a valid bearer in tipc_node_get_linkname(), node_read_unlock() is called without holding the node read lock. This commit fixes this error. Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24tipc: reassign pointers after skb reallocation / linearizationParthasarathy Bhuvaragan
In tipc_msg_reverse(), we assign skb attributes to local pointers in stack at startup. This is followed by skb_linearize() and for cloned buffers we perform skb relocation using pskb_expand_head(). Both these methods may update the skb attributes and thus making the pointers incorrect. In this commit, we fix this error by ensuring that the pointers are re-assigned after any of these skb operations. Fixes: 29042e19f2c60 ("tipc: let function tipc_msg_reverse() expand header when needed") Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24tipc: perform skb_linearize() before parsing the inner headerParthasarathy Bhuvaragan
In tipc_rcv(), we linearize only the header and usually the packets are consumed as the nodes permit direct reception. However, if the skb contains tunnelled message due to fail over or synchronization we parse it in tipc_node_check_state() without performing linearization. This will cause link disturbances if the skb was non linear. In this commit, we perform linearization for the above messages. Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24net_sched: fix a refcount_t issue with noop_qdiscEric Dumazet
syzkaller reported a refcount_t warning [1] Issue here is that noop_qdisc refcnt was never really considered as a true refcount, since qdisc_destroy() found TCQ_F_BUILTIN set : if (qdisc->flags & TCQ_F_BUILTIN || !refcount_dec_and_test(&qdisc->refcnt))) return; Meaning that all atomic_inc() we did on noop_qdisc.refcnt were not really needed, but harmless until refcount_t came. To fix this problem, we simply need to not increment noop_qdisc.refcnt, since we never decrement it. [1] refcount_t: increment on 0; use-after-free. ------------[ cut here ]------------ WARNING: CPU: 0 PID: 21754 at lib/refcount.c:152 refcount_inc+0x47/0x50 lib/refcount.c:152 Kernel panic - not syncing: panic_on_warn set ... CPU: 0 PID: 21754 Comm: syz-executor7 Not tainted 4.13.0-rc6+ #20 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:16 [inline] dump_stack+0x194/0x257 lib/dump_stack.c:52 panic+0x1e4/0x417 kernel/panic.c:180 __warn+0x1c4/0x1d9 kernel/panic.c:541 report_bug+0x211/0x2d0 lib/bug.c:183 fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:190 do_trap_no_signal arch/x86/kernel/traps.c:224 [inline] do_trap+0x260/0x390 arch/x86/kernel/traps.c:273 do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:310 do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:323 invalid_op+0x1e/0x30 arch/x86/entry/entry_64.S:846 RIP: 0010:refcount_inc+0x47/0x50 lib/refcount.c:152 RSP: 0018:ffff8801c43477a0 EFLAGS: 00010282 RAX: 000000000000002b RBX: ffffffff86093c14 RCX: 0000000000000000 RDX: 000000000000002b RSI: ffffffff8159314e RDI: ffffed0038868ee8 RBP: ffff8801c43477a8 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff86093ac0 R13: 0000000000000001 R14: ffff8801d0f3bac0 R15: dffffc0000000000 attach_default_qdiscs net/sched/sch_generic.c:792 [inline] dev_activate+0x7d3/0xaa0 net/sched/sch_generic.c:833 __dev_open+0x227/0x330 net/core/dev.c:1380 __dev_change_flags+0x695/0x990 net/core/dev.c:6726 dev_change_flags+0x88/0x140 net/core/dev.c:6792 dev_ifsioc+0x5a6/0x930 net/core/dev_ioctl.c:256 dev_ioctl+0x2bc/0xf90 net/core/dev_ioctl.c:554 sock_do_ioctl+0x94/0xb0 net/socket.c:968 sock_ioctl+0x2c2/0x440 net/socket.c:1058 vfs_ioctl fs/ioctl.c:45 [inline] do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:685 SYSC_ioctl fs/ioctl.c:700 [inline] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691 Fixes: 7b9364050246 ("net, sched: convert Qdisc.refcnt from atomic_t to refcount_t") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: Reshetova, Elena <elena.reshetova@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24bpf: fix bpf_setsockopts return valueYuchung Cheng
This patch fixes a bug causing any sock operations to always return EINVAL. Fixes: a5192c52377e ("bpf: fix to bpf_setsockops"). Reported-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Craig Gallek <kraig@google.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24tipc: Fix tipc_sk_reinit handling of -EAGAINBob Peterson
In 9dbbfb0ab6680c6a85609041011484e6658e7d3c function tipc_sk_reinit had additional logic added to loop in the event that function rhashtable_walk_next() returned -EAGAIN. No worries. However, if rhashtable_walk_start returns -EAGAIN, it does "continue", and therefore skips the call to rhashtable_walk_stop(). That has the effect of calling rcu_read_lock() without its paired call to rcu_read_unlock(). Since rcu_read_lock() may be nested, the problem may not be apparent for a while, especially since resize events may be rare. But the comments to rhashtable_walk_start() state: * ...Note that we take the RCU lock in all * cases including when we return an error. So you must always call * rhashtable_walk_stop to clean up. This patch replaces the continue with a goto and label to ensure a matching call to rhashtable_walk_stop(). Signed-off-by: Bob Peterson <rpeterso@redhat.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for your net tree, they are: 1) Fix use after free of struct proc_dir_entry in ipt_CLUSTERIP, patch from Sabrina Dubroca. 2) Fix spurious EINVAL errors from iptables over nft compatibility layer. 3) Reload pointer to ip header only if there is non-terminal verdict, ie. XT_CONTINUE, otherwise invalid memory access may happen, patch from Taehee Yoo. 4) Fix interaction between SYNPROXY and NAT, SYNPROXY adds sequence adjustment already, however from nf_nat_setup() assumes there's not. Patch from Xin Long. 5) Fix burst arithmetics in nft_limit as Joe Stringer mentioned during NFWS in Faro. Patch from Andy Zhou. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24netfilter: nf_tables: Fix nft limit burst handlingandy zhou
Current implementation treats the burst configuration the same as rate configuration. This can cause the per packet cost to be lower than configured. In effect, this bug causes the token bucket to be refilled at a higher rate than what user has specified. This patch changes the implementation so that the token bucket size is controlled by "rate + burst", while maintain the token bucket refill rate the same as user specified. Fixes: 96518518cc41 ("netfilter: add nftables") Signed-off-by: Andy Zhou <azhou@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-08-24netfilter: check for seqadj ext existence before adding it in nf_nat_setup_infoXin Long
Commit 4440a2ab3b9f ("netfilter: synproxy: Check oom when adding synproxy and seqadj ct extensions") wanted to drop the packet when it fails to add seqadj ext due to no memory by checking if nfct_seqadj_ext_add returns NULL. But that nfct_seqadj_ext_add returns NULL can also happen when seqadj ext already exists in a nf_conn. It will cause that userspace protocol doesn't work when both dnat and snat are configured. Li Shuang found this issue in the case: Topo: ftp client router ftp server 10.167.131.2 <-> 10.167.131.254 10.167.141.254 <-> 10.167.141.1 Rules: # iptables -t nat -A PREROUTING -i eth1 -p tcp -m tcp --dport 21 -j \ DNAT --to-destination 10.167.141.1 # iptables -t nat -A POSTROUTING -o eth2 -p tcp -m tcp --dport 21 -j \ SNAT --to-source 10.167.141.254 In router, when both dnat and snat are added, nf_nat_setup_info will be called twice. The packet can be dropped at the 2nd time for DNAT due to seqadj ext is already added at the 1st time for SNAT. This patch is to fix it by checking for seqadj ext existence before adding it, so that the packet will not be dropped if seqadj ext already exists. Note that as Florian mentioned, as a long term, we should review ext_add() behaviour, it's better to return a pointer to the existing ext instead. Fixes: 4440a2ab3b9f ("netfilter: synproxy: Check oom when adding synproxy and seqadj ct extensions") Reported-by: Li Shuang <shuali@redhat.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-08-24net: xfrm: don't double-hold dst when sk_policy in use.Lorenzo Colitti
While removing dst_entry garbage collection, commit 52df157f17e5 ("xfrm: take refcnt of dst when creating struct xfrm_dst bundle") changed xfrm_resolve_and_create_bundle so it returns an xdst with a refcount of 1 instead of 0. However, it did not delete the dst_hold performed by xfrm_lookup when a per-socket policy is in use. This means that when a socket policy is in use, dst entries returned by xfrm_lookup have a refcount of 2, and are not freed when no longer in use. Cc: Wei Wang <weiwan@google.com> Fixes: 52df157f17 ("xfrm: take refcnt of dst when creating struct xfrm_dst bundle") Tested: https://android-review.googlesource.com/417481 Tested: https://android-review.googlesource.com/418659 Tested: https://android-review.googlesource.com/424463 Tested: https://android-review.googlesource.com/452776 passes on net-next Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Acked-by: Wei Wang <weiwan@google.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-08-23sctp: Avoid out-of-bounds reads from address storageStefano Brivio
inet_diag_msg_sctp{,l}addr_fill() and sctp_get_sctp_info() copy sizeof(sockaddr_storage) bytes to fill in sockaddr structs used to export diagnostic information to userspace. However, the memory allocated to store sockaddr information is smaller than that and depends on the address family, so we leak up to 100 uninitialized bytes to userspace. Just use the size of the source structs instead, in all the three cases this is what userspace expects. Zero out the remaining memory. Unused bytes (i.e. when IPv4 addresses are used) in source structs sctp_sockaddr_entry and sctp_transport are already cleared by sctp_add_bind_addr() and sctp_transport_new(), respectively. Noticed while testing KASAN-enabled kernel with 'ss': [ 2326.885243] BUG: KASAN: slab-out-of-bounds in inet_sctp_diag_fill+0x42c/0x6c0 [sctp_diag] at addr ffff881be8779800 [ 2326.896800] Read of size 128 by task ss/9527 [ 2326.901564] CPU: 0 PID: 9527 Comm: ss Not tainted 4.11.0-22.el7a.x86_64 #1 [ 2326.909236] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.4.3 01/17/2017 [ 2326.917585] Call Trace: [ 2326.920312] dump_stack+0x63/0x8d [ 2326.924014] kasan_object_err+0x21/0x70 [ 2326.928295] kasan_report+0x288/0x540 [ 2326.932380] ? inet_sctp_diag_fill+0x42c/0x6c0 [sctp_diag] [ 2326.938500] ? skb_put+0x8b/0xd0 [ 2326.942098] ? memset+0x31/0x40 [ 2326.945599] check_memory_region+0x13c/0x1a0 [ 2326.950362] memcpy+0x23/0x50 [ 2326.953669] inet_sctp_diag_fill+0x42c/0x6c0 [sctp_diag] [ 2326.959596] ? inet_diag_msg_sctpasoc_fill+0x460/0x460 [sctp_diag] [ 2326.966495] ? __lock_sock+0x102/0x150 [ 2326.970671] ? sock_def_wakeup+0x60/0x60 [ 2326.975048] ? remove_wait_queue+0xc0/0xc0 [ 2326.979619] sctp_diag_dump+0x44a/0x760 [sctp_diag] [ 2326.985063] ? sctp_ep_dump+0x280/0x280 [sctp_diag] [ 2326.990504] ? memset+0x31/0x40 [ 2326.994007] ? mutex_lock+0x12/0x40 [ 2326.997900] __inet_diag_dump+0x57/0xb0 [inet_diag] [ 2327.003340] ? __sys_sendmsg+0x150/0x150 [ 2327.007715] inet_diag_dump+0x4d/0x80 [inet_diag] [ 2327.012979] netlink_dump+0x1e6/0x490 [ 2327.017064] __netlink_dump_start+0x28e/0x2c0 [ 2327.021924] inet_diag_handler_cmd+0x189/0x1a0 [inet_diag] [ 2327.028045] ? inet_diag_rcv_msg_compat+0x1b0/0x1b0 [inet_diag] [ 2327.034651] ? inet_diag_dump_compat+0x190/0x190 [inet_diag] [ 2327.040965] ? __netlink_lookup+0x1b9/0x260 [ 2327.045631] sock_diag_rcv_msg+0x18b/0x1e0 [ 2327.050199] netlink_rcv_skb+0x14b/0x180 [ 2327.054574] ? sock_diag_bind+0x60/0x60 [ 2327.058850] sock_diag_rcv+0x28/0x40 [ 2327.062837] netlink_unicast+0x2e7/0x3b0 [ 2327.067212] ? netlink_attachskb+0x330/0x330 [ 2327.071975] ? kasan_check_write+0x14/0x20 [ 2327.076544] netlink_sendmsg+0x5be/0x730 [ 2327.080918] ? netlink_unicast+0x3b0/0x3b0 [ 2327.085486] ? kasan_check_write+0x14/0x20 [ 2327.090057] ? selinux_socket_sendmsg+0x24/0x30 [ 2327.095109] ? netlink_unicast+0x3b0/0x3b0 [ 2327.099678] sock_sendmsg+0x74/0x80 [ 2327.103567] ___sys_sendmsg+0x520/0x530 [ 2327.107844] ? __get_locked_pte+0x178/0x200 [ 2327.112510] ? copy_msghdr_from_user+0x270/0x270 [ 2327.117660] ? vm_insert_page+0x360/0x360 [ 2327.122133] ? vm_insert_pfn_prot+0xb4/0x150 [ 2327.126895] ? vm_insert_pfn+0x32/0x40 [ 2327.131077] ? vvar_fault+0x71/0xd0 [ 2327.134968] ? special_mapping_fault+0x69/0x110 [ 2327.140022] ? __do_fault+0x42/0x120 [ 2327.144008] ? __handle_mm_fault+0x1062/0x17a0 [ 2327.148965] ? __fget_light+0xa7/0xc0 [ 2327.153049] __sys_sendmsg+0xcb/0x150 [ 2327.157133] ? __sys_sendmsg+0xcb/0x150 [ 2327.161409] ? SyS_shutdown+0x140/0x140 [ 2327.165688] ? exit_to_usermode_loop+0xd0/0xd0 [ 2327.170646] ? __do_page_fault+0x55d/0x620 [ 2327.175216] ? __sys_sendmsg+0x150/0x150 [ 2327.179591] SyS_sendmsg+0x12/0x20 [ 2327.183384] do_syscall_64+0xe3/0x230 [ 2327.187471] entry_SYSCALL64_slow_path+0x25/0x25 [ 2327.192622] RIP: 0033:0x7f41d18fa3b0 [ 2327.196608] RSP: 002b:00007ffc3b731218 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 2327.205055] RAX: ffffffffffffffda RBX: 00007ffc3b731380 RCX: 00007f41d18fa3b0 [ 2327.213017] RDX: 0000000000000000 RSI: 00007ffc3b731340 RDI: 0000000000000003 [ 2327.220978] RBP: 0000000000000002 R08: 0000000000000004 R09: 0000000000000040 [ 2327.228939] R10: 00007ffc3b730f30 R11: 0000000000000246 R12: 0000000000000003 [ 2327.236901] R13: 00007ffc3b731340 R14: 00007ffc3b7313d0 R15: 0000000000000084 [ 2327.244865] Object at ffff881be87797e0, in cache kmalloc-64 size: 64 [ 2327.251953] Allocated: [ 2327.254581] PID = 9484 [ 2327.257215] save_stack_trace+0x1b/0x20 [ 2327.261485] save_stack+0x46/0xd0 [ 2327.265179] kasan_kmalloc+0xad/0xe0 [ 2327.269165] kmem_cache_alloc_trace+0xe6/0x1d0 [ 2327.274138] sctp_add_bind_addr+0x58/0x180 [sctp] [ 2327.279400] sctp_do_bind+0x208/0x310 [sctp] [ 2327.284176] sctp_bind+0x61/0xa0 [sctp] [ 2327.288455] inet_bind+0x5f/0x3a0 [ 2327.292151] SYSC_bind+0x1a4/0x1e0 [ 2327.295944] SyS_bind+0xe/0x10 [ 2327.299349] do_syscall_64+0xe3/0x230 [ 2327.303433] return_from_SYSCALL_64+0x0/0x6a [ 2327.308194] Freed: [ 2327.310434] PID = 4131 [ 2327.313065] save_stack_trace+0x1b/0x20 [ 2327.317344] save_stack+0x46/0xd0 [ 2327.321040] kasan_slab_free+0x73/0xc0 [ 2327.325220] kfree+0x96/0x1a0 [ 2327.328530] dynamic_kobj_release+0x15/0x40 [ 2327.333195] kobject_release+0x99/0x1e0 [ 2327.337472] kobject_put+0x38/0x70 [ 2327.341266] free_notes_attrs+0x66/0x80 [ 2327.345545] mod_sysfs_teardown+0x1a5/0x270 [ 2327.350211] free_module+0x20/0x2a0 [ 2327.354099] SyS_delete_module+0x2cb/0x2f0 [ 2327.358667] do_syscall_64+0xe3/0x230 [ 2327.362750] return_from_SYSCALL_64+0x0/0x6a [ 2327.367510] Memory state around the buggy address: [ 2327.372855] ffff881be8779700: fc fc fc fc 00 00 00 00 00 00 00 00 fc fc fc fc [ 2327.380914] ffff881be8779780: fb fb fb fb fb fb fb fb fc fc fc fc 00 00 00 00 [ 2327.388972] >ffff881be8779800: 00 00 00 00 fc fc fc fc fb fb fb fb fb fb fb fb [ 2327.397031] ^ [ 2327.401792] ffff881be8779880: fc fc fc fc fb fb fb fb fb fb fb fb fc fc fc fc [ 2327.409850] ffff881be8779900: 00 00 00 00 00 04 fc fc fc fc fc fc 00 00 00 00 [ 2327.417907] ================================================================== This fixes CVE-2017-7558. References: https://bugzilla.redhat.com/show_bug.cgi?id=1480266 Fixes: 8f840e47f190 ("sctp: add the sctp_diag.c file") Cc: Xin Long <lucien.xin@gmail.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-23net: dsa: use consume_skb()Eric Dumazet
Two kfree_skb() should be consume_skb(), to be friend with drop monitor (perf record ... -e skb:kfree_skb) Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-23net: dsa: skb_put_padto() already frees nskbFlorian Fainelli
The first call of skb_put_padto() will free up the SKB on error, but we return NULL which tells dsa_slave_xmit() that the original SKB should be freed so this would lead to a double free here. The second skb_put_padto() already frees the passed sk_buff reference upon error, so calling kfree_skb() on it again is not necessary. Detected by CoverityScan, CID#1416687 ("USE_AFTER_FREE") Fixes: e71cb9e00922 ("net: dsa: ksz: fix skb freeing") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Woojung Huh <Woojung.Huh@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-23net: core: Specify skb_pad()/skb_put_padto() SKB freeingFlorian Fainelli
Rename skb_pad() into __skb_pad() and make it take a third argument: free_on_error which controls whether kfree_skb() should be called or not, skb_pad() directly makes use of it and passes true to preserve its existing behavior. Do exactly the same thing with __skb_put_padto() and skb_put_padto(). Suggested-by: David Miller <davem@davemloft.net> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Woojung Huh <Woojung.Huh@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-22net: sched: don't do tcf_chain_flush from tcf_chain_destroyJiri Pirko
tcf_chain_flush needs to be called with RTNL. However, on free_tcf-> tcf_action_goto_chain_fini-> tcf_chain_put-> tcf_chain_destroy-> tcf_chain_flush callpath, it is called without RTNL. This issue was notified by following warning: [ 155.599052] WARNING: suspicious RCU usage [ 155.603165] 4.13.0-rc5jiri+ #54 Not tainted [ 155.607456] ----------------------------- [ 155.611561] net/sched/cls_api.c:195 suspicious rcu_dereference_protected() usage! Since on this callpath, the chain is guaranteed to be already empty by check in tcf_chain_put, move the tcf_chain_flush call out and call it only where it is needed - into tcf_block_put. Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-22net: sched: fix use after free when tcf_chain_destroy is called multiple timesJiri Pirko
The goto_chain termination action takes a reference of a chain. In that case, there is an issue when block_put is called tcf_chain_destroy directly. The follo-up call of tcf_chain_put by goto_chain action free works with memory that is already freed. This was caught by kasan: [ 220.337908] BUG: KASAN: use-after-free in tcf_chain_put+0x1b/0x50 [ 220.344103] Read of size 4 at addr ffff88036d1f2cec by task systemd-journal/261 [ 220.353047] CPU: 0 PID: 261 Comm: systemd-journal Not tainted 4.13.0-rc5jiri+ #54 [ 220.360661] Hardware name: Mellanox Technologies Ltd. Mellanox switch/Mellanox x86 mezzanine board, BIOS 4.6.5 08/02/2016 [ 220.371784] Call Trace: [ 220.374290] <IRQ> [ 220.376355] dump_stack+0xd5/0x150 [ 220.391485] print_address_description+0x86/0x410 [ 220.396308] kasan_report+0x181/0x4c0 [ 220.415211] tcf_chain_put+0x1b/0x50 [ 220.418949] free_tcf+0x95/0xc0 So allow tcf_chain_destroy to be called multiple times, free only in case the reference count drops to 0. Fixes: 5bc1701881e3 ("net: sched: introduce multichain support for filters") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-22udp: on peeking bad csum, drop packets even if not at headEric Dumazet
When peeking, if a bad csum is discovered, the skb is unlinked from the queue with __sk_queue_drop_skb and the peek operation restarted. __sk_queue_drop_skb only drops packets that match the queue head. This fails if the skb was found after the head, using SO_PEEK_OFF socket option. This causes an infinite loop. We MUST drop this problematic skb, and we can simply check if skb was already removed by another thread, by looking at skb->next : This pointer is set to NULL by the __skb_unlink() operation, that might have happened only under the spinlock protection. Many thanks to syzkaller team (and particularly Dmitry Vyukov who provided us nice C reproducers exhibiting the lockup) and Willem de Bruijn who provided first version for this patch and a test program. Fixes: 627d2d6b5500 ("udp: enable MSG_PEEK at non-zero offset") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: Willem de Bruijn <willemb@google.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-22tipc: fix a race condition of releasing subscriber objectYing Xue
No matter whether a request is inserted into workqueue as a work item to cancel a subscription or to delete a subscription's subscriber asynchronously, the work items may be executed in different workers. As a result, it doesn't mean that one request which is raised prior to another request is definitely handled before the latter. By contrast, if the latter request is executed before the former request, below error may happen: [ 656.183644] BUG: spinlock bad magic on CPU#0, kworker/u8:0/12117 [ 656.184487] general protection fault: 0000 [#1] SMP [ 656.185160] Modules linked in: tipc ip6_udp_tunnel udp_tunnel 9pnet_virtio 9p 9pnet virtio_net virtio_pci virtio_ring virtio [last unloaded: ip6_udp_tunnel] [ 656.187003] CPU: 0 PID: 12117 Comm: kworker/u8:0 Not tainted 4.11.0-rc7+ #6 [ 656.187920] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 656.188690] Workqueue: tipc_rcv tipc_recv_work [tipc] [ 656.189371] task: ffff88003f5cec40 task.stack: ffffc90004448000 [ 656.190157] RIP: 0010:spin_bug+0xdd/0xf0 [ 656.190678] RSP: 0018:ffffc9000444bcb8 EFLAGS: 00010202 [ 656.191375] RAX: 0000000000000034 RBX: ffff88003f8d1388 RCX: 0000000000000000 [ 656.192321] RDX: ffff88003ba13708 RSI: ffff88003ba0cd08 RDI: ffff88003ba0cd08 [ 656.193265] RBP: ffffc9000444bcd0 R08: 0000000000000030 R09: 000000006b6b6b6b [ 656.194208] R10: ffff8800bde3e000 R11: 00000000000001b4 R12: 6b6b6b6b6b6b6b6b [ 656.195157] R13: ffffffff81a3ca64 R14: ffff88003f8d1388 R15: ffff88003f8d13a0 [ 656.196101] FS: 0000000000000000(0000) GS:ffff88003ba00000(0000) knlGS:0000000000000000 [ 656.197172] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 656.197935] CR2: 00007f0b3d2e6000 CR3: 000000003ef9e000 CR4: 00000000000006f0 [ 656.198873] Call Trace: [ 656.199210] do_raw_spin_lock+0x66/0xa0 [ 656.199735] _raw_spin_lock_bh+0x19/0x20 [ 656.200258] tipc_subscrb_subscrp_delete+0x28/0xf0 [tipc] [ 656.200990] tipc_subscrb_rcv_cb+0x45/0x260 [tipc] [ 656.201632] tipc_receive_from_sock+0xaf/0x100 [tipc] [ 656.202299] tipc_recv_work+0x2b/0x60 [tipc] [ 656.202872] process_one_work+0x157/0x420 [ 656.203404] worker_thread+0x69/0x4c0 [ 656.203898] kthread+0x138/0x170 [ 656.204328] ? process_one_work+0x420/0x420 [ 656.204889] ? kthread_create_on_node+0x40/0x40 [ 656.205527] ret_from_fork+0x29/0x40 [ 656.206012] Code: 48 8b 0c 25 00 c5 00 00 48 c7 c7 f0 24 a3 81 48 81 c1 f0 05 00 00 65 8b 15 61 ef f5 7e e8 9a 4c 09 00 4d 85 e4 44 8b 4b 08 74 92 <45> 8b 84 24 40 04 00 00 49 8d 8c 24 f0 05 00 00 eb 8d 90 0f 1f [ 656.208504] RIP: spin_bug+0xdd/0xf0 RSP: ffffc9000444bcb8 [ 656.209798] ---[ end trace e2a800e6eb0770be ]--- In above scenario, the request of deleting subscriber was performed earlier than the request of canceling a subscription although the latter was issued before the former, which means tipc_subscrb_delete() was called before tipc_subscrp_cancel(). As a result, when tipc_subscrb_subscrp_delete() called by tipc_subscrp_cancel() was executed to cancel a subscription, the subscription's subscriber refcnt had been decreased to 1. After tipc_subscrp_delete() where the subscriber was freed because its refcnt was decremented to zero, but the subscriber's lock had to be released, as a consequence, panic happened. By contrast, if we increase subscriber's refcnt before tipc_subscrb_subscrp_delete() is called in tipc_subscrp_cancel(), the panic issue can be avoided. Fixes: d094c4d5f5c7 ("tipc: add subscription refcount to avoid invalid delete") Reported-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-22tipc: remove subscription references only for pending timersParthasarathy Bhuvaragan
In commit, 139bb36f754a ("tipc: advance the time of deleting subscription from subscriber->subscrp_list"), we delete the subscription from the subscribers list and from nametable unconditionally. This leads to the following bug if the timer running tipc_subscrp_timeout() in another CPU accesses the subscription list after the subscription delete request. [39.570] general protection fault: 0000 [#1] SMP :: [39.574] task: ffffffff81c10540 task.stack: ffffffff81c00000 [39.575] RIP: 0010:tipc_subscrp_timeout+0x32/0x80 [tipc] [39.576] RSP: 0018:ffff88003ba03e90 EFLAGS: 00010282 [39.576] RAX: dead000000000200 RBX: ffff88003f0f3600 RCX: 0000000000000101 [39.577] RDX: dead000000000100 RSI: 0000000000000201 RDI: ffff88003f0d7948 [39.578] RBP: ffff88003ba03ea0 R08: 0000000000000001 R09: ffff88003ba03ef8 [39.579] R10: 000000000000014f R11: 0000000000000000 R12: ffff88003f0d7948 [39.580] R13: ffff88003f0f3618 R14: ffffffffa006c250 R15: ffff88003f0f3600 [39.581] FS: 0000000000000000(0000) GS:ffff88003ba00000(0000) knlGS:0000000000000000 [39.582] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [39.583] CR2: 00007f831c6e0714 CR3: 000000003d3b0000 CR4: 00000000000006f0 [39.584] Call Trace: [39.584] <IRQ> [39.585] call_timer_fn+0x3d/0x180 [39.585] ? tipc_subscrb_rcv_cb+0x260/0x260 [tipc] [39.586] run_timer_softirq+0x168/0x1f0 [39.586] ? sched_clock_cpu+0x16/0xc0 [39.587] __do_softirq+0x9b/0x2de [39.587] irq_exit+0x60/0x70 [39.588] smp_apic_timer_interrupt+0x3d/0x50 [39.588] apic_timer_interrupt+0x86/0x90 [39.589] RIP: 0010:default_idle+0x20/0xf0 [39.589] RSP: 0018:ffffffff81c03e58 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10 [39.590] RAX: 0000000000000000 RBX: ffffffff81c10540 RCX: 0000000000000000 [39.591] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [39.592] RBP: ffffffff81c03e68 R08: 0000000000000000 R09: 0000000000000000 [39.593] R10: ffffc90001cbbe00 R11: 0000000000000000 R12: 0000000000000000 [39.594] R13: ffffffff81c10540 R14: 0000000000000000 R15: 0000000000000000 [39.595] </IRQ> :: [39.603] RIP: tipc_subscrp_timeout+0x32/0x80 [tipc] RSP: ffff88003ba03e90 [39.604] ---[ end trace 79ce94b7216cb459 ]--- Fixes: 139bb36f754a ("tipc: advance the time of deleting subscription from subscriber->subscrp_list") Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>