summaryrefslogtreecommitdiff
path: root/net/sctp
AgeCommit message (Collapse)Author
2015-06-29sctp: Fix race between OOTB responce and route removalAlexander Sverdlin
There is NULL pointer dereference possible during statistics update if the route used for OOTB responce is removed at unfortunate time. If the route exists when we receive OOTB packet and we finally jump into sctp_packet_transmit() to send ABORT, but in the meantime route is removed under our feet, we take "no_route" path and try to update stats with IP_INC_STATS(sock_net(asoc->base.sk), ...). But sctp_ootb_pkt_new() used to prepare responce packet doesn't call sctp_transport_set_owner() and therefore there is no asoc associated with this packet. Probably temporary asoc just for OOTB responces is overkill, so just introduce a check like in all other places in sctp_packet_transmit(), where "asoc" is dereferenced. To reproduce this, one needs to 0. ensure that sctp module is loaded (otherwise ABORT is not generated) 1. remove default route on the machine 2. while true; do ip route del [interface-specific route] ip route add [interface-specific route] done 3. send enough OOTB packets (i.e. HB REQs) from another host to trigger ABORT responce On x86_64 the crash looks like this: BUG: unable to handle kernel NULL pointer dereference at 0000000000000020 IP: [<ffffffffa05ec9ac>] sctp_packet_transmit+0x63c/0x730 [sctp] PGD 0 Oops: 0000 [#1] PREEMPT SMP Modules linked in: ... CPU: 0 PID: 0 Comm: swapper/0 Tainted: G O 4.0.5-1-ARCH #1 Hardware name: ... task: ffffffff818124c0 ti: ffffffff81800000 task.ti: ffffffff81800000 RIP: 0010:[<ffffffffa05ec9ac>] [<ffffffffa05ec9ac>] sctp_packet_transmit+0x63c/0x730 [sctp] RSP: 0018:ffff880127c037b8 EFLAGS: 00010296 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000015ff66b480 RDX: 00000015ff66b400 RSI: ffff880127c17200 RDI: ffff880123403700 RBP: ffff880127c03888 R08: 0000000000017200 R09: ffffffff814625af R10: ffffea00047e4680 R11: 00000000ffffff80 R12: ffff8800b0d38a28 R13: ffff8800b0d38a28 R14: ffff8800b3e88000 R15: ffffffffa05f24e0 FS: 0000000000000000(0000) GS:ffff880127c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000020 CR3: 00000000c855b000 CR4: 00000000000007f0 Stack: ffff880127c03910 ffff8800b0d38a28 ffffffff8189d240 ffff88011f91b400 ffff880127c03828 ffffffffa05c94c5 0000000000000000 ffff8800baa1c520 0000000000000000 0000000000000001 0000000000000000 0000000000000000 Call Trace: <IRQ> [<ffffffffa05c94c5>] ? sctp_sf_tabort_8_4_8.isra.20+0x85/0x140 [sctp] [<ffffffffa05d6b42>] ? sctp_transport_put+0x52/0x80 [sctp] [<ffffffffa05d0bfc>] sctp_do_sm+0xb8c/0x19a0 [sctp] [<ffffffff810b0e00>] ? trigger_load_balance+0x90/0x210 [<ffffffff810e0329>] ? update_process_times+0x59/0x60 [<ffffffff812c7a40>] ? timerqueue_add+0x60/0xb0 [<ffffffff810e0549>] ? enqueue_hrtimer+0x29/0xa0 [<ffffffff8101f599>] ? read_tsc+0x9/0x10 [<ffffffff8116d4b5>] ? put_page+0x55/0x60 [<ffffffff810ee1ad>] ? clockevents_program_event+0x6d/0x100 [<ffffffff81462b68>] ? skb_free_head+0x58/0x80 [<ffffffffa029a10b>] ? chksum_update+0x1b/0x27 [crc32c_generic] [<ffffffff81283f3e>] ? crypto_shash_update+0xce/0xf0 [<ffffffffa05d3993>] sctp_endpoint_bh_rcv+0x113/0x280 [sctp] [<ffffffffa05dd4e6>] sctp_inq_push+0x46/0x60 [sctp] [<ffffffffa05ed7a0>] sctp_rcv+0x880/0x910 [sctp] [<ffffffffa05ecb50>] ? sctp_packet_transmit_chunk+0xb0/0xb0 [sctp] [<ffffffffa05ecb70>] ? sctp_csum_update+0x20/0x20 [sctp] [<ffffffff814b05a5>] ? ip_route_input_noref+0x235/0xd30 [<ffffffff81051d6b>] ? ack_ioapic_level+0x7b/0x150 [<ffffffff814b27be>] ip_local_deliver_finish+0xae/0x210 [<ffffffff814b2e15>] ip_local_deliver+0x35/0x90 [<ffffffff814b2a15>] ip_rcv_finish+0xf5/0x370 [<ffffffff814b3128>] ip_rcv+0x2b8/0x3a0 [<ffffffff81474193>] __netif_receive_skb_core+0x763/0xa50 [<ffffffff81476c28>] __netif_receive_skb+0x18/0x60 [<ffffffff81476cb0>] netif_receive_skb_internal+0x40/0xd0 [<ffffffff814776c8>] napi_gro_receive+0xe8/0x120 [<ffffffffa03946aa>] rtl8169_poll+0x2da/0x660 [r8169] [<ffffffff8147896a>] net_rx_action+0x21a/0x360 [<ffffffff81078dc1>] __do_softirq+0xe1/0x2d0 [<ffffffff8107912d>] irq_exit+0xad/0xb0 [<ffffffff8157d158>] do_IRQ+0x58/0xf0 [<ffffffff8157b06d>] common_interrupt+0x6d/0x6d <EOI> [<ffffffff810e1218>] ? hrtimer_start+0x18/0x20 [<ffffffffa05d65f9>] ? sctp_transport_destroy_rcu+0x29/0x30 [sctp] [<ffffffff81020c50>] ? mwait_idle+0x60/0xa0 [<ffffffff810216ef>] arch_cpu_idle+0xf/0x20 [<ffffffff810b731c>] cpu_startup_entry+0x3ec/0x480 [<ffffffff8156b365>] rest_init+0x85/0x90 [<ffffffff818eb035>] start_kernel+0x48b/0x4ac [<ffffffff818ea120>] ? early_idt_handlers+0x120/0x120 [<ffffffff818ea339>] x86_64_start_reservations+0x2a/0x2c [<ffffffff818ea49c>] x86_64_start_kernel+0x161/0x184 Code: 90 48 8b 80 b8 00 00 00 48 89 85 70 ff ff ff 48 83 bd 70 ff ff ff 00 0f 85 cd fa ff ff 48 89 df 31 db e8 18 63 e7 e0 48 8b 45 80 <48> 8b 40 20 48 8b 40 30 48 8b 80 68 01 00 00 65 48 ff 40 78 e9 RIP [<ffffffffa05ec9ac>] sctp_packet_transmit+0x63c/0x730 [sctp] RSP <ffff880127c037b8> CR2: 0000000000000020 ---[ end trace 5aec7fd2dc983574 ]--- Kernel panic - not syncing: Fatal exception in interrupt Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff) drm_kms_helper: panic occurred, switching back to text console ---[ end Kernel panic - not syncing: Fatal exception in interrupt Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-28net: Kill sock->sk_protinfoDavid Miller
No more users, so it can now be removed. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/ethernet/mellanox/mlx4/main.c net/packet/af_packet.c Both conflicts were cases of simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-14sctp: fix ASCONF list handlingMarcelo Ricardo Leitner
->auto_asconf_splist is per namespace and mangled by functions like sctp_setsockopt_auto_asconf() which doesn't guarantee any serialization. Also, the call to inet_sk_copy_descendant() was backuping ->auto_asconf_list through the copy but was not honoring ->do_auto_asconf, which could lead to list corruption if it was different between both sockets. This commit thus fixes the list handling by using ->addr_wq_lock spinlock to protect the list. A special handling is done upon socket creation and destruction for that. Error handlig on sctp_init_sock() will never return an error after having initialized asconf, so sctp_destroy_sock() can be called without addrq_wq_lock. The lock now will be take on sctp_close_sock(), before locking the socket, so we don't do it in inverse order compared to sctp_addr_wq_timeout_handler(). Instead of taking the lock on sctp_sock_migrate() for copying and restoring the list values, it's preferred to avoid rewritting it by implementing sctp_copy_descendant(). Issue was found with a test application that kept flipping sysctl default_auto_asconf on and off, but one could trigger it by issuing simultaneous setsockopt() calls on multiple sockets or by creating/destroying sockets fast enough. This is only triggerable locally. Fixes: 9f7d653b67ae ("sctp: Add Auto-ASCONF support (core).") Reported-by: Ji Jianwen <jiji@redhat.com> Suggested-by: Neil Horman <nhorman@tuxdriver.com> Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2015-06-12sctp: allow authenticating DATA chunks that are bundled with COOKIE_ECHOMarcelo Ricardo Leitner
Currently, we can ask to authenticate DATA chunks and we can send DATA chunks on the same packet as COOKIE_ECHO, but if you try to combine both, the DATA chunk will be sent unauthenticated and peer won't accept it, leading to a communication failure. This happens because even though the data was queued after it was requested to authenticate DATA chunks, it was also queued before we could know that remote peer can handle authenticating, so sctp_auth_send_cid() returns false. The fix is whenever we set up an active key, re-check send queue for chunks that now should be authenticated. As a result, such packet will now contain COOKIE_ECHO + AUTH + DATA chunks, in that order. Reported-by: Liu Wei <weliu@redhat.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-25ipv6: Add rt6_get_cookie() functionMartin KaFai Lau
Instead of doing the rt6->rt6i_node check whenever we need to get the route's cookie. Refactor it into rt6_get_cookie(). It is a prep work to handle FLOWI_FLAG_KNOWN_NH and also percpu rt6_info later. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-25ipv6: Remove external dependency on rt6i_dst and rt6i_srcMartin KaFai Lau
This patch removes the assumptions that the returned rt is always a RTF_CACHE entry with the rt6i_dst and rt6i_src containing the destination and source address. The dst and src can be recovered from the calling site. We may consider to rename (rt6i_dst, rt6i_src) to (rt6i_key_dst, rt6i_key_src) later. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-11net: Pass kern from net_proto_family.create to sk_allocEric W. Biederman
In preparation for changing how struct net is refcounted on kernel sockets pass the knowledge that we are creating a kernel socket from sock_create_kern through to sk_alloc. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-25sctp: avoid to repeatedly declare external variablesYing Xue
Move the declaration for external variables to sctp.h file avoiding to repeatedly declare them with extern keyword. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02net: Remove iocb argument from sendmsg and recvmsgYing Xue
After TIPC doesn't depend on iocb argument in its internal implementations of sendmsg() and recvmsg() hooks defined in proto structure, no any user is using iocb argument in them at all now. Then we can drop the redundant iocb argument completely from kinds of implementations of both sendmsg() and recvmsg() in the entire networking stack. Cc: Christoph Hellwig <hch@lst.de> Suggested-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02net: use common macro for assering skb->cb[] available size in protocol familiesEyal Birger
As part of an effort to move skb->dropcount to skb->cb[] use a common macro in protocol families using skb->cb[] for ancillary data to validate available room in skb->cb[]. Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/vxlan.c drivers/vhost/net.c include/linux/if_vlan.h net/core/dev.c The net/core/dev.c conflict was the overlap of one commit marking an existing function static whilst another was adding a new function. In the include/linux/if_vlan.h case, the type used for a local variable was changed in 'net', whereas the function got rewritten to fix a stacked vlan bug in 'net-next'. In drivers/vhost/net.c, Al Viro's iov_iter conversions in 'net-next' overlapped with an endainness fix for VHOST 1.0 in 'net'. In drivers/net/vxlan.c, vxlan_find_vni() added a 'flags' parameter in 'net-next' whereas in 'net' there was a bug fix to pass in the correct network namespace pointer in calls to this function. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-02net: sctp: Deletion of an unnecessary check before the function call "kfree"Markus Elfring
The kfree() function tests whether its argument is NULL and then returns immediately. Thus the test around the call is not needed. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Acked-By: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-30net: sctp: fix passing wrong parameter header to param_type2af in ↵Saran Maruti Ramanara
sctp_process_param When making use of RFC5061, section 4.2.4. for setting the primary IP address, we're passing a wrong parameter header to param_type2af(), resulting always in NULL being returned. At this point, param.p points to a sctp_addip_param struct, containing a sctp_paramhdr (type = 0xc004, length = var), and crr_id as a correlation id. Followed by that, as also presented in RFC5061 section 4.2.4., comes the actual sctp_addr_param, which also contains a sctp_paramhdr, but this time with the correct type SCTP_PARAM_IPV{4,6}_ADDRESS that param_type2af() can make use of. Since we already hold a pointer to addr_param from previous line, just reuse it for param_type2af(). Fixes: d6de3097592b ("[SCTP]: Add the handling of "Set Primary IP Address" parameter to INIT") Signed-off-by: Saran Maruti Ramanara <saran.neti@telus.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-26net: sctp: fix slab corruption from use after free on INIT collisionsDaniel Borkmann
When hitting an INIT collision case during the 4WHS with AUTH enabled, as already described in detail in commit 1be9a950c646 ("net: sctp: inherit auth_capable on INIT collisions"), it can happen that we occasionally still remotely trigger the following panic on server side which seems to have been uncovered after the fix from commit 1be9a950c646 ... [ 533.876389] BUG: unable to handle kernel paging request at 00000000ffffffff [ 533.913657] IP: [<ffffffff811ac385>] __kmalloc+0x95/0x230 [ 533.940559] PGD 5030f2067 PUD 0 [ 533.957104] Oops: 0000 [#1] SMP [ 533.974283] Modules linked in: sctp mlx4_en [...] [ 534.939704] Call Trace: [ 534.951833] [<ffffffff81294e30>] ? crypto_init_shash_ops+0x60/0xf0 [ 534.984213] [<ffffffff81294e30>] crypto_init_shash_ops+0x60/0xf0 [ 535.015025] [<ffffffff8128c8ed>] __crypto_alloc_tfm+0x6d/0x170 [ 535.045661] [<ffffffff8128d12c>] crypto_alloc_base+0x4c/0xb0 [ 535.074593] [<ffffffff8160bd42>] ? _raw_spin_lock_bh+0x12/0x50 [ 535.105239] [<ffffffffa0418c11>] sctp_inet_listen+0x161/0x1e0 [sctp] [ 535.138606] [<ffffffff814e43bd>] SyS_listen+0x9d/0xb0 [ 535.166848] [<ffffffff816149a9>] system_call_fastpath+0x16/0x1b ... or depending on the the application, for example this one: [ 1370.026490] BUG: unable to handle kernel paging request at 00000000ffffffff [ 1370.026506] IP: [<ffffffff811ab455>] kmem_cache_alloc+0x75/0x1d0 [ 1370.054568] PGD 633c94067 PUD 0 [ 1370.070446] Oops: 0000 [#1] SMP [ 1370.085010] Modules linked in: sctp kvm_amd kvm [...] [ 1370.963431] Call Trace: [ 1370.974632] [<ffffffff8120f7cf>] ? SyS_epoll_ctl+0x53f/0x960 [ 1371.000863] [<ffffffff8120f7cf>] SyS_epoll_ctl+0x53f/0x960 [ 1371.027154] [<ffffffff812100d3>] ? anon_inode_getfile+0xd3/0x170 [ 1371.054679] [<ffffffff811e3d67>] ? __alloc_fd+0xa7/0x130 [ 1371.080183] [<ffffffff816149a9>] system_call_fastpath+0x16/0x1b With slab debugging enabled, we can see that the poison has been overwritten: [ 669.826368] BUG kmalloc-128 (Tainted: G W ): Poison overwritten [ 669.826385] INFO: 0xffff880228b32e50-0xffff880228b32e50. First byte 0x6a instead of 0x6b [ 669.826414] INFO: Allocated in sctp_auth_create_key+0x23/0x50 [sctp] age=3 cpu=0 pid=18494 [ 669.826424] __slab_alloc+0x4bf/0x566 [ 669.826433] __kmalloc+0x280/0x310 [ 669.826453] sctp_auth_create_key+0x23/0x50 [sctp] [ 669.826471] sctp_auth_asoc_create_secret+0xcb/0x1e0 [sctp] [ 669.826488] sctp_auth_asoc_init_active_key+0x68/0xa0 [sctp] [ 669.826505] sctp_do_sm+0x29d/0x17c0 [sctp] [...] [ 669.826629] INFO: Freed in kzfree+0x31/0x40 age=1 cpu=0 pid=18494 [ 669.826635] __slab_free+0x39/0x2a8 [ 669.826643] kfree+0x1d6/0x230 [ 669.826650] kzfree+0x31/0x40 [ 669.826666] sctp_auth_key_put+0x19/0x20 [sctp] [ 669.826681] sctp_assoc_update+0x1ee/0x2d0 [sctp] [ 669.826695] sctp_do_sm+0x674/0x17c0 [sctp] Since this only triggers in some collision-cases with AUTH, the problem at heart is that sctp_auth_key_put() on asoc->asoc_shared_key is called twice when having refcnt 1, once directly in sctp_assoc_update() and yet again from within sctp_auth_asoc_init_active_key() via sctp_assoc_update() on the already kzfree'd memory, which is also consistent with the observation of the poison decrease from 0x6b to 0x6a (note: the overwrite is detected at a later point in time when poison is checked on new allocation). Reference counting of auth keys revisited: Shared keys for AUTH chunks are being stored in endpoints and associations in endpoint_shared_keys list. On endpoint creation, a null key is being added; on association creation, all endpoint shared keys are being cached and thus cloned over to the association. struct sctp_shared_key only holds a pointer to the actual key bytes, that is, struct sctp_auth_bytes which keeps track of users internally through refcounting. Naturally, on assoc or enpoint destruction, sctp_shared_key are being destroyed directly and the reference on sctp_auth_bytes dropped. User space can add keys to either list via setsockopt(2) through struct sctp_authkey and by passing that to sctp_auth_set_key() which replaces or adds a new auth key. There, sctp_auth_create_key() creates a new sctp_auth_bytes with refcount 1 and in case of replacement drops the reference on the old sctp_auth_bytes. A key can be set active from user space through setsockopt() on the id via sctp_auth_set_active_key(), which iterates through either endpoint_shared_keys and in case of an assoc, invokes (one of various places) sctp_auth_asoc_init_active_key(). sctp_auth_asoc_init_active_key() computes the actual secret from local's and peer's random, hmac and shared key parameters and returns a new key directly as sctp_auth_bytes, that is asoc->asoc_shared_key, plus drops the reference if there was a previous one. The secret, which where we eventually double drop the ref comes from sctp_auth_asoc_set_secret() with intitial refcount of 1, which also stays unchanged eventually in sctp_assoc_update(). This key is later being used for crypto layer to set the key for the hash in crypto_hash_setkey() from sctp_auth_calculate_hmac(). To close the loop: asoc->asoc_shared_key is freshly allocated secret material and independant of the sctp_shared_key management keeping track of only shared keys in endpoints and assocs. Hence, also commit 4184b2a79a76 ("net: sctp: fix memory leak in auth key management") is independant of this bug here since it concerns a different layer (though same structures being used eventually). asoc->asoc_shared_key is reference dropped correctly on assoc destruction in sctp_association_free() and when active keys are being replaced in sctp_auth_asoc_init_active_key(), it always has a refcount of 1. Hence, it's freed prematurely in sctp_assoc_update(). Simple fix is to remove that sctp_auth_key_put() from there which fixes these panics. Fixes: 730fc3d05cd4 ("[SCTP]: Implete SCTP-AUTH parameter processing") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-17net: sctp: fix race for one-to-many sockets in sendmsg's auto associateDaniel Borkmann
I.e. one-to-many sockets in SCTP are not required to explicitly call into connect(2) or sctp_connectx(2) prior to data exchange. Instead, they can directly invoke sendmsg(2) and the SCTP stack will automatically trigger connection establishment through 4WHS via sctp_primitive_ASSOCIATE(). However, this in its current implementation is racy: INIT is being sent out immediately (as it cannot be bundled anyway) and the rest of the DATA chunks are queued up for later xmit when connection is established, meaning sendmsg(2) will return successfully. This behaviour can result in an undesired side-effect that the kernel made the application think the data has already been transmitted, although none of it has actually left the machine, worst case even after close(2)'ing the socket. Instead, when the association from client side has been shut down e.g. first gracefully through SCTP_EOF and then close(2), the client could afterwards still receive the server's INIT_ACK due to a connection with higher latency. This INIT_ACK is then considered out of the blue and hence responded with ABORT as there was no alive assoc found anymore. This can be easily reproduced f.e. with sctp_test application from lksctp. One way to fix this race is to wait for the handshake to actually complete. The fix defers waiting after sctp_primitive_ASSOCIATE() and sctp_primitive_SEND() succeeded, so that DATA chunks cooked up from sctp_sendmsg() have already been placed into the output queue through the side-effect interpreter, and therefore can then be bundeled together with COOKIE_ECHO control chunks. strace from example application (shortened): socket(PF_INET, SOCK_SEQPACKET, IPPROTO_SCTP) = 3 sendmsg(3, {msg_name(28)={sa_family=AF_INET, sin_port=htons(8888), sin_addr=inet_addr("192.168.1.115")}, msg_iov(1)=[{"hello", 5}], msg_controllen=0, msg_flags=0}, 0) = 5 sendmsg(3, {msg_name(28)={sa_family=AF_INET, sin_port=htons(8888), sin_addr=inet_addr("192.168.1.115")}, msg_iov(1)=[{"hello", 5}], msg_controllen=0, msg_flags=0}, 0) = 5 sendmsg(3, {msg_name(28)={sa_family=AF_INET, sin_port=htons(8888), sin_addr=inet_addr("192.168.1.115")}, msg_iov(1)=[{"hello", 5}], msg_controllen=0, msg_flags=0}, 0) = 5 sendmsg(3, {msg_name(28)={sa_family=AF_INET, sin_port=htons(8888), sin_addr=inet_addr("192.168.1.115")}, msg_iov(1)=[{"hello", 5}], msg_controllen=0, msg_flags=0}, 0) = 5 sendmsg(3, {msg_name(28)={sa_family=AF_INET, sin_port=htons(8888), sin_addr=inet_addr("192.168.1.115")}, msg_iov(0)=[], msg_controllen=48, {cmsg_len=48, cmsg_level=0x84 /* SOL_??? */, cmsg_type=, ...}, msg_flags=0}, 0) = 0 // graceful shutdown for SOCK_SEQPACKET via SCTP_EOF close(3) = 0 tcpdump before patch (fooling the application): 22:33:36.306142 IP 192.168.1.114.41462 > 192.168.1.115.8888: sctp (1) [INIT] [init tag: 3879023686] [rwnd: 106496] [OS: 10] [MIS: 65535] [init TSN: 3139201684] 22:33:36.316619 IP 192.168.1.115.8888 > 192.168.1.114.41462: sctp (1) [INIT ACK] [init tag: 3345394793] [rwnd: 106496] [OS: 10] [MIS: 10] [init TSN: 3380109591] 22:33:36.317600 IP 192.168.1.114.41462 > 192.168.1.115.8888: sctp (1) [ABORT] tcpdump after patch: 14:28:58.884116 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [INIT] [init tag: 438593213] [rwnd: 106496] [OS: 10] [MIS: 65535] [init TSN: 3092969729] 14:28:58.888414 IP 192.168.1.115.8888 > 192.168.1.114.35846: sctp (1) [INIT ACK] [init tag: 381429855] [rwnd: 106496] [OS: 10] [MIS: 10] [init TSN: 2141904492] 14:28:58.888638 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [COOKIE ECHO] , (2) [DATA] (B)(E) [TSN: 3092969729] [...] 14:28:58.893278 IP 192.168.1.115.8888 > 192.168.1.114.35846: sctp (1) [COOKIE ACK] , (2) [SACK] [cum ack 3092969729] [a_rwnd 106491] [#gap acks 0] [#dup tsns 0] 14:28:58.893591 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [DATA] (B)(E) [TSN: 3092969730] [...] 14:28:59.096963 IP 192.168.1.115.8888 > 192.168.1.114.35846: sctp (1) [SACK] [cum ack 3092969730] [a_rwnd 106496] [#gap acks 0] [#dup tsns 0] 14:28:59.097086 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [DATA] (B)(E) [TSN: 3092969731] [...] , (2) [DATA] (B)(E) [TSN: 3092969732] [...] 14:28:59.103218 IP 192.168.1.115.8888 > 192.168.1.114.35846: sctp (1) [SACK] [cum ack 3092969732] [a_rwnd 106486] [#gap acks 0] [#dup tsns 0] 14:28:59.103330 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [SHUTDOWN] 14:28:59.107793 IP 192.168.1.115.8888 > 192.168.1.114.35846: sctp (1) [SHUTDOWN ACK] 14:28:59.107890 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [SHUTDOWN COMPLETE] Looks like this bug is from the pre-git history museum. ;) Fixes: 08707d5482df ("lksctp-2_5_31-0_5_1.patch") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-10net: introduce helper macro for_each_cmsghdrGu Zheng
Introduce helper macro for_each_cmsghdr as a wrapper of the enumerating cmsghdr from msghdr, just cleanup. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/ethernet/amd/xgbe/xgbe-desc.c drivers/net/ethernet/renesas/sh_eth.c Overlapping changes in both conflict cases. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-09put iov_iter into msghdrAl Viro
Note that the code _using_ ->msg_iter at that point will be very unhappy with anything other than unshifted iovec-backed iov_iter. We still need to convert users to proper primitives. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-12-09net: sctp: use MAX_HEADER for headroom reserve in output pathDaniel Borkmann
To accomodate for enough headroom for tunnels, use MAX_HEADER instead of LL_MAX_HEADER. Robert reported that he has hit after roughly 40hrs of trinity an skb_under_panic() via SCTP output path (see reference). I couldn't reproduce it from here, but not using MAX_HEADER as elsewhere in other protocols might be one possible cause for this. In any case, it looks like accounting on chunks themself seems to look good as the skb already passed the SCTP output path and did not hit any skb_over_panic(). Given tunneling was enabled in his .config, the headroom would have been expanded by MAX_HEADER in this case. Reported-by: Robert Święcki <robert@swiecki.net> Reference: https://lkml.org/lkml/2014/12/1/507 Fixes: 594ccc14dfe4d ("[SCTP] Replace incorrect use of dev_alloc_skb with alloc_skb in sctp_packet_transmit().") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-24switch sctp_user_addto_chunk() and sctp_datamsg_from_user() to passing iov_iterAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-11-24new helper: memcpy_from_msg()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-11-21net: sctp: keep owned chunk in destructor_arg instead of skb->cbDaniel Borkmann
It's just silly to hold the skb destructor argument around inside skb->cb[] as we currently do in SCTP. Nowadays, we're sort of cheating on data accounting in the sense that due to commit 4c3a5bdae293 ("sctp: Don't charge for data in sndbuf again when transmitting packet"), we orphan the skb already in the SCTP output path, i.e. giving back charged data memory, and use a different destructor only to make sure the sk doesn't vanish on skb destruction time. Thus, cb[] is still valid here as we operate within the SCTP layer. (It's generally actually a big candidate for future rework, imho.) However, storing the destructor in the cb[] can easily cause issues should an non sctp_packet_set_owner_w()'ed skb ever escape the SCTP layer, since cb[] may get overwritten by lower layers and thus can corrupt the chunk pointer. There are no such issues at present, but lets keep the chunk in destructor_arg, as this is the actual purpose for it. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/ethernet/chelsio/cxgb4vf/sge.c drivers/net/ethernet/intel/ixgbe/ixgbe_phy.c sge.c was overlapping two changes, one to use the new __dev_alloc_page() in net-next, and one to use s->fl_pg_order in net. ixgbe_phy.c was a set of overlapping whitespace changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-11net: sctp: fix memory leak in auth key managementDaniel Borkmann
A very minimal and simple user space application allocating an SCTP socket, setting SCTP_AUTH_KEY setsockopt(2) on it and then closing the socket again will leak the memory containing the authentication key from user space: unreferenced object 0xffff8800837047c0 (size 16): comm "a.out", pid 2789, jiffies 4296954322 (age 192.258s) hex dump (first 16 bytes): 01 00 00 00 04 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff816d7e8e>] kmemleak_alloc+0x4e/0xb0 [<ffffffff811c88d8>] __kmalloc+0xe8/0x270 [<ffffffffa0870c23>] sctp_auth_create_key+0x23/0x50 [sctp] [<ffffffffa08718b1>] sctp_auth_set_key+0xa1/0x140 [sctp] [<ffffffffa086b383>] sctp_setsockopt+0xd03/0x1180 [sctp] [<ffffffff815bfd94>] sock_common_setsockopt+0x14/0x20 [<ffffffff815beb61>] SyS_setsockopt+0x71/0xd0 [<ffffffff816e58a9>] system_call_fastpath+0x12/0x17 [<ffffffffffffffff>] 0xffffffffffffffff This is bad because of two things, we can bring down a machine from user space when auth_enable=1, but also we would leave security sensitive keying material in memory without clearing it after use. The issue is that sctp_auth_create_key() already sets the refcount to 1, but after allocation sctp_auth_set_key() does an additional refcount on it, and thus leaving it around when we free the socket. Fixes: 65b07e5d0d0 ("[SCTP]: API updates to suport SCTP-AUTH extensions.") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-11net: sctp: fix NULL pointer dereference in af->from_addr_param on malformed ↵Daniel Borkmann
packet An SCTP server doing ASCONF will panic on malformed INIT ping-of-death in the form of: ------------ INIT[PARAM: SET_PRIMARY_IP] ------------> While the INIT chunk parameter verification dissects through many things in order to detect malformed input, it misses to actually check parameters inside of parameters. E.g. RFC5061, section 4.2.4 proposes a 'set primary IP address' parameter in ASCONF, which has as a subparameter an address parameter. So an attacker may send a parameter type other than SCTP_PARAM_IPV4_ADDRESS or SCTP_PARAM_IPV6_ADDRESS, param_type2af() will subsequently return 0 and thus sctp_get_af_specific() returns NULL, too, which we then happily dereference unconditionally through af->from_addr_param(). The trace for the log: BUG: unable to handle kernel NULL pointer dereference at 0000000000000078 IP: [<ffffffffa01e9c62>] sctp_process_init+0x492/0x990 [sctp] PGD 0 Oops: 0000 [#1] SMP [...] Pid: 0, comm: swapper Not tainted 2.6.32-504.el6.x86_64 #1 Bochs Bochs RIP: 0010:[<ffffffffa01e9c62>] [<ffffffffa01e9c62>] sctp_process_init+0x492/0x990 [sctp] [...] Call Trace: <IRQ> [<ffffffffa01f2add>] ? sctp_bind_addr_copy+0x5d/0xe0 [sctp] [<ffffffffa01e1fcb>] sctp_sf_do_5_1B_init+0x21b/0x340 [sctp] [<ffffffffa01e3751>] sctp_do_sm+0x71/0x1210 [sctp] [<ffffffffa01e5c09>] ? sctp_endpoint_lookup_assoc+0xc9/0xf0 [sctp] [<ffffffffa01e61f6>] sctp_endpoint_bh_rcv+0x116/0x230 [sctp] [<ffffffffa01ee986>] sctp_inq_push+0x56/0x80 [sctp] [<ffffffffa01fcc42>] sctp_rcv+0x982/0xa10 [sctp] [<ffffffffa01d5123>] ? ipt_local_in_hook+0x23/0x28 [iptable_filter] [<ffffffff8148bdc9>] ? nf_iterate+0x69/0xb0 [<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0 [<ffffffff8148bf86>] ? nf_hook_slow+0x76/0x120 [<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0 [...] A minimal way to address this is to check for NULL as we do on all other such occasions where we know sctp_get_af_specific() could possibly return with NULL. Fixes: d6de3097592b ("[SCTP]: Add the handling of "Set Primary IP Address" parameter to INIT") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-11net: introduce SO_INCOMING_CPUEric Dumazet
Alternative to RPS/RFS is to use hardware support for multiple queues. Then split a set of million of sockets into worker threads, each one using epoll() to manage events on its own socket pool. Ideally, we want one thread per RX/TX queue/cpu, but we have no way to know after accept() or connect() on which queue/cpu a socket is managed. We normally use one cpu per RX queue (IRQ smp_affinity being properly set), so remembering on socket structure which cpu delivered last packet is enough to solve the problem. After accept(), connect(), or even file descriptor passing around processes, applications can use : int cpu; socklen_t len = sizeof(cpu); getsockopt(fd, SOL_SOCKET, SO_INCOMING_CPU, &cpu, &len); And use this information to put the socket into the right silo for optimal performance, as all networking stack should run on the appropriate cpu, without need to send IPI (RPS/RFS). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05net: Add and use skb_copy_datagram_msg() helper.David S. Miller
This encapsulates all of the skb_copy_datagram_iovec() callers with call argument signature "skb, offset, msghdr->msg_iov, length". When we move to iov_iters in the networking, the iov_iter object will sit in the msghdr. Having a helper like this means there will be less places to touch during that transformation. Based upon descriptions and patch from Al Viro. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-30sctp: replace seq_printf with seq_putsMichele Baldessari
Fixes checkpatch warning: "WARNING: Prefer seq_puts to seq_printf" Signed-off-by: Michele Baldessari <michele@acksyn.org> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-30sctp: add transport state in /proc/net/sctp/remaddrMichele Baldessari
It is often quite helpful to be able to know the state of a transport outside of the application itself (for troubleshooting purposes or for monitoring purposes). Add it under /proc/net/sctp/remaddr. Signed-off-by: Michele Baldessari <michele@acksyn.org> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Include fixes for netrom and dsa (Fabian Frederick and Florian Fainelli) 2) Fix FIXED_PHY support in stmmac, from Giuseppe CAVALLARO. 3) Several SKB use after free fixes (vxlan, openvswitch, vxlan, ip_tunnel, fou), from Li ROngQing. 4) fec driver PTP support fixes from Luwei Zhou and Nimrod Andy. 5) Use after free in virtio_net, from Michael S Tsirkin. 6) Fix flow mask handling for megaflows in openvswitch, from Pravin B Shelar. 7) ISDN gigaset and capi bug fixes from Tilman Schmidt. 8) Fix route leak in ip_send_unicast_reply(), from Vasily Averin. 9) Fix two eBPF JIT bugs on x86, from Alexei Starovoitov. 10) TCP_SKB_CB() reorganization caused a few regressions, fixed by Cong Wang and Eric Dumazet. 11) Don't overwrite end of SKB when parsing malformed sctp ASCONF chunks, from Daniel Borkmann. 12) Don't call sock_kfree_s() with NULL pointers, this function also has the side effect of adjusting the socket memory usage. From Cong Wang. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (90 commits) bna: fix skb->truesize underestimation net: dsa: add includes for ethtool and phy_fixed definitions openvswitch: Set flow-key members. netrom: use linux/uaccess.h dsa: Fix conversion from host device to mii bus tipc: fix bug in bundled buffer reception ipv6: introduce tcp_v6_iif() sfc: add support for skb->xmit_more r8152: return -EBUSY for runtime suspend ipv4: fix a potential use after free in fou.c ipv4: fix a potential use after free in ip_tunnel_core.c hyperv: Add handling of IP header with option field in netvsc_set_hash() openvswitch: Create right mask with disabled megaflows vxlan: fix a free after use openvswitch: fix a use after free ipv4: dst_entry leak in ip_send_unicast_reply() ipv4: clean up cookie_v4_check() ipv4: share tcp_v4_save_options() with cookie_v4_check() ipv4: call __ip_options_echo() in cookie_v4_check() atm: simplify lanai.c by using module_pci_driver ...
2014-10-14net: sctp: fix remote memory pressure from excessive queueingDaniel Borkmann
This scenario is not limited to ASCONF, just taken as one example triggering the issue. When receiving ASCONF probes in the form of ... -------------- INIT[ASCONF; ASCONF_ACK] -------------> <----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------ -------------------- COOKIE-ECHO --------------------> <-------------------- COOKIE-ACK --------------------- ---- ASCONF_a; [ASCONF_b; ...; ASCONF_n;] JUNK ------> [...] ---- ASCONF_m; [ASCONF_o; ...; ASCONF_z;] JUNK ------> ... where ASCONF_a, ASCONF_b, ..., ASCONF_z are good-formed ASCONFs and have increasing serial numbers, we process such ASCONF chunk(s) marked with !end_of_packet and !singleton, since we have not yet reached the SCTP packet end. SCTP does only do verification on a chunk by chunk basis, as an SCTP packet is nothing more than just a container of a stream of chunks which it eats up one by one. We could run into the case that we receive a packet with a malformed tail, above marked as trailing JUNK. All previous chunks are here goodformed, so the stack will eat up all previous chunks up to this point. In case JUNK does not fit into a chunk header and there are no more other chunks in the input queue, or in case JUNK contains a garbage chunk header, but the encoded chunk length would exceed the skb tail, or we came here from an entirely different scenario and the chunk has pdiscard=1 mark (without having had a flush point), it will happen, that we will excessively queue up the association's output queue (a correct final chunk may then turn it into a response flood when flushing the queue ;)): I ran a simple script with incremental ASCONF serial numbers and could see the server side consuming excessive amount of RAM [before/after: up to 2GB and more]. The issue at heart is that the chunk train basically ends with !end_of_packet and !singleton markers and since commit 2e3216cd54b1 ("sctp: Follow security requirement of responding with 1 packet") therefore preventing an output queue flush point in sctp_do_sm() -> sctp_cmd_interpreter() on the input chunk (chunk = event_arg) even though local_cork is set, but its precedence has changed since then. In the normal case, the last chunk with end_of_packet=1 would trigger the queue flush to accommodate possible outgoing bundling. In the input queue, sctp_inq_pop() seems to do the right thing in terms of discarding invalid chunks. So, above JUNK will not enter the state machine and instead be released and exit the sctp_assoc_bh_rcv() chunk processing loop. It's simply the flush point being missing at loop exit. Adding a try-flush approach on the output queue might not work as the underlying infrastructure might be long gone at this point due to the side-effect interpreter run. One possibility, albeit a bit of a kludge, would be to defer invalid chunk freeing into the state machine in order to possibly trigger packet discards and thus indirectly a queue flush on error. It would surely be better to discard chunks as in the current, perhaps better controlled environment, but going back and forth, it's simply architecturally not possible. I tried various trailing JUNK attack cases and it seems to look good now. Joint work with Vlad Yasevich. Fixes: 2e3216cd54b1 ("sctp: Follow security requirement of responding with 1 packet") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-14net: sctp: fix panic on duplicate ASCONF chunksDaniel Borkmann
When receiving a e.g. semi-good formed connection scan in the form of ... -------------- INIT[ASCONF; ASCONF_ACK] -------------> <----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------ -------------------- COOKIE-ECHO --------------------> <-------------------- COOKIE-ACK --------------------- ---------------- ASCONF_a; ASCONF_b -----------------> ... where ASCONF_a equals ASCONF_b chunk (at least both serials need to be equal), we panic an SCTP server! The problem is that good-formed ASCONF chunks that we reply with ASCONF_ACK chunks are cached per serial. Thus, when we receive a same ASCONF chunk twice (e.g. through a lost ASCONF_ACK), we do not need to process them again on the server side (that was the idea, also proposed in the RFC). Instead, we know it was cached and we just resend the cached chunk instead. So far, so good. Where things get nasty is in SCTP's side effect interpreter, that is, sctp_cmd_interpreter(): While incoming ASCONF_a (chunk = event_arg) is being marked !end_of_packet and !singleton, and we have an association context, we do not flush the outqueue the first time after processing the ASCONF_ACK singleton chunk via SCTP_CMD_REPLY. Instead, we keep it queued up, although we set local_cork to 1. Commit 2e3216cd54b1 changed the precedence, so that as long as we get bundled, incoming chunks we try possible bundling on outgoing queue as well. Before this commit, we would just flush the output queue. Now, while ASCONF_a's ASCONF_ACK sits in the corked outq, we continue to process the same ASCONF_b chunk from the packet. As we have cached the previous ASCONF_ACK, we find it, grab it and do another SCTP_CMD_REPLY command on it. So, effectively, we rip the chunk->list pointers and requeue the same ASCONF_ACK chunk another time. Since we process ASCONF_b, it's correctly marked with end_of_packet and we enforce an uncork, and thus flush, thus crashing the kernel. Fix it by testing if the ASCONF_ACK is currently pending and if that is the case, do not requeue it. When flushing the output queue we may relink the chunk for preparing an outgoing packet, but eventually unlink it when it's copied into the skb right before transmission. Joint work with Vlad Yasevich. Fixes: 2e3216cd54b1 ("sctp: Follow security requirement of responding with 1 packet") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-14net: sctp: fix skb_over_panic when receiving malformed ASCONF chunksDaniel Borkmann
Commit 6f4c618ddb0 ("SCTP : Add paramters validity check for ASCONF chunk") added basic verification of ASCONF chunks, however, it is still possible to remotely crash a server by sending a special crafted ASCONF chunk, even up to pre 2.6.12 kernels: skb_over_panic: text:ffffffffa01ea1c3 len:31056 put:30768 head:ffff88011bd81800 data:ffff88011bd81800 tail:0x7950 end:0x440 dev:<NULL> ------------[ cut here ]------------ kernel BUG at net/core/skbuff.c:129! [...] Call Trace: <IRQ> [<ffffffff8144fb1c>] skb_put+0x5c/0x70 [<ffffffffa01ea1c3>] sctp_addto_chunk+0x63/0xd0 [sctp] [<ffffffffa01eadaf>] sctp_process_asconf+0x1af/0x540 [sctp] [<ffffffff8152d025>] ? _read_unlock_bh+0x15/0x20 [<ffffffffa01e0038>] sctp_sf_do_asconf+0x168/0x240 [sctp] [<ffffffffa01e3751>] sctp_do_sm+0x71/0x1210 [sctp] [<ffffffff8147645d>] ? fib_rules_lookup+0xad/0xf0 [<ffffffffa01e6b22>] ? sctp_cmp_addr_exact+0x32/0x40 [sctp] [<ffffffffa01e8393>] sctp_assoc_bh_rcv+0xd3/0x180 [sctp] [<ffffffffa01ee986>] sctp_inq_push+0x56/0x80 [sctp] [<ffffffffa01fcc42>] sctp_rcv+0x982/0xa10 [sctp] [<ffffffffa01d5123>] ? ipt_local_in_hook+0x23/0x28 [iptable_filter] [<ffffffff8148bdc9>] ? nf_iterate+0x69/0xb0 [<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0 [<ffffffff8148bf86>] ? nf_hook_slow+0x76/0x120 [<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0 [<ffffffff81496ded>] ip_local_deliver_finish+0xdd/0x2d0 [<ffffffff81497078>] ip_local_deliver+0x98/0xa0 [<ffffffff8149653d>] ip_rcv_finish+0x12d/0x440 [<ffffffff81496ac5>] ip_rcv+0x275/0x350 [<ffffffff8145c88b>] __netif_receive_skb+0x4ab/0x750 [<ffffffff81460588>] netif_receive_skb+0x58/0x60 This can be triggered e.g., through a simple scripted nmap connection scan injecting the chunk after the handshake, for example, ... -------------- INIT[ASCONF; ASCONF_ACK] -------------> <----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------ -------------------- COOKIE-ECHO --------------------> <-------------------- COOKIE-ACK --------------------- ------------------ ASCONF; UNKNOWN ------------------> ... where ASCONF chunk of length 280 contains 2 parameters ... 1) Add IP address parameter (param length: 16) 2) Add/del IP address parameter (param length: 255) ... followed by an UNKNOWN chunk of e.g. 4 bytes. Here, the Address Parameter in the ASCONF chunk is even missing, too. This is just an example and similarly-crafted ASCONF chunks could be used just as well. The ASCONF chunk passes through sctp_verify_asconf() as all parameters passed sanity checks, and after walking, we ended up successfully at the chunk end boundary, and thus may invoke sctp_process_asconf(). Parameter walking is done with WORD_ROUND() to take padding into account. In sctp_process_asconf()'s TLV processing, we may fail in sctp_process_asconf_param() e.g., due to removal of the IP address that is also the source address of the packet containing the ASCONF chunk, and thus we need to add all TLVs after the failure to our ASCONF response to remote via helper function sctp_add_asconf_response(), which basically invokes a sctp_addto_chunk() adding the error parameters to the given skb. When walking to the next parameter this time, we proceed with ... length = ntohs(asconf_param->param_hdr.length); asconf_param = (void *)asconf_param + length; ... instead of the WORD_ROUND()'ed length, thus resulting here in an off-by-one that leads to reading the follow-up garbage parameter length of 12336, and thus throwing an skb_over_panic for the reply when trying to sctp_addto_chunk() next time, which implicitly calls the skb_put() with that length. Fix it by using sctp_walk_params() [ which is also used in INIT parameter processing ] macro in the verification *and* in ASCONF processing: it will make sure we don't spill over, that we walk parameters WORD_ROUND()'ed. Moreover, we're being more defensive and guard against unknown parameter types and missized addresses. Joint work with Vlad Yasevich. Fixes: b896b82be4ae ("[SCTP] ADDIP: Support for processing incoming ASCONF_ACK chunks.") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-10Merge branch 'for-3.18' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu Pull percpu updates from Tejun Heo: "A lot of activities on percpu front. Notable changes are... - percpu allocator now can take @gfp. If @gfp doesn't contain GFP_KERNEL, it tries to allocate from what's already available to the allocator and a work item tries to keep the reserve around certain level so that these atomic allocations usually succeed. This will replace the ad-hoc percpu memory pool used by blk-throttle and also be used by the planned blkcg support for writeback IOs. Please note that I noticed a bug in how @gfp is interpreted while preparing this pull request and applied the fix 6ae833c7fe0c ("percpu: fix how @gfp is interpreted by the percpu allocator") just now. - percpu_ref now uses longs for percpu and global counters instead of ints. It leads to more sparse packing of the percpu counters on 64bit machines but the overhead should be negligible and this allows using percpu_ref for refcnting pages and in-memory objects directly. - The switching between percpu and single counter modes of a percpu_ref is made independent of putting the base ref and a percpu_ref can now optionally be initialized in single or killed mode. This allows avoiding percpu shutdown latency for cases where the refcounted objects may be synchronously created and destroyed in rapid succession with only a fraction of them reaching fully operational status (SCSI probing does this when combined with blk-mq support). It's also planned to be used to implement forced single mode to detect underflow more timely for debugging. There's a separate branch percpu/for-3.18-consistent-ops which cleans up the duplicate percpu accessors. That branch causes a number of conflicts with s390 and other trees. I'll send a separate pull request w/ resolutions once other branches are merged" * 'for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (33 commits) percpu: fix how @gfp is interpreted by the percpu allocator blk-mq, percpu_ref: start q->mq_usage_counter in atomic mode percpu_ref: make INIT_ATOMIC and switch_to_atomic() sticky percpu_ref: add PERCPU_REF_INIT_* flags percpu_ref: decouple switching to percpu mode and reinit percpu_ref: decouple switching to atomic mode and killing percpu_ref: add PCPU_REF_DEAD percpu_ref: rename things to prepare for decoupling percpu/atomic mode switch percpu_ref: replace pcpu_ prefix with percpu_ percpu_ref: minor code and comment updates percpu_ref: relocate percpu_ref_reinit() Revert "blk-mq, percpu_ref: implement a kludge for SCSI blk-mq stall during probe" Revert "percpu: free percpu allocation info for uniprocessor system" percpu-refcount: make percpu_ref based on longs instead of ints percpu-refcount: improve WARN messages percpu: fix locking regression in the failure path of pcpu_alloc() percpu-refcount: add @gfp to percpu_ref_init() proportions: add @gfp to init functions percpu_counter: add @gfp to percpu_counter_init() percpu_counter: make percpu_counters_lock irq-safe ...
2014-10-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2014-10-06sctp: handle association restarts when the socket is closed.Vlad Yasevich
Currently association restarts do not take into consideration the state of the socket. When a restart happens, the current assocation simply transitions into established state. This creates a condition where a remote system, through a the restart procedure, may create a local association that is no way reachable by user. The conditions to trigger this are as follows: 1) Remote does not acknoledge some data causing data to remain outstanding. 2) Local application calls close() on the socket. Since data is still outstanding, the association is placed in SHUTDOWN_PENDING state. However, the socket is closed. 3) The remote tries to create a new association, triggering a restart on the local system. The association moves from SHUTDOWN_PENDING to ESTABLISHED. At this point, it is no longer reachable by any socket on the local system. This patch addresses the above situation by moving the newly ESTABLISHED association into SHUTDOWN-SENT state and bundling a SHUTDOWN after the COOKIE-ACK chunk. This way, the restarted associate immidiately enters the shutdown procedure and forces the termination of the unreachable association. Reported-by: David Laight <David.Laight@aculab.com> Signed-off-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-24Merge branch 'for-linus' of ↵Tejun Heo
git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block into for-3.18 This is to receive 0a30288da1ae ("blk-mq, percpu_ref: implement a kludge for SCSI blk-mq stall during probe") which implements __percpu_ref_kill_expedited() to work around SCSI blk-mq stall. The commit reverted and patches to implement proper fix will be added. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Kent Overstreet <kmo@daterainc.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Christoph Hellwig <hch@lst.de>
2014-09-09net/ipv4: bind ip_nonlocal_bind to current netnsVincent Bernat
net.ipv4.ip_nonlocal_bind sysctl was global to all network namespaces. This patch allows to set a different value for each network namespace. Signed-off-by: Vincent Bernat <vincent@bernat.im> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2014-09-08percpu_counter: add @gfp to percpu_counter_init()Tejun Heo
Percpu allocator now supports allocation mask. Add @gfp to percpu_counter_init() so that !GFP_KERNEL allocation masks can be used with percpu_counters too. We could have left percpu_counter_init() alone and added percpu_counter_init_gfp(); however, the number of users isn't that high and introducing _gfp variants to all percpu data structures would be quite ugly, so let's just do the conversion. This is the one with the most users. Other percpu data structures are a lot easier to convert. This patch doesn't make any functional difference. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Jan Kara <jack@suse.cz> Acked-by: "David S. Miller" <davem@davemloft.net> Cc: x86@kernel.org Cc: Jens Axboe <axboe@kernel.dk> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org>
2014-08-29sctp: Change sctp to implement csum_levelsTom Herbert
CHECKSUM_UNNECESSARY may be applied to the SCTP CRC so we need to appropriate account for this by decrementing csum_level. This is done by calling __skb_dec_checksum_unnecessary. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-29net: sctp: fix ABI mismatch through sctp_assoc_to_state helperDaniel Borkmann
Since SCTP day 1, that is, 19b55a2af145 ("Initial commit") from lksctp tree, the official <netinet/sctp.h> header carries a copy of enum sctp_sstat_state that looks like (compared to the current in-kernel enumeration): User definition: Kernel definition: enum sctp_sstat_state { typedef enum { SCTP_EMPTY = 0, <removed> SCTP_CLOSED = 1, SCTP_STATE_CLOSED = 0, SCTP_COOKIE_WAIT = 2, SCTP_STATE_COOKIE_WAIT = 1, SCTP_COOKIE_ECHOED = 3, SCTP_STATE_COOKIE_ECHOED = 2, SCTP_ESTABLISHED = 4, SCTP_STATE_ESTABLISHED = 3, SCTP_SHUTDOWN_PENDING = 5, SCTP_STATE_SHUTDOWN_PENDING = 4, SCTP_SHUTDOWN_SENT = 6, SCTP_STATE_SHUTDOWN_SENT = 5, SCTP_SHUTDOWN_RECEIVED = 7, SCTP_STATE_SHUTDOWN_RECEIVED = 6, SCTP_SHUTDOWN_ACK_SENT = 8, SCTP_STATE_SHUTDOWN_ACK_SENT = 7, }; } sctp_state_t; This header was later on also placed into the uapi, so that user space programs can compile without having <netinet/sctp.h>, but the shipped with <linux/sctp.h> instead. While RFC6458 under 8.2.1.Association Status (SCTP_STATUS) says that sstat_state can range from SCTP_CLOSED to SCTP_SHUTDOWN_ACK_SENT, we nevertheless have a what it appears to be dummy SCTP_EMPTY state from the very early days. While it seems to do just nothing, commit 0b8f9e25b0aa ("sctp: remove completely unsed EMPTY state") did the right thing and removed this dead code. That however, causes an off-by-one when the user asks the SCTP stack via SCTP_STATUS API and checks for the current socket state thus yielding possibly undefined behaviour in applications as they expect the kernel to tell the right thing. The enumeration had to be changed however as based on the current socket state, we access a function pointer lookup-table through this. Therefore, I think the best way to deal with this is just to add a helper function sctp_assoc_to_state() to encapsulate the off-by-one quirk. Reported-by: Tristan Su <sooqing@gmail.com> Fixes: 0b8f9e25b0aa ("sctp: remove completely unsed EMPTY state") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-22net: sctp: fix suboptimal edge-case on non-active active/retrans path selectionDaniel Borkmann
In SCTP, selection of active (T.ACT) and retransmission (T.RET) transports is being done whenever transport control operations (UP, DOWN, PF, ...) are engaged through sctp_assoc_control_transport(). Commits 4c47af4d5eb2 ("net: sctp: rework multihoming retransmission path selection to rfc4960") and a7288c4dd509 ("net: sctp: improve sctp_select_active_and_retran_path selection") have both improved it towards a more fine-grained and optimal path selection. Currently, the selection algorithm for T.ACT and T.RET is as follows: 1) Elect the two most recently used ACTIVE transports T1, T2 for T.ACT, T.RET, where T.ACT<-T1 and T1 is most recently used 2) In case primary path T.PRI not in {T1, T2} but ACTIVE, set T.ACT<-T.PRI and T.RET<-T1 3) If only T1 is ACTIVE from the set, set T.ACT<-T1 and T.RET<-T1 4) If none is ACTIVE, set T.ACT<-best(T.PRI, T.RET, T3) where T3 is the most recently used (if avail) in PF, set T.RET<-T.PRI Prior to above commits, 4) was simply a camp on T.ACT<-T.PRI and T.RET<-T.PRI, ignoring possible paths in PF. Camping on T.PRI is still slightly suboptimal as it can lead to the following scenario: Setup: <A> <B> T1: p1p1 (10.0.10.10) <==> .'`) <==> p1p1 (10.0.10.12) <= T.PRI T2: p1p2 (10.0.10.20) <==> (_ . ) <==> p1p2 (10.0.10.22) net.sctp.rto_min = 1000 net.sctp.path_max_retrans = 2 net.sctp.pf_retrans = 0 net.sctp.hb_interval = 1000 T.PRI is permanently down, T2 is put briefly into PF state (e.g. due to link flapping). Here, the first time transmission is sent over PF path T2 as it's the only non-INACTIVE path, but the retransmitted data-chunks are sent over the INACTIVE path T1 (T.PRI), which is not good. After the patch, it's choosing better transports in both cases by modifying step 4): 4) If none is ACTIVE, set T.ACT_new<-best(T.ACT_old, T3) where T3 is the most recently used (if avail) in PF, set T.RET<-T.ACT_new This will still select a best possible path in PF if available (which can also include T.PRI/T.RET), and set both T.ACT/T.RET to it. In case sctp_assoc_control_transport() *just* put T.ACT_old into INACTIVE as it transitioned from ACTIVE->PF->INACTIVE and stays in INACTIVE just for a very short while before going back ACTIVE, it will guarantee that this path will be reselected for T.ACT/T.RET since T3 (PF) is not available. Previously, this was not possible, as we would only select between T.PRI and T.RET, and a possible T3 would be NULL due to the fact that we have just transitioned T3 in sctp_assoc_control_transport() from PF->INACTIVE and would select a suboptimal path when T.PRI/T.RET have worse properties. In the case that T.ACT_old permanently went to INACTIVE during this transition and there's no PF path available, plus T.PRI and T.RET are INACTIVE as well, we would now camp on T.ACT_old, but if everything is being INACTIVE there's really not much we can do except hoping for a successful HB to bring one of the transports back up again and, thus cause a new selection through sctp_assoc_control_transport(). Now both tests work fine: Case 1: 1. T1 S(ACTIVE) T.ACT T2 S(ACTIVE) T.RET 2. T1 S(ACTIVE) T.ACT, T.RET T2 S(PF) 3. T1 S(ACTIVE) T.ACT, T.RET T2 S(INACTIVE) 5. T1 S(PF) T.ACT, T.RET T2 S(INACTIVE) [ 5.1 T1 S(INACTIVE) T.ACT, T.RET T2 S(INACTIVE) ] 6. T1 S(ACTIVE) T.ACT, T.RET T2 S(INACTIVE) 7. T1 S(ACTIVE) T.ACT T2 S(ACTIVE) T.RET Case 2: 1. T1 S(ACTIVE) T.ACT T2 S(ACTIVE) T.RET 2. T1 S(PF) T2 S(ACTIVE) T.ACT, T.RET 3. T1 S(INACTIVE) T2 S(ACTIVE) T.ACT, T.RET 5. T1 S(INACTIVE) T2 S(PF) T.ACT, T.RET [ 5.1 T1 S(INACTIVE) T2 S(INACTIVE) T.ACT, T.RET ] 6. T1 S(INACTIVE) T2 S(ACTIVE) T.ACT, T.RET 7. T1 S(ACTIVE) T.ACT T2 S(ACTIVE) T.RET Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-22net: sctp: spare unnecessary comparison in sctp_trans_elect_bestDaniel Borkmann
When both transports are the same, we don't have to go down that road only to realize that we will return the very same transport. We are guaranteed that curr is always non-NULL. Therefore, just short-circuit this special case. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-21sctp: not send SCTP_PEER_ADDR_CHANGE notifications with failed probezhuyj
Since the transport has always been in state SCTP_UNCONFIRMED, it therefore wasn't active before and hasn't been used before, and it always has been, so it is unnecessary to bug the user with a notification. Reported-by: Deepak Khandelwal <khandelwal.deepak.1987@gmail.com> Suggested-by: Vlad Yasevich <vyasevich@gmail.com> Suggested-by: Michael Tuexen <tuexen@fh-muenster.de> Suggested-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Zhu Yanjun <Yanjun.Zhu@windriver.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/Makefile net/ipv6/sysctl_net_ipv6.c Two ipv6_table_template[] additions overlap, so the index of the ipv6_table[x] assignments needed to be adjusted. In the drivers/net/Makefile case, we've gotten rid of the garbage whereby we had to list every single USB networking driver in the top-level Makefile, there is just one "USB_NETWORKING" that guards everything. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-05sctp: fix possible seqlock seadlock in sctp_packet_transmit()Eric Dumazet
Dave reported following splat, caused by improper use of IP_INC_STATS_BH() in process context. BUG: using __this_cpu_add() in preemptible [00000000] code: trinity-c117/14551 caller is __this_cpu_preempt_check+0x13/0x20 CPU: 3 PID: 14551 Comm: trinity-c117 Not tainted 3.16.0+ #33 ffffffff9ec898f0 0000000047ea7e23 ffff88022d32f7f0 ffffffff9e7ee207 0000000000000003 ffff88022d32f818 ffffffff9e397eaa ffff88023ee70b40 ffff88022d32f970 ffff8801c026d580 ffff88022d32f828 ffffffff9e397ee3 Call Trace: [<ffffffff9e7ee207>] dump_stack+0x4e/0x7a [<ffffffff9e397eaa>] check_preemption_disabled+0xfa/0x100 [<ffffffff9e397ee3>] __this_cpu_preempt_check+0x13/0x20 [<ffffffffc0839872>] sctp_packet_transmit+0x692/0x710 [sctp] [<ffffffffc082a7f2>] sctp_outq_flush+0x2a2/0xc30 [sctp] [<ffffffff9e0d985c>] ? mark_held_locks+0x7c/0xb0 [<ffffffff9e7f8c6d>] ? _raw_spin_unlock_irqrestore+0x5d/0x80 [<ffffffffc082b99a>] sctp_outq_uncork+0x1a/0x20 [sctp] [<ffffffffc081e112>] sctp_cmd_interpreter.isra.23+0x1142/0x13f0 [sctp] [<ffffffffc081c86b>] sctp_do_sm+0xdb/0x330 [sctp] [<ffffffff9e0b8f1b>] ? preempt_count_sub+0xab/0x100 [<ffffffffc083b350>] ? sctp_cname+0x70/0x70 [sctp] [<ffffffffc08389ca>] sctp_primitive_ASSOCIATE+0x3a/0x50 [sctp] [<ffffffffc083358f>] sctp_sendmsg+0x88f/0xe30 [sctp] [<ffffffff9e0d673a>] ? lock_release_holdtime.part.28+0x9a/0x160 [<ffffffff9e0d62ce>] ? put_lock_stats.isra.27+0xe/0x30 [<ffffffff9e73b624>] inet_sendmsg+0x104/0x220 [<ffffffff9e73b525>] ? inet_sendmsg+0x5/0x220 [<ffffffff9e68ac4e>] sock_sendmsg+0x9e/0xe0 [<ffffffff9e1c0c09>] ? might_fault+0xb9/0xc0 [<ffffffff9e1c0bae>] ? might_fault+0x5e/0xc0 [<ffffffff9e68b234>] SYSC_sendto+0x124/0x1c0 [<ffffffff9e0136b0>] ? syscall_trace_enter+0x250/0x330 [<ffffffff9e68c3ce>] SyS_sendto+0xe/0x10 [<ffffffff9e7f9be4>] tracesys+0xdd/0xe2 This is a followup of commits f1d8cba61c3c4b ("inet: fix possible seqlock deadlocks") and 7f88c6b23afbd315 ("ipv6: fix possible seqlock deadlock in ip6_finish_output2") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Reported-by: Dave Jones <davej@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-31net: fix the counter ICMP_MIB_INERRORS/ICMP6_MIB_INERRORSDuan Jiong
When dealing with ICMPv[46] Error Message, function icmp_socket_deliver() and icmpv6_notify() do some valid checks on packet's length, but then some protocols check packet's length redaudantly. So remove those duplicated statements, and increase counter ICMP_MIB_INERRORS/ICMP6_MIB_INERRORS in function icmp_socket_deliver() and icmpv6_notify() respectively. In addition, add missed counter in udp6/udplite6 when socket is NULL. Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>