summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2017-02-22net/dccp: fix use after free in tw_timer_handler()Andrey Ryabinin
DCCP doesn't purge timewait sockets on network namespace shutdown. So, after net namespace destroyed we could still have an active timer which will trigger use after free in tw_timer_handler(): BUG: KASAN: use-after-free in tw_timer_handler+0x4a/0xa0 at addr ffff88010e0d1e10 Read of size 8 by task swapper/1/0 Call Trace: __asan_load8+0x54/0x90 tw_timer_handler+0x4a/0xa0 call_timer_fn+0x127/0x480 expire_timers+0x1db/0x2e0 run_timer_softirq+0x12f/0x2a0 __do_softirq+0x105/0x5b4 irq_exit+0xdd/0xf0 smp_apic_timer_interrupt+0x57/0x70 apic_timer_interrupt+0x90/0xa0 Object at ffff88010e0d1bc0, in cache net_namespace size: 6848 Allocated: save_stack_trace+0x1b/0x20 kasan_kmalloc+0xee/0x180 kasan_slab_alloc+0x12/0x20 kmem_cache_alloc+0x134/0x310 copy_net_ns+0x8d/0x280 create_new_namespaces+0x23f/0x340 unshare_nsproxy_namespaces+0x75/0xf0 SyS_unshare+0x299/0x4f0 entry_SYSCALL_64_fastpath+0x18/0xad Freed: save_stack_trace+0x1b/0x20 kasan_slab_free+0xae/0x180 kmem_cache_free+0xb4/0x350 net_drop_ns+0x3f/0x50 cleanup_net+0x3df/0x450 process_one_work+0x419/0xbb0 worker_thread+0x92/0x850 kthread+0x192/0x1e0 ret_from_fork+0x2e/0x40 Add .exit_batch hook to dccp_v4_ops()/dccp_v6_ops() which will purge timewait sockets on net namespace destruction and prevent above issue. Fixes: f2bf415cfed7 ("mib: add net to NET_ADD_STATS_BH") Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-22l2tp: Avoid schedule while atomic in exit_netRidge Kennedy
While destroying a network namespace that contains a L2TP tunnel a "BUG: scheduling while atomic" can be observed. Enabling lockdep shows that this is happening because l2tp_exit_net() is calling l2tp_tunnel_closeall() (via l2tp_tunnel_delete()) from within an RCU critical section. l2tp_exit_net() takes rcu_read_lock_bh() << list_for_each_entry_rcu() >> l2tp_tunnel_delete() l2tp_tunnel_closeall() __l2tp_session_unhash() synchronize_rcu() << Illegal inside RCU critical section >> BUG: sleeping function called from invalid context in_atomic(): 1, irqs_disabled(): 0, pid: 86, name: kworker/u16:2 INFO: lockdep is turned off. CPU: 2 PID: 86 Comm: kworker/u16:2 Tainted: G W O 4.4.6-at1 #2 Hardware name: Xen HVM domU, BIOS 4.6.1-xs125300 05/09/2016 Workqueue: netns cleanup_net 0000000000000000 ffff880202417b90 ffffffff812b0013 ffff880202410ac0 ffffffff81870de8 ffff880202417bb8 ffffffff8107aee8 ffffffff81870de8 0000000000000c51 0000000000000000 ffff880202417be0 ffffffff8107b024 Call Trace: [<ffffffff812b0013>] dump_stack+0x85/0xc2 [<ffffffff8107aee8>] ___might_sleep+0x148/0x240 [<ffffffff8107b024>] __might_sleep+0x44/0x80 [<ffffffff810b21bd>] synchronize_sched+0x2d/0xe0 [<ffffffff8109be6d>] ? trace_hardirqs_on+0xd/0x10 [<ffffffff8105c7bb>] ? __local_bh_enable_ip+0x6b/0xc0 [<ffffffff816a1b00>] ? _raw_spin_unlock_bh+0x30/0x40 [<ffffffff81667482>] __l2tp_session_unhash+0x172/0x220 [<ffffffff81667397>] ? __l2tp_session_unhash+0x87/0x220 [<ffffffff8166888b>] l2tp_tunnel_closeall+0x9b/0x140 [<ffffffff81668c74>] l2tp_tunnel_delete+0x14/0x60 [<ffffffff81668dd0>] l2tp_exit_net+0x110/0x270 [<ffffffff81668d5c>] ? l2tp_exit_net+0x9c/0x270 [<ffffffff815001c3>] ops_exit_list.isra.6+0x33/0x60 [<ffffffff81501166>] cleanup_net+0x1b6/0x280 ... This bug can easily be reproduced with a few steps: $ sudo unshare -n bash # Create a shell in a new namespace # ip link set lo up # ip addr add 127.0.0.1 dev lo # ip l2tp add tunnel remote 127.0.0.1 local 127.0.0.1 tunnel_id 1 \ peer_tunnel_id 1 udp_sport 50000 udp_dport 50000 # ip l2tp add session name foo tunnel_id 1 session_id 1 \ peer_session_id 1 # ip link set foo up # exit # Exit the shell, in turn exiting the namespace $ dmesg ... [942121.089216] BUG: scheduling while atomic: kworker/u16:3/13872/0x00000200 ... To fix this, move the call to l2tp_tunnel_closeall() out of the RCU critical section, and instead call it from l2tp_tunnel_del_work(), which is running from the l2tp_wq workqueue. Fixes: 2b551c6e7d5b ("l2tp: close sessions before initiating tunnel delete") Signed-off-by: Ridge Kennedy <ridge.kennedy@alliedtelesis.co.nz> Acked-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Miller: "Highlights: 1) Support TX_RING in AF_PACKET TPACKET_V3 mode, from Sowmini Varadhan. 2) Simplify classifier state on sk_buff in order to shrink it a bit. From Willem de Bruijn. 3) Introduce SIPHASH and it's usage for secure sequence numbers and syncookies. From Jason A. Donenfeld. 4) Reduce CPU usage for ICMP replies we are going to limit or suppress, from Jesper Dangaard Brouer. 5) Introduce Shared Memory Communications socket layer, from Ursula Braun. 6) Add RACK loss detection and allow it to actually trigger fast recovery instead of just assisting after other algorithms have triggered it. From Yuchung Cheng. 7) Add xmit_more and BQL support to mvneta driver, from Simon Guinot. 8) skb_cow_data avoidance in esp4 and esp6, from Steffen Klassert. 9) Export MPLS packet stats via netlink, from Robert Shearman. 10) Significantly improve inet port bind conflict handling, especially when an application is restarted and changes it's setting of reuseport. From Josef Bacik. 11) Implement TX batching in vhost_net, from Jason Wang. 12) Extend the dummy device so that VF (virtual function) features, such as configuration, can be more easily tested. From Phil Sutter. 13) Avoid two atomic ops per page on x86 in bnx2x driver, from Eric Dumazet. 14) Add new bpf MAP, implementing a longest prefix match trie. From Daniel Mack. 15) Packet sample offloading support in mlxsw driver, from Yotam Gigi. 16) Add new aquantia driver, from David VomLehn. 17) Add bpf tracepoints, from Daniel Borkmann. 18) Add support for port mirroring to b53 and bcm_sf2 drivers, from Florian Fainelli. 19) Remove custom busy polling in many drivers, it is done in the core networking since 4.5 times. From Eric Dumazet. 20) Support XDP adjust_head in virtio_net, from John Fastabend. 21) Fix several major holes in neighbour entry confirmation, from Julian Anastasov. 22) Add XDP support to bnxt_en driver, from Michael Chan. 23) VXLAN offloads for enic driver, from Govindarajulu Varadarajan. 24) Add IPVTAP driver (IP-VLAN based tap driver) from Sainath Grandhi. 25) Support GRO in IPSEC protocols, from Steffen Klassert" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1764 commits) Revert "ath10k: Search SMBIOS for OEM board file extension" net: socket: fix recvmmsg not returning error from sock_error bnxt_en: use eth_hw_addr_random() bpf: fix unlocking of jited image when module ronx not set arch: add ARCH_HAS_SET_MEMORY config net: napi_watchdog() can use napi_schedule_irqoff() tcp: Revert "tcp: tcp_probe: use spin_lock_bh()" net/hsr: use eth_hw_addr_random() net: mvpp2: enable building on 64-bit platforms net: mvpp2: switch to build_skb() in the RX path net: mvpp2: simplify MVPP2_PRS_RI_* definitions net: mvpp2: fix indentation of MVPP2_EXT_GLOBAL_CTRL_DEFAULT net: mvpp2: remove unused register definitions net: mvpp2: simplify mvpp2_bm_bufs_add() net: mvpp2: drop useless fields in mvpp2_bm_pool and related code net: mvpp2: remove unused 'tx_skb' field of 'struct mvpp2_tx_queue' net: mvpp2: release reference to txq_cpu[] entry after unmapping net: mvpp2: handle too large value in mvpp2_rx_time_coal_set() net: mvpp2: handle too large value handling in mvpp2_rx_pkts_coal_set() net: mvpp2: remove useless arguments in mvpp2_rx_{pkts, time}_coal_set ...
2017-02-21SUNRPC: Add a helper function xdr_stream_decode_string_dup()Trond Myklebust
Create a helper function that decodes a xdr string object, allocates a memory buffer and then store it as a NUL terminated string. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-02-21Merge branch 'stable-4.11' of git://git.infradead.org/users/pcmoore/auditLinus Torvalds
Pull audit updates from Paul Moore: "The audit changes for v4.11 are relatively small compared to what we did for v4.10, both in terms of size and impact. - two patches from Steve tweak the formatting for some of the audit records to make them more consistent with other audit records. - three patches from Richard record the name of a module on module load, fix the logging of sockaddr information when using socketcall() on 32-bit systems, and add the ability to reset audit's lost record counter. - my lone patch just fixes an annoying style nit that I was reminded about by one of Richard's patches. All these patches pass our test suite" * 'stable-4.11' of git://git.infradead.org/users/pcmoore/audit: audit: remove unnecessary curly braces from switch/case statements audit: log module name on init_module audit: log 32-bit socketcalls audit: add feature audit_lost reset audit: Make AUDIT_ANOM_ABEND event normalized audit: Make AUDIT_KERNEL event conform to the specification
2017-02-21net: socket: fix recvmmsg not returning error from sock_errorMaxime Jayat
Commit 34b88a68f26a ("net: Fix use after free in the recvmmsg exit path"), changed the exit path of recvmmsg to always return the datagrams variable and modified the error paths to set the variable to the error code returned by recvmsg if necessary. However in the case sock_error returned an error, the error code was then ignored, and recvmmsg returned 0. Change the error path of recvmmsg to correctly return the error code of sock_error. The bug was triggered by using recvmmsg on a CAN interface which was not up. Linux 4.6 and later return 0 in this case while earlier releases returned -ENETDOWN. Fixes: 34b88a68f26a ("net: Fix use after free in the recvmmsg exit path") Signed-off-by: Maxime Jayat <maxime.jayat@mobile-devices.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-21net: napi_watchdog() can use napi_schedule_irqoff()Eric Dumazet
hrtimer handlers run with masked hard IRQ, we can therefore use napi_schedule_irqoff() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-21tcp: Revert "tcp: tcp_probe: use spin_lock_bh()"Eric Dumazet
This reverts commit e70ac171658679ecf6bea4bbd9e9325cd6079d2b. jtcp_rcv_established() is in fact called with hard irq being disabled. Initial bug report from Ricardo Nabinger Sanchez [1] still needs to be investigated, but does not look like a TCP bug. [1] https://www.spinics.net/lists/netdev/msg420960.html Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: kernel test robot <xiaolong.ye@intel.com> Cc: Ricardo Nabinger Sanchez <rnsanchez@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-21net/hsr: use eth_hw_addr_random()Tobias Klauser
Use eth_hw_addr_random() to set a random MAC address in order to make sure dev->addr_assign_type will be properly set to NET_ADDR_RANDOM. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-21net: sock: Use USEC_PER_SEC macro instead of literal 1000000Gao Feng
The USEC_PER_SEC is used once in sock_set_timeout as the max value of tv_usec. But there are other similar codes which use the literal 1000000 in this file. It is minor cleanup to keep consitent. Signed-off-by: Gao Feng <fgao@ikuai8.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-21ip: fix IP_CHECKSUM handlingPaolo Abeni
The skbs processed by ip_cmsg_recv() are not guaranteed to be linear e.g. when sending UDP packets over loopback with MSGMORE. Using csum_partial() on [potentially] the whole skb len is dangerous; instead be on the safe side and use skb_checksum(). Thanks to syzkaller team to detect the issue and provide the reproducer. v1 -> v2: - move the variable declaration in a tighter scope Fixes: ad6f939ab193 ("ip: Add offset parameter to ip_cmsg_recv") Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-21rtnl: simplify error return path in rtnl_create_link()Tobias Klauser
There is only one possible error path which reaches the err label, so return ERR_PTR(-ENOMEM) directly if alloc_netdev_mqs() fails. This also allows to omit the err variable. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-21sunrpc: silence uninitialized variable warningDan Carpenter
kstrtouint() can return a couple different error codes so the check for "ret == -EINVAL" is wrong and static analysis tools correctly complain that we can use "num" without initializing it. It's not super harmful because we check the bounds. But it's also easy enough to fix. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-02-21Merge branch 'master' of git://blackhole.kfki.hu/nfPablo Neira Ayuso
Jozsef Kadlecsik says: ==================== ipset patches for nf Please apply the next patches for ipset in your nf branch. Both patches should go into the stable kernel branches as well, because these are important bugfixes: * Sometimes valid entries in hash:* types of sets were evicted due to a typo in an index. The wrong evictions happen when entries are deleted from the set and the bucket is shrinked. Bug was reported by Eric Ewanco and the patch fixes netfilter bugzilla id #1119. * Fixing of a null pointer exception when someone wants to add an entry to an empty list type of set and specifies an add before/after option. The fix is from Vishwanath Pai. ==================== Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-02-21netfilter: nfnetlink: remove static declaration from err_listLiping Zhang
Otherwise, different subsys will race to access the err_list, with holding the different nfnl_lock(subsys_id). But this will not happen now, since ->call_batch is only implemented by nftables, so the err_list is protected by nfnl_lock(NFNL_SUBSYS_NFTABLES). Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-02-20nfsd: merge stable fix into main nfsd branchJ. Bruce Fields
2017-02-20Merge branch 'locking-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: "The main changes in this cycle were: - Implement wraparound-safe refcount_t and kref_t types based on generic atomic primitives (Peter Zijlstra) - Improve and fix the ww_mutex code (Nicolai Hähnle) - Add self-tests to the ww_mutex code (Chris Wilson) - Optimize percpu-rwsems with the 'rcuwait' mechanism (Davidlohr Bueso) - Micro-optimize the current-task logic all around the core kernel (Davidlohr Bueso) - Tidy up after recent optimizations: remove stale code and APIs, clean up the code (Waiman Long) - ... plus misc fixes, updates and cleanups" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits) fork: Fix task_struct alignment locking/spinlock/debug: Remove spinlock lockup detection code lockdep: Fix incorrect condition to print bug msgs for MAX_LOCKDEP_CHAIN_HLOCKS lkdtm: Convert to refcount_t testing kref: Implement 'struct kref' using refcount_t refcount_t: Introduce a special purpose refcount type sched/wake_q: Clarify queue reinit comment sched/wait, rcuwait: Fix typo in comment locking/mutex: Fix lockdep_assert_held() fail locking/rtmutex: Flip unlikely() branch to likely() in __rt_mutex_slowlock() locking/rwsem: Reinit wake_q after use locking/rwsem: Remove unnecessary atomic_long_t casts jump_labels: Move header guard #endif down where it belongs locking/atomic, kref: Implement kref_put_lock() locking/ww_mutex: Turn off __must_check for now locking/atomic, kref: Avoid more abuse locking/atomic, kref: Use kref_get_unless_zero() more locking/atomic, kref: Kill kref_sub() locking/atomic, kref: Add kref_read() locking/atomic, kref: Add KREF_INIT() ...
2017-02-20Merge branch 'for-upstream' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2017-02-19 Here's a set of Bluetooth patches for the 4.11 kernel: - New USB IDs to the btusb driver - Race fix in btmrvl driver - Added out-of-band wakeup support to the btusb driver - NULL dereference fix to bt_sock_recvmsg Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-20net: mpls: Add support for netconfDavid Ahern
Add netconf support to MPLS. Allows userpsace to learn and be notified of changes to 'input' enable setting per interface. Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Robert Shearman <rshearma@brocade.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-20sctp: add support for MSG_MOREXin Long
This patch is to add support for MSG_MORE on sctp. It adds force_delay in sctp_datamsg to save MSG_MORE, and sets it after creating datamsg according to the send flag. sctp_packet_can_append_data then uses it to decide if the chunks of this msg will be sent at once or delay it. Note that unlike [1], this patch saves MSG_MORE in datamsg, instead of in assoc. As sctp enqueues the chunks first, then dequeue them one by one. If it's saved in assoc,the current msg's send flag (MSG_MORE) may affect other chunks' bundling. Since last patch, sctp flush out queue once assoc state falls into SHUTDOWN_PENDING, the close block problem mentioned in [1] has been solved as well. [1] https://patchwork.ozlabs.org/patch/372404/ Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-20sctp: flush out queue once assoc state falls into SHUTDOWN_PENDINGXin Long
This patch is to flush out queue when assoc state falls into SHUTDOWN_PENDING if there are still chunks in it, so that the data can be sent out as soon as possible before sending SHUTDOWN chunk. When sctp supports MSG_MORE flag in next patch, this improvement can also solve the problem that the chunks with MSG_MORE flag may be stuck in queue when closing an assoc. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-20openvswitch: Set event bit after initializing labels.Jarno Rajahalme
Connlabels are included in conntrack netlink event messages only if the IPCT_LABEL bit is set in the event cache (see ctnetlink_conntrack_event()). Set it after initializing labels for a new connection. Found upon further system testing, where it was noticed that labels were missing from the conntrack events. Fixes: 193e30967897 ("openvswitch: Do not trigger events for unconfirmed connections.") Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-20libceph: pass reply buffer length through ceph_osdc_call()Ilya Dryomov
To spare checking for "this reply fits into a page, but does it fit into my buffer?" in some callers, osd_req_op_cls_response_data_pages() needs to know how big it is. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2017-02-20libceph: don't go through with the mapping if the PG is too wideIlya Dryomov
With EC overwrites maturing, the kernel client will be getting exposed to potentially very wide EC pools. While "min(pi->size, X)" works fine when the cluster is stable and happy, truncating OSD sets interferes with resend logic (ceph_is_new_interval(), etc). Abort the mapping if the pool is too wide, assigning the request to the homeless session. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com>
2017-02-20crush: merge working data and scratchIlya Dryomov
Much like Arlo Guthrie, I decided that one big pile is better than two little piles. Reflects ceph.git commit 95c2df6c7e0b22d2ea9d91db500cf8b9441c73ba. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-02-20crush: remove mutable part of CRUSH mapIlya Dryomov
Then add it to the working state. It would be very nice if we didn't have to take a lock to calculate a crush placement. By moving the permutation array into the working data, we can treat the CRUSH map as immutable. Reflects ceph.git commit cbcd039651c0569551cb90d26ce27e1432671f2a. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-02-20libceph: add osdmap_set_crush() helperIlya Dryomov
Simplify osdmap_decode() and osdmap_apply_incremental() a bit. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-02-20libceph: remove unneeded stddef.h includeStafford Horne
This was causing a build failure for openrisc when using musl and gcc 5.4.0 since the file is not available in the toolchain. It doesnt seem this is needed and removing it does not cause any build warnings for me. Signed-off-by: Stafford Horne <shorne@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-02-20ceph: update readpages osd request according to size of pagesYan, Zheng
add_to_page_cache_lru() can fails, so the actual pages to read can be smaller than the initial size of osd request. We need to update osd request size in that case. Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com>
2017-02-20libceph: include linux/sched.h into crypto.c directlyIlya Dryomov
Currently crypto.c gets linux/sched.h indirectly through linux/slab.h from linux/kasan.h. Include it directly for memalloc_noio_*() inlines. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-02-19sctp: check duplicate node before inserting a new transportXin Long
sctp has changed to use rhlist for transport rhashtable since commit 7fda702f9315 ("sctp: use new rhlist interface on sctp transport rhashtable"). But rhltable_insert_key doesn't check the duplicate node when inserting a node, unlike rhashtable_lookup_insert_key. It may cause duplicate assoc/transport in rhashtable. like: client (addr A, B) server (addr X, Y) connect to X INIT (1) ------------> connect to Y INIT (2) ------------> INIT_ACK (1) <------------ INIT_ACK (2) <------------ After sending INIT (2), one transport will be created and hashed into rhashtable. But when receiving INIT_ACK (1) and processing the address params, another transport will be created and hashed into rhashtable with the same addr Y and EP as the last transport. This will confuse the assoc/transport's lookup. This patch is to fix it by returning err if any duplicate node exists before inserting it. Fixes: 7fda702f9315 ("sctp: use new rhlist interface on sctp transport rhashtable") Reported-by: Fabio M. Di Nitto <fdinitto@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-19sctp: add reconf chunk eventXin Long
This patch is to add reconf chunk event based on the sctp event frame in rx path, it will call sctp_sf_do_reconf to process the reconf chunk. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-19sctp: add reconf chunk processXin Long
This patch is to add a function to process the incoming reconf chunk, in which it verifies the chunk, and traverses the param and process it with the right function one by one. sctp_sf_do_reconf would be the process function of reconf chunk event. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-19sctp: add a function to verify the sctp reconf chunkXin Long
This patch is to add a function sctp_verify_reconf to do some length check and multi-params check for sctp stream reconf according to rfc6525 section 3.1. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-19sctp: implement receiver-side procedures for the Incoming SSN Reset Request ↵Xin Long
Parameter This patch is to implement Receiver-Side Procedures for the Incoming SSN Reset Request Parameter described in rfc6525 section 5.2.3. It's also to move str_list endian conversion out of sctp_make_strreset_req, so that sctp_make_strreset_req can be used more conveniently to process inreq. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-19sctp: implement receiver-side procedures for the Outgoing SSN Reset Request ↵Xin Long
Parameter This patch is to implement Receiver-Side Procedures for the Outgoing SSN Reset Request Parameter described in rfc6525 section 5.2.2. Note that some checks must be after request_seq check, as even those checks fail, strreset_inseq still has to be increase by 1. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-19sctp: add support for generating stream ssn reset event notificationXin Long
This patch is to add Stream Reset Event described in rfc6525 section 6.1.1. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-19sctp: add support for generating stream reconf resp chunkXin Long
This patch is to define Re-configuration Response Parameter described in rfc6525 section 4.4. As optional fields are only for SSN/TSN Reset Request Parameter, it uses another function to make that. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-19netfilter: xt_hashlimit: Fix integer divide round to zero.Alban Browaeys
Diving the divider by the multiplier before applying to the input. When this would "divide by zero", divide the multiplier by the divider first then multiply the input by this value. Currently user2creds outputs zero when input value is bigger than the number of slices and lower than scale. This as then user input is applied an integer divide operation to a number greater than itself (scale). That rounds up to zero, then we multiply zero by the credits slice size. iptables -t filter -I INPUT --protocol tcp --match hashlimit --hashlimit 40/second --hashlimit-burst 20 --hashlimit-mode srcip --hashlimit-name syn-flood --jump RETURN thus trigger the overflow detection code: xt_hashlimit: overflow, try lower: 25000/20 (25000 as hashlimit avg and 20 the burst) Here: 134217 slices of (HZ * CREDITS_PER_JIFFY) size. 500000 is user input value 1000000 is XT_HASHLIMIT_SCALE_v2 gives: 0 as user2creds output Setting burst to "1" typically solve the issue ... but setting it to "40" does too ! This is on 32bit arch calling into revision 2 of hashlimit. Signed-off-by: Alban Browaeys <alban.browaeys@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-02-19netfilter: ipset: Null pointer exception in ipset list:setVishwanath Pai
If we use before/after to add an element to an empty list it will cause a kernel panic. $> cat crash.restore create a hash:ip create b hash:ip create test list:set timeout 5 size 4 add test b before a $> ipset -R < crash.restore Executing the above will crash the kernel. Signed-off-by: Vishwanath Pai <vpai@akamai.com> Reviewed-by: Josh Hunt <johunt@akamai.com> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2017-02-19Fix bug: sometimes valid entries in hash:* types of sets were evictedJozsef Kadlecsik
Wrong index was used and therefore when shrinking a hash bucket at deleting an entry, valid entries could be evicted as well. Thanks to Eric Ewanco for the thorough bugreport. Fixes netfilter bugzilla #1119 Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2017-02-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2017-02-18ipv6: release dst on error in ip6_dst_lookup_tailWillem de Bruijn
If ip6_dst_lookup_tail has acquired a dst and fails the IPv4-mapped check, release the dst before returning an error. Fixes: ec5e3b0a1d41 ("ipv6: Inhibit IPv4-mapped src address on the wire.") Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-17irda: Fix lockdep annotations in hashbin_delete().David S. Miller
A nested lock depth was added to the hasbin_delete() code but it doesn't actually work some well and results in tons of lockdep splats. Fix the code instead to properly drop the lock around the operation and just keep peeking the head of the hashbin queue. Reported-by: Dmitry Vyukov <dvyukov@google.com> Tested-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-17tcp: use page_ref_inc() in tcp_sendmsg()Eric Dumazet
sk_page_frag_refill() allocates either a compound page or an order-0 page. We can use page_ref_inc() which is slightly faster than get_page() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-17tcp: accommodate sequence number to a peer's shrunk receive window caused by ↵Cui, Cheng
precision loss in window scaling Prevent sending out a left-shifted sequence number from a Linux sender in response to a peer's shrunk receive-window caused by losing least significant bits in window-scaling. Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: Cheng Cui <Cheng.Cui@netapp.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-17rds:Remove unnecessary ib_ring unallocZhu Yanjun
In the function rds_ib_xmit_atomic, ib_ring is not allocated successfully. As such, it is not necessary to unalloc it. Cc: Joe Jin <joe.jin@oracle.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-17pkt_sched: Remove useless qdisc_stab_lockGao Feng
The qdisc_stab_lock is used in qdisc_get_stab and qdisc_put_stab. These two functions are invoked in qdisc_create, qdisc_change, and qdisc_destroy which run fully under RTNL. So it already makes sure only one could access the qdisc_stab_list at the same time. Then it is unnecessary to use qdisc_stab_lock now. Signed-off-by: Gao Feng <fgao@ikuai8.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-17rxrpc: Change module filename to rxrpc.koDavid Howells
Change module filename from af-rxrpc.ko to rxrpc.ko so as to be consistent with the other protocol drivers. Also adjust the documentation to reflect this. Further, there is no longer a standalone rxkad module, as it has been merged into the rxrpc core, so get rid of references to that. Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-17rtnl: don't account unused struct ifla_port_vsi in rtnl_port_sizeDaniel Borkmann
When allocating rtnl dump messages, struct ifla_port_vsi is never dumped, so we can save header plus payload in rtnl_port_size(). Infact, attribute IFLA_PORT_VSI_TYPE and struct ifla_port_vsi are not used anywhere in the kernel. We only need to keep the nla policy should applications in user space be filling this out. Same NLA_BINARY issue exists as was fixed in 364d5716a7ad ("rtnetlink: ifla_vf_policy: fix misuses of NLA_BINARY") and others, but then again IFLA_PORT_VSI_TYPE is not used anywhere, so just add a comment that it's unused. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>