summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2017-10-22soreuseport: fix initialization raceCraig Gallek
Syzkaller stumbled upon a way to trigger WARNING: CPU: 1 PID: 13881 at net/core/sock_reuseport.c:41 reuseport_alloc+0x306/0x3b0 net/core/sock_reuseport.c:39 There are two initialization paths for the sock_reuseport structure in a socket: Through the udp/tcp bind paths of SO_REUSEPORT sockets or through SO_ATTACH_REUSEPORT_[CE]BPF before bind. The existing implementation assumedthat the socket lock protected both of these paths when it actually only protects the SO_ATTACH_REUSEPORT path. Syzkaller triggered this double allocation by running these paths concurrently. This patch moves the check for double allocation into the reuseport_alloc function which is protected by a global spin lock. Fixes: e32ea7e74727 ("soreuseport: fast reuseport UDP socket selection") Fixes: c125e80b8868 ("soreuseport: fast reuseport TCP socket selection") Signed-off-by: Craig Gallek <kraig@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-22net: rose: mark expected switch fall-throughsGustavo A. R. Silva
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-22openvswitch: conntrack: mark expected switch fall-throughGustavo A. R. Silva
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Notice that in this particular case I placed a "fall through" comment on its own line, which is what GCC is expecting to find. Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-22net: netrom: nr_in: mark expected switch fall-throughGustavo A. R. Silva
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-22net: bridge: fix returning of vlan range op errorsNikolay Aleksandrov
When vlan tunnels were introduced, vlan range errors got silently dropped and instead 0 was returned always. Restore the previous behaviour and return errors to user-space. Fixes: efa5356b0d97 ("bridge: per vlan dst_metadata netlink support") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-22sock: correct sk_wmem_queued accounting on efault in tcp zerocopyWillem de Bruijn
Syzkaller hits WARN_ON(sk->sk_wmem_queued) in sk_stream_kill_queues after triggering an EFAULT in __zerocopy_sg_from_iter. On this error, skb_zerocopy_stream_iter resets the skb to its state before the operation with __pskb_trim. It cannot kfree_skb like datagram callers, as the skb may have data from a previous send call. __pskb_trim calls skb_condense for unowned skbs, which adjusts their truesize. These tcp skbuffs are owned and their truesize must add up to sk_wmem_queued. But they match because their skb->sk is NULL until tcp_transmit_skb. Temporarily set skb->sk when calling __pskb_trim to signal that the skbuffs are owned and avoid the skb_condense path. Fixes: 52267790ef52 ("sock: add MSG_ZEROCOPY") Signed-off-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-21tipc: fix broken tipc_poll() functionJon Maloy
In commit ae236fb208a6 ("tipc: receive group membership events via member socket") we broke the tipc_poll() function by checking the state of the receive queue before the call to poll_sock_wait(), while relying that state afterwards, when it might have changed. We restore this in this commit. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-21net: sched: avoid ndo_setup_tc calls for TC_SETUP_CLS*Jiri Pirko
All drivers are converted to use block callbacks for TC_SETUP_CLS*. So it is now safe to remove the calls to ndo_setup_tc from cls_* Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-21dsa: Convert ndo_setup_tc offloads to block callbacksJiri Pirko
Benefit from the newly introduced block callback infrastructure and convert ndo_setup_tc calls for matchall offloads to block callbacks. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-21net: sched: cls_bpf: call block callbacks for offloadJiri Pirko
Use the newly introduced callbacks infrastructure and call block callbacks alongside with the existing per-netdev ndo_setup_tc. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-21net: sched: cls_u32: call block callbacks for offloadJiri Pirko
Use the newly introduced callbacks infrastructure and call block callbacks alongside with the existing per-netdev ndo_setup_tc. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-21net: sched: cls_u32: swap u32_remove_hw_knode and u32_remove_hw_hnodeJiri Pirko
Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-21net: sched: cls_matchall: call block callbacks for offloadJiri Pirko
Use the newly introduced callbacks infrastructure and call block callbacks alongside with the existing per-netdev ndo_setup_tc. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-21net: sched: use tc_setup_cb_call to call per-block callbacksJiri Pirko
Extend the tc_setup_cb_call entrypoint function originally used only for action egress devices callbacks to call per-block callbacks as well. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-21net: sched: introduce per-block callbacksJiri Pirko
Introduce infrastructure that allows drivers to register callbacks that are called whenever tc would offload inserted rule for a specific block. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-21net: sched: use extended variants of block_get/put in ingress and clsact qdiscsJiri Pirko
Use previously introduced extended variants of block get and put functions. This allows to specify a binder types specific to clsact ingress/egress which is useful for drivers to distinguish who actually got the block. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-21net: sched: add block bind/unbind notif. and extended block_get/putJiri Pirko
Introduce new type of ndo_setup_tc message to propage binding/unbinding of a block to driver. Call this ndo whenever qdisc gets/puts a block. Alongside with this, there's need to propagate binder type from qdisc code down to the notifier. So introduce extended variants of block_get/put in order to pass this info. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-21udp: make some messages more descriptiveMatteo Croce
In the UDP code there are two leftover error messages with very few meaning. Replace them with a more descriptive error message as some users reported them as "strange network error". Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-21Merge tag 'linux-can-fixes-for-4.14-20171019' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2017-10-19 this is a pull request of 11 patches for the upcoming 4.14 release. There are 6 patches by ZHU Yi for the flexcan driver, that work around the CAN error handling state transition problems found in various incarnations of the flexcan IP core. The patch by Colin Ian King fixes a potential NULL pointer deref in the CAN broad cast manager (bcm). One patch by me replaces a direct deref of a RCU protected pointer by rcu_access_pointer. My second patch adds missing OOM error handling in af_can. A patch by Stefan Mätje for the esd_usb2 driver fixes the dlc in received RTR frames. And the last patch is by Wolfgang Grandegger, it fixes a busy loop in the gs_usb driver in case it runs out of TX contexts. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-21ipv6: let trace_fib6_table_lookup() dereference the fib tablePaolo Abeni
The perf traces for ipv6 routing code show a relevant cost around trace_fib6_table_lookup(), even if no trace is enabled. This is due to the fib6_table de-referencing currently performed by the caller. Let's the tracing code pay this overhead, passing to the trace helper the table pointer. This gives small but measurable performance improvement under UDP flood. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Acked-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-21Merge branch 'for-upstream' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2017-10-19 Here's the first bluetooth-next pull request targeting the 4.15 kernel release. - Multiple fixes & improvements to the hci_bcm driver - DT improvements, e.g. new local-bd-address property - Fixes & improvements to ECDH usage. Private key is now generated by the crypto subsystem. - gcc-4.9 warning fixes Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-21hv_sock: add locking in the open/close/release code pathsDexuan Cui
Without the patch, when hvs_open_connection() hasn't completely established a connection (e.g. it has changed sk->sk_state to SS_CONNECTED, but hasn't inserted the sock into the connected queue), vsock_stream_connect() may see the sk_state change and return the connection to the userspace, and next when the userspace closes the connection quickly, hvs_release() may not see the connection in the connected queue; finally hvs_open_connection() inserts the connection into the queue, but we won't be able to purge the connection for ever. Signed-off-by: Dexuan Cui <decui@microsoft.com> Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Cathy Avery <cavery@redhat.com> Cc: Rolf Neugebauer <rolf.neugebauer@docker.com> Cc: Marcelo Cerri <marcelo.cerri@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-21net/ncsi: Fix length of GVI response packetGavin Shan
The length of GVI (GetVersionInfo) response packet should be 40 instead of 36. This issue was found from /sys/kernel/debug/ncsi/eth0/stats. # ethtool --ncsi eth0 swstats : RESPONSE OK TIMEOUT ERROR ======================================= GVI 0 0 2 With this applied, no error reported on GVI response packets: # ethtool --ncsi eth0 swstats : RESPONSE OK TIMEOUT ERROR ======================================= GVI 2 0 0 Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-21net/ncsi: Enforce failover on link monitor timeoutGavin Shan
The NCSI channel has been configured to provide service if its link monitor timer is enabled, regardless of its state (inactive or active). So the timeout event on the link monitor indicates the out-of-service on that channel, for which a failover is needed. This sets NCSI_DEV_RESHUFFLE flag to enforce failover on link monitor timeout, regardless the channel's original state (inactive or active). Also, the link is put into "down" state to give the failing channel lowest priority when selecting for the active channel. The state of failing channel should be set to active in order for deinitialization and failover to be done. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-21net/ncsi: Disable HWA mode when no channels are foundGavin Shan
When there are no NCSI channels probed, HWA (Hardware Arbitration) mode is enabled. It's not correct because HWA depends on the fact: NCSI channels exist and all of them support HWA mode. This disables HWA when no channels are probed. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-21net/ncsi: Stop monitor if channel times out or is inactiveSamuel Mendoza-Jonas
ncsi_channel_monitor() misses stopping the channel monitor in several places that it should, causing a WARN_ON_ONCE() to trigger when the monitor is re-started later, eg: [ 459.040000] WARNING: CPU: 0 PID: 1093 at net/ncsi/ncsi-manage.c:269 ncsi_start_channel_monitor+0x7c/0x90 [ 459.040000] CPU: 0 PID: 1093 Comm: kworker/0:3 Not tainted 4.10.17-gaca2fdd #140 [ 459.040000] Hardware name: ASpeed SoC [ 459.040000] Workqueue: events ncsi_dev_work [ 459.040000] [<80010094>] (unwind_backtrace) from [<8000d950>] (show_stack+0x20/0x24) [ 459.040000] [<8000d950>] (show_stack) from [<801dbf70>] (dump_stack+0x20/0x28) [ 459.040000] [<801dbf70>] (dump_stack) from [<80018d7c>] (__warn+0xe0/0x108) [ 459.040000] [<80018d7c>] (__warn) from [<80018e70>] (warn_slowpath_null+0x30/0x38) [ 459.040000] [<80018e70>] (warn_slowpath_null) from [<803f6a08>] (ncsi_start_channel_monitor+0x7c/0x90) [ 459.040000] [<803f6a08>] (ncsi_start_channel_monitor) from [<803f7664>] (ncsi_configure_channel+0xdc/0x5fc) [ 459.040000] [<803f7664>] (ncsi_configure_channel) from [<803f8160>] (ncsi_dev_work+0xac/0x474) [ 459.040000] [<803f8160>] (ncsi_dev_work) from [<8002d244>] (process_one_work+0x1e0/0x450) [ 459.040000] [<8002d244>] (process_one_work) from [<8002d510>] (worker_thread+0x5c/0x570) [ 459.040000] [<8002d510>] (worker_thread) from [<80033614>] (kthread+0x124/0x164) [ 459.040000] [<80033614>] (kthread) from [<8000a5e8>] (ret_from_fork+0x14/0x2c) This also updates the monitor instead of just returning if ncsi_xmit_cmd() fails to send the get-link-status command so that the monitor properly times out. Fixes: e6f44ed6d04d3 "net/ncsi: Package and channel management" Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-21net/ncsi: Fix AEN HNCDSC packet lengthSamuel Mendoza-Jonas
Correct the value of the HNCDSC AEN packet. Fixes: 7a82ecf4cfb85 "net/ncsi: NCSI AEN packet handler" Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-21ipv4: ipv4_default_advmss() should use route mtuEric Dumazet
ipv4_default_advmss() incorrectly uses the device MTU instead of the route provided one. IPv6 has the proper behavior, lets harmonize the two protocols. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-21packet: avoid panic in packet_getsockopt()Eric Dumazet
syzkaller got crashes in packet_getsockopt() processing PACKET_ROLLOVER_STATS command while another thread was managing to change po->rollover Using RCU will fix this bug. We might later add proper RCU annotations for sparse sake. In v2: I replaced kfree(rollover) in fanout_add() to kfree_rcu() variant, as spotted by John. Fixes: a9b6391814d5 ("packet: rollover statistics") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: John Sperbeck <jsperbeck@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-21Merge branch 'ieee802154-for-davem-2017-10-18' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan-next Stefan Schmidt says: ==================== pull-request: ieee802154 2017-10-18 Please find below a pull request from the ieee802154 subsystem for net-next. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-21tcp: fix tcp_send_syn_data()Eric Dumazet
syn_data was allocated by sk_stream_alloc_skb(), meaning its destructor and _skb_refdst fields are mangled. We need to call tcp_skb_tsorted_anchor_cleanup() before calling kfree_skb() or kernel crashes. Bug was reported by syzkaller bot. Fixes: e2080072ed2d ("tcp: new list for sent but unacked skbs for RACK recovery") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-21ipv6: remove from fib tree aged out RTF_CACHE dstPaolo Abeni
The commit 2b760fcf5cfb ("ipv6: hook up exception table to store dst cache") partially reverted the commit 1e2ea8ad37be ("ipv6: set dst.obsolete when a cached route has expired"). As a result, RTF_CACHE dst referenced outside the fib tree will not be removed until the next sernum change; dst_check() does not fail on aged-out dst, and dst->__refcnt can't decrease: the aged out dst will stay valid for a potentially unlimited time after the timeout expiration. This change explicitly removes RTF_CACHE dst from the fib tree when aged out. The rt6_remove_exception() logic will then obsolete the dst and other entities will drop the related reference on next dst_check(). pMTU exceptions are not aged-out, and are removed from the exception table only when the - usually considerably longer - ip6_rt_mtu_expires timeout expires. v1 -> v2: - do not touch dst.obsolete in rt6_remove_exception(), not needed v2 -> v3: - take care of pMTU exceptions, too Fixes: 2b760fcf5cfb ("ipv6: hook up exception table to store dst cache") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Wei Wang <weiwan@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-21ipv6: start fib6 gc on RTF_CACHE dst creationPaolo Abeni
After the commit 2b760fcf5cfb ("ipv6: hook up exception table to store dst cache"), the fib6 gc is not started after the creation of a RTF_CACHE via a redirect or pmtu update, since fib6_add() isn't invoked anymore for such dsts. We need the fib6 gc to run periodically to clean the RTF_CACHE, or the dst will stay there forever. Fix it by explicitly calling fib6_force_start_gc() on successful exception creation. gc_args->more accounting will ensure that the gc timer will run for whatever time needed to properly clean the table. v2 -> v3: - clarified the commit message Fixes: 2b760fcf5cfb ("ipv6: hook up exception table to store dst cache") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Wei Wang <weiwan@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-21tcp/dccp: fix ireq->opt racesEric Dumazet
syzkaller found another bug in DCCP/TCP stacks [1] For the reasons explained in commit ce1050089c96 ("tcp/dccp: fix ireq->pktopts race"), we need to make sure we do not access ireq->opt unless we own the request sock. Note the opt field is renamed to ireq_opt to ease grep games. [1] BUG: KASAN: use-after-free in ip_queue_xmit+0x1687/0x18e0 net/ipv4/ip_output.c:474 Read of size 1 at addr ffff8801c951039c by task syz-executor5/3295 CPU: 1 PID: 3295 Comm: syz-executor5 Not tainted 4.14.0-rc4+ #80 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:16 [inline] dump_stack+0x194/0x257 lib/dump_stack.c:52 print_address_description+0x73/0x250 mm/kasan/report.c:252 kasan_report_error mm/kasan/report.c:351 [inline] kasan_report+0x25b/0x340 mm/kasan/report.c:409 __asan_report_load1_noabort+0x14/0x20 mm/kasan/report.c:427 ip_queue_xmit+0x1687/0x18e0 net/ipv4/ip_output.c:474 tcp_transmit_skb+0x1ab7/0x3840 net/ipv4/tcp_output.c:1135 tcp_send_ack.part.37+0x3bb/0x650 net/ipv4/tcp_output.c:3587 tcp_send_ack+0x49/0x60 net/ipv4/tcp_output.c:3557 __tcp_ack_snd_check+0x2c6/0x4b0 net/ipv4/tcp_input.c:5072 tcp_ack_snd_check net/ipv4/tcp_input.c:5085 [inline] tcp_rcv_state_process+0x2eff/0x4850 net/ipv4/tcp_input.c:6071 tcp_child_process+0x342/0x990 net/ipv4/tcp_minisocks.c:816 tcp_v4_rcv+0x1827/0x2f80 net/ipv4/tcp_ipv4.c:1682 ip_local_deliver_finish+0x2e2/0xba0 net/ipv4/ip_input.c:216 NF_HOOK include/linux/netfilter.h:249 [inline] ip_local_deliver+0x1ce/0x6e0 net/ipv4/ip_input.c:257 dst_input include/net/dst.h:464 [inline] ip_rcv_finish+0x887/0x19a0 net/ipv4/ip_input.c:397 NF_HOOK include/linux/netfilter.h:249 [inline] ip_rcv+0xc3f/0x1820 net/ipv4/ip_input.c:493 __netif_receive_skb_core+0x1a3e/0x34b0 net/core/dev.c:4476 __netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4514 netif_receive_skb_internal+0x10b/0x670 net/core/dev.c:4587 netif_receive_skb+0xae/0x390 net/core/dev.c:4611 tun_rx_batched.isra.50+0x5ed/0x860 drivers/net/tun.c:1372 tun_get_user+0x249c/0x36d0 drivers/net/tun.c:1766 tun_chr_write_iter+0xbf/0x160 drivers/net/tun.c:1792 call_write_iter include/linux/fs.h:1770 [inline] new_sync_write fs/read_write.c:468 [inline] __vfs_write+0x68a/0x970 fs/read_write.c:481 vfs_write+0x18f/0x510 fs/read_write.c:543 SYSC_write fs/read_write.c:588 [inline] SyS_write+0xef/0x220 fs/read_write.c:580 entry_SYSCALL_64_fastpath+0x1f/0xbe RIP: 0033:0x40c341 RSP: 002b:00007f469523ec10 EFLAGS: 00000293 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000718000 RCX: 000000000040c341 RDX: 0000000000000037 RSI: 0000000020004000 RDI: 0000000000000015 RBP: 0000000000000086 R08: 0000000000000000 R09: 0000000000000000 R10: 00000000000f4240 R11: 0000000000000293 R12: 00000000004b7fd1 R13: 00000000ffffffff R14: 0000000020000000 R15: 0000000000025000 Allocated by task 3295: save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59 save_stack+0x43/0xd0 mm/kasan/kasan.c:447 set_track mm/kasan/kasan.c:459 [inline] kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551 __do_kmalloc mm/slab.c:3725 [inline] __kmalloc+0x162/0x760 mm/slab.c:3734 kmalloc include/linux/slab.h:498 [inline] tcp_v4_save_options include/net/tcp.h:1962 [inline] tcp_v4_init_req+0x2d3/0x3e0 net/ipv4/tcp_ipv4.c:1271 tcp_conn_request+0xf6d/0x3410 net/ipv4/tcp_input.c:6283 tcp_v4_conn_request+0x157/0x210 net/ipv4/tcp_ipv4.c:1313 tcp_rcv_state_process+0x8ea/0x4850 net/ipv4/tcp_input.c:5857 tcp_v4_do_rcv+0x55c/0x7d0 net/ipv4/tcp_ipv4.c:1482 tcp_v4_rcv+0x2d10/0x2f80 net/ipv4/tcp_ipv4.c:1711 ip_local_deliver_finish+0x2e2/0xba0 net/ipv4/ip_input.c:216 NF_HOOK include/linux/netfilter.h:249 [inline] ip_local_deliver+0x1ce/0x6e0 net/ipv4/ip_input.c:257 dst_input include/net/dst.h:464 [inline] ip_rcv_finish+0x887/0x19a0 net/ipv4/ip_input.c:397 NF_HOOK include/linux/netfilter.h:249 [inline] ip_rcv+0xc3f/0x1820 net/ipv4/ip_input.c:493 __netif_receive_skb_core+0x1a3e/0x34b0 net/core/dev.c:4476 __netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4514 netif_receive_skb_internal+0x10b/0x670 net/core/dev.c:4587 netif_receive_skb+0xae/0x390 net/core/dev.c:4611 tun_rx_batched.isra.50+0x5ed/0x860 drivers/net/tun.c:1372 tun_get_user+0x249c/0x36d0 drivers/net/tun.c:1766 tun_chr_write_iter+0xbf/0x160 drivers/net/tun.c:1792 call_write_iter include/linux/fs.h:1770 [inline] new_sync_write fs/read_write.c:468 [inline] __vfs_write+0x68a/0x970 fs/read_write.c:481 vfs_write+0x18f/0x510 fs/read_write.c:543 SYSC_write fs/read_write.c:588 [inline] SyS_write+0xef/0x220 fs/read_write.c:580 entry_SYSCALL_64_fastpath+0x1f/0xbe Freed by task 3306: save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59 save_stack+0x43/0xd0 mm/kasan/kasan.c:447 set_track mm/kasan/kasan.c:459 [inline] kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524 __cache_free mm/slab.c:3503 [inline] kfree+0xca/0x250 mm/slab.c:3820 inet_sock_destruct+0x59d/0x950 net/ipv4/af_inet.c:157 __sk_destruct+0xfd/0x910 net/core/sock.c:1560 sk_destruct+0x47/0x80 net/core/sock.c:1595 __sk_free+0x57/0x230 net/core/sock.c:1603 sk_free+0x2a/0x40 net/core/sock.c:1614 sock_put include/net/sock.h:1652 [inline] inet_csk_complete_hashdance+0xd5/0xf0 net/ipv4/inet_connection_sock.c:959 tcp_check_req+0xf4d/0x1620 net/ipv4/tcp_minisocks.c:765 tcp_v4_rcv+0x17f6/0x2f80 net/ipv4/tcp_ipv4.c:1675 ip_local_deliver_finish+0x2e2/0xba0 net/ipv4/ip_input.c:216 NF_HOOK include/linux/netfilter.h:249 [inline] ip_local_deliver+0x1ce/0x6e0 net/ipv4/ip_input.c:257 dst_input include/net/dst.h:464 [inline] ip_rcv_finish+0x887/0x19a0 net/ipv4/ip_input.c:397 NF_HOOK include/linux/netfilter.h:249 [inline] ip_rcv+0xc3f/0x1820 net/ipv4/ip_input.c:493 __netif_receive_skb_core+0x1a3e/0x34b0 net/core/dev.c:4476 __netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4514 netif_receive_skb_internal+0x10b/0x670 net/core/dev.c:4587 netif_receive_skb+0xae/0x390 net/core/dev.c:4611 tun_rx_batched.isra.50+0x5ed/0x860 drivers/net/tun.c:1372 tun_get_user+0x249c/0x36d0 drivers/net/tun.c:1766 tun_chr_write_iter+0xbf/0x160 drivers/net/tun.c:1792 call_write_iter include/linux/fs.h:1770 [inline] new_sync_write fs/read_write.c:468 [inline] __vfs_write+0x68a/0x970 fs/read_write.c:481 vfs_write+0x18f/0x510 fs/read_write.c:543 SYSC_write fs/read_write.c:588 [inline] SyS_write+0xef/0x220 fs/read_write.c:580 entry_SYSCALL_64_fastpath+0x1f/0xbe Fixes: e994b2f0fb92 ("tcp: do not lock listener to process SYN packets") Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-20bpf: Add file mode configuration into bpf mapsChenbo Feng
Introduce the map read/write flags to the eBPF syscalls that returns the map fd. The flags is used to set up the file mode when construct a new file descriptor for bpf maps. To not break the backward capability, the f_flags is set to O_RDWR if the flag passed by syscall is 0. Otherwise it should be O_RDONLY or O_WRONLY. When the userspace want to modify or read the map content, it will check the file mode to see if it is allowed to make the change. Signed-off-by: Chenbo Feng <fengc@google.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-20net: ipv4: Change fib notifiers to take a fib_aliasDavid Ahern
All of the notifier data (fib_info, tos, type and table id) are contained in the fib_alias. Pass it to the notifier instead of each data separately shortening the argument list by 3. Signed-off-by: David Ahern <dsahern@gmail.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-20tcp: socket option to set TCP fast open keyYuchung Cheng
New socket option TCP_FASTOPEN_KEY to allow different keys per listener. The listener by default uses the global key until the socket option is set. The key is a 16 bytes long binary data. This option has no effect on regular non-listener TCP sockets. Signed-off-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-20net: Add extack to validator_info structs used for address notifierDavid Ahern
Add extack to in_validator_info and in6_validator_info. Update the one user of each, ipvlan, to return an error message for failures. Only manual configuration of an address is plumbed in the IPv6 code path. Signed-off-by: David Ahern <dsahern@gmail.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-20net: ipv6: Make inet6addr_validator a blocking notifierDavid Ahern
inet6addr_validator chain was added by commit 3ad7d2468f79f ("Ipvlan should return an error when an address is already in use") to allow address validation before changes are committed and to be able to fail the address change with an error back to the user. The address validation is not done for addresses received from router advertisements. Handling RAs in softirq context is the only reason for the notifier chain to be atomic versus blocking. Since the only current user, ipvlan, of the validator chain ignores softirq context, the notifier can be made blocking and simply not invoked for softirq path. The blocking option is needed by spectrum for example to validate resources for an adding an address to an interface. Signed-off-by: David Ahern <dsahern@gmail.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-20ipv6: addrconf: cleanup locking in ipv6_add_addrDavid Ahern
ipv6_add_addr is called in process context with rtnl lock held (e.g., manual config of an address) or during softirq processing (e.g., autoconf and address from a router advertisement). Currently, ipv6_add_addr calls rcu_read_lock_bh shortly after entry and does not call unlock until exit, minus the call around the address validator notifier. Similarly, addrconf_hash_lock is taken after the validator notifier and held until exit. This forces the allocation of inet6_ifaddr to always be atomic. Refactor ipv6_add_addr as follows: 1. add an input boolean to discriminate the call path (process context or softirq). This new flag controls whether the alloc can be done with GFP_KERNEL or GFP_ATOMIC. 2. Move the rcu_read_lock_bh and unlock calls only around functions that do rcu updates. 3. Remove the in6_dev_hold and put added by 3ad7d2468f79f ("Ipvlan should return an error when an address is already in use."). This was done presumably because rcu_read_unlock_bh needs to be called before calling the validator. Since rcu_read_lock is not needed before the validator runs revert the hold and put added by 3ad7d2468f79f and only do the hold when setting ifp->idev. 4. move duplicate address check and insertion of new address in the global address hash into a helper. The helper is called after an ifa is allocated and filled in. This allows the ifa for manually configured addresses to be done with GFP_KERNEL and reduces the overall amount of time with rcu_read_lock held and hash table spinlock held. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-20net/sched: Set the net-device for egress device instanceOr Gerlitz
Currently the netdevice field is not set and the egdev instance is not functional, fix that. Fixes: 3f55bdda8df ('net: sched: introduce per-egress action device callbacks') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-20bpf: remove mark access for SK_SKB program typesJohn Fastabend
The skb->mark field is a union with reserved_tailroom which is used in the TCP code paths from stream memory allocation. Allowing SK_SKB programs to set this field creates a conflict with future code optimizations, such as "gifting" the skb to the egress path instead of creating a new skb and doing a memcpy. Because we do not have a released version of SK_SKB yet lets just remove it for now. A more appropriate scratch pad to use at the socket layer is dev_scratch, but lets add that in future kernels when needed. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-20bpf: avoid preempt enable/disable in sockmap using tcp_skb_cb regionJohn Fastabend
SK_SKB BPF programs are run from the socket/tcp context but early in the stack before much of the TCP metadata is needed in tcp_skb_cb. So we can use some unused fields to place BPF metadata needed for SK_SKB programs when implementing the redirect function. This allows us to drop the preempt disable logic. It does however require an API change so sk_redirect_map() has been updated to additionally provide ctx_ptr to skb. Note, we do however continue to disable/enable preemption around actual BPF program running to account for map updates. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-20sctp: add the missing sock_owned_by_user check in sctp_icmp_redirectXin Long
Now sctp processes icmp redirect packet in sctp_icmp_redirect where it calls sctp_transport_dst_check in which tp->dst can be released. The problem is before calling sctp_transport_dst_check, it doesn't check sock_owned_by_user, which means tp->dst could be freed while a process is accessing it with owning the socket. An use-after-free issue could be triggered by this. This patch is to fix it by checking sock_owned_by_user before calling sctp_transport_dst_check in sctp_icmp_redirect, so that it would not release tp->dst if users still hold sock lock. Besides, the same issue fixed in commit 45caeaa5ac0b ("dccp/tcp: fix routing redirect race") on sctp also needs this check. Fixes: 55be7a9c6074 ("ipv4: Add redirect support to all protocol icmp error handlers") Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-20Merge tag 'rxrpc-next-20171018' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== rxrpc: Add bits for kernel services Here are some patches that add a few things for kernel services to use: (1) Allow service upgrade to be requested and allow the resultant actual service ID to be obtained. (2) Allow the RTT time of a call to be obtained. (3) Allow a kernel service to find out if a call is still alive on a server between transmitting a request and getting the reply. (4) Allow data transmission to ignore signals if transmission progress is being made in reasonable time. This is also usable by userspace by passing MSG_WAITALL to sendmsg()[*]. [*] I'm not sure this is the right interface for this or whether a sockopt should be used instead. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-20net: sched: cls_u32: use hash_ptr() for tc_u_hashArnd Bergmann
After the change to the tp hash, we now get a build warning on 32-bit architectures: net/sched/cls_u32.c: In function 'tc_u_hash': net/sched/cls_u32.c:338:17: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast] return hash_64((u64) tp->chain->block, U32_HASH_SHIFT); Using hash_ptr() instead of hash_64() lets us drop the cast and fixes the warning while still resulting in the same hash value. Fixes: 7fa9d974f3c2 ("net: sched: cls_u32: use block instead of q in tc_u_common") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-20tipc: checking for NULL instead of IS_ERR()Dan Carpenter
The tipc_alloc_conn() function never returns NULL, it returns error pointers, so I have fixed the check. Fixes: 14c04493cb77 ("tipc: add ability to order and receive topology events in driver") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-19SUNRPC: Destroy transport from the system workqueueTrond Myklebust
The transport may need to flush transport connect and receive tasks that are running on rpciod. In order to do so safely, we need to ensure that the caller of cancel_work_sync() etc is not itself running on rpciod. Do so by running the destroy task from the system workqueue. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-10-19net: l2tp: mark expected switch fall-throughGustavo A. R. Silva
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Notice that in this particular case I replaced the "NOBREAK" comment with a "fall through" comment, which is what GCC is expecting to find. Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Acked-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-19sctp: do not peel off an assoc from one netns to another oneXin Long
Now when peeling off an association to the sock in another netns, all transports in this assoc are not to be rehashed and keep use the old key in hashtable. As a transport uses sk->net as the hash key to insert into hashtable, it would miss removing these transports from hashtable due to the new netns when closing the sock and all transports are being freeed, then later an use-after-free issue could be caused when looking up an asoc and dereferencing those transports. This is a very old issue since very beginning, ChunYu found it with syzkaller fuzz testing with this series: socket$inet6_sctp() bind$inet6() sendto$inet6() unshare(0x40000000) getsockopt$inet_sctp6_SCTP_GET_ASSOC_ID_LIST() getsockopt$inet_sctp6_SCTP_SOCKOPT_PEELOFF() This patch is to block this call when peeling one assoc off from one netns to another one, so that the netns of all transport would not go out-sync with the key in hashtable. Note that this patch didn't fix it by rehashing transports, as it's difficult to handle the situation when the tuple is already in use in the new netns. Besides, no one would like to peel off one assoc to another netns, considering ipaddrs, ifaces, etc. are usually different. Reported-by: ChunYu Wang <chunwang@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>