summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2019-08-30Merge tag 'rxrpc-fixes-20190827' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== rxrpc: Fix use of skb_cow_data() Here's a series of patches that replaces the use of skb_cow_data() in rxrpc with skb_unshare() early on in the input process. The problem that is being seen is that skb_cow_data() indirectly requires that the maximum usage count on an sk_buff be 1, and it may generate an assertion failure in pskb_expand_head() if not. This can occur because rxrpc_input_data() may be still holding a ref when it has just attached the sk_buff to the rx ring and given that attachment its own ref. If recvmsg happens fast enough, skb_cow_data() can see the ref still held by the softirq handler. Further, a packet may contain multiple subpackets, each of which gets its own attachment to the ring and its own ref - also making skb_cow_data() go bang. Fix this by: (1) The DATA packet is currently parsed for subpackets twice by the input routines. Parse it just once instead and make notes in the sk_buff private data. (2) Use the notes from (1) when attaching the packet to the ring multiple times. Once the packet is attached to the ring, recvmsg can see it and start modifying it, so the softirq handler is not permitted to look inside it from that point. (3) Pass the ref from the input code to the ring rather than getting an extra ref. rxrpc_input_data() uses a ref on the second refcount to prevent the packet from evaporating under it. (4) Call skb_unshare() on secured DATA packets in rxrpc_input_packet() before we take call->input_lock. Other sorts of packets don't get modified and so can be left. A trace is emitted if skb_unshare() eats the skb. Note that skb_share() for our accounting in this regard as we can't see the parameters in the packet to log in a trace line if it releases it. (5) Remove the calls to skb_cow_data(). These are then no longer necessary. There are also patches to improve the rxrpc_skb tracepoint to make sure that Tx-derived buffers are identified separately from Rx-derived buffers in the trace. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-30Merge tag 'ceph-for-5.3-rc7' of git://github.com/ceph/ceph-clientLinus Torvalds
Pull two ceph fixes from Ilya Dryomov: "A fix for a -rc1 regression in rbd and a trivial static checker fix" * tag 'ceph-for-5.3-rc7' of git://github.com/ceph/ceph-client: rbd: restore zeroing past the overlap when reading from parent libceph: don't call crypto_free_sync_skcipher() on a NULL tfm
2019-08-30cfg80211: add local BSS receive time to survey informationFelix Fietkau
This is useful for checking how much airtime is being used up by other transmissions on the channel, e.g. by calculating (time_rx - time_bss_rx) or (time_busy - time_bss_rx - time_tx) Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20190828102042.58016-1-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-08-30netfilter: nft_meta_bridge: Fix get NFT_META_BRI_IIFVPROTO in network byteorderwenxu
Get the vlan_proto of ingress bridge in network byteorder as userspace expects. Otherwise this is inconsistent with NFT_META_PROTOCOL. Fixes: 2a3a93ef0ba5 ("netfilter: nft_meta_bridge: Add NFT_META_BRI_IIFVPROTO support") Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-08-29mac80211: Correctly set noencrypt for PAE framesDenis Kenzior
The noencrypt flag was intended to be set if the "frame was received unencrypted" according to include/uapi/linux/nl80211.h. However, the current behavior is opposite of this. Cc: stable@vger.kernel.org Fixes: 018f6fbf540d ("mac80211: Send control port frames over nl80211") Signed-off-by: Denis Kenzior <denkenz@gmail.com> Link: https://lore.kernel.org/r/20190827224120.14545-3-denkenz@gmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-08-29mac80211: Don't memset RXCB prior to PAE interceptDenis Kenzior
In ieee80211_deliver_skb_to_local_stack intercepts EAPoL frames if mac80211 is configured to do so and forwards the contents over nl80211. During this process some additional data is also forwarded, including whether the frame was received encrypted or not. Unfortunately just prior to the call to ieee80211_deliver_skb_to_local_stack, skb->cb is cleared, resulting in incorrect data being exposed over nl80211. Fixes: 018f6fbf540d ("mac80211: Send control port frames over nl80211") Cc: stable@vger.kernel.org Signed-off-by: Denis Kenzior <denkenz@gmail.com> Link: https://lore.kernel.org/r/20190827224120.14545-2-denkenz@gmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-08-29netfilter: nf_flow_table: clear skb tstamp before xmitFlorian Westphal
If 'fq' qdisc is used and a program has requested timestamps, skb->tstamp needs to be cleared, else fq will treat these as 'transmit time'. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-08-28net/sched: pfifo_fast: fix wrong dereference in pfifo_fast_enqueueDavide Caratti
Now that 'TCQ_F_CPUSTATS' bit can be cleared, depending on the value of 'TCQ_F_NOLOCK' bit in the parent qdisc, we can't assume anymore that per-cpu counters are there in the error path of skb_array_produce(). Otherwise, the following splat can be seen: Unable to handle kernel paging request at virtual address 0000600dea430008 Mem abort info: ESR = 0x96000005 Exception class = DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x00000005 CM = 0, WnR = 0 user pgtable: 64k pages, 48-bit VAs, pgdp = 000000007b97530e [0000600dea430008] pgd=0000000000000000, pud=0000000000000000 Internal error: Oops: 96000005 [#1] SMP [...] pstate: 10000005 (nzcV daif -PAN -UAO) pc : pfifo_fast_enqueue+0x524/0x6e8 lr : pfifo_fast_enqueue+0x46c/0x6e8 sp : ffff800d39376fe0 x29: ffff800d39376fe0 x28: 1ffff001a07d1e40 x27: ffff800d03e8f188 x26: ffff800d03e8f200 x25: 0000000000000062 x24: ffff800d393772f0 x23: 0000000000000000 x22: 0000000000000403 x21: ffff800cca569a00 x20: ffff800d03e8ee00 x19: ffff800cca569a10 x18: 00000000000000bf x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 x14: ffff1001a726edd0 x13: 1fffe4000276a9a4 x12: 0000000000000000 x11: dfff200000000000 x10: ffff800d03e8f1a0 x9 : 0000000000000003 x8 : 0000000000000000 x7 : 00000000f1f1f1f1 x6 : ffff1001a726edea x5 : ffff800cca56a53c x4 : 1ffff001bf9a8003 x3 : 1ffff001bf9a8003 x2 : 1ffff001a07d1dcb x1 : 0000600dea430000 x0 : 0000600dea430008 Process ping (pid: 6067, stack limit = 0x00000000dc0aa557) Call trace: pfifo_fast_enqueue+0x524/0x6e8 htb_enqueue+0x660/0x10e0 [sch_htb] __dev_queue_xmit+0x123c/0x2de0 dev_queue_xmit+0x24/0x30 ip_finish_output2+0xc48/0x1720 ip_finish_output+0x548/0x9d8 ip_output+0x334/0x788 ip_local_out+0x90/0x138 ip_send_skb+0x44/0x1d0 ip_push_pending_frames+0x5c/0x78 raw_sendmsg+0xed8/0x28d0 inet_sendmsg+0xc4/0x5c0 sock_sendmsg+0xac/0x108 __sys_sendto+0x1ac/0x2a0 __arm64_sys_sendto+0xc4/0x138 el0_svc_handler+0x13c/0x298 el0_svc+0x8/0xc Code: f9402e80 d538d081 91002000 8b010000 (885f7c03) Fix this by testing the value of 'TCQ_F_CPUSTATS' bit in 'qdisc->flags', before dereferencing 'qdisc->cpu_qstats'. Fixes: 8a53e616de29 ("net: sched: when clearing NOLOCK, clear TCQ_F_CPUSTATS, too") CC: Paolo Abeni <pabeni@redhat.com> CC: Stefano Brivio <sbrivio@redhat.com> Reported-by: Li Shuang <shuali@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-28tcp: inherit timestamp on mtu probeWillem de Bruijn
TCP associates tx timestamp requests with a byte in the bytestream. If merging skbs in tcp_mtu_probe, migrate the tstamp request. Similar to MSG_EOR, do not allow moving a timestamp from any segment in the probe but the last. This to avoid merging multiple timestamps. Tested with the packetdrill script at https://github.com/wdebruij/packetdrill/commits/mtu_probe-1 Link: http://patchwork.ozlabs.org/patch/1143278/#2232897 Fixes: 4ed2d765dfac ("net-timestamp: TCP timestamping") Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-28net: sched: act_sample: fix psample group handling on overwriteVlad Buslov
Action sample doesn't properly handle psample_group pointer in overwrite case. Following issues need to be fixed: - In tcf_sample_init() function RCU_INIT_POINTER() is used to set s->psample_group, even though we neither setting the pointer to NULL, nor preventing concurrent readers from accessing the pointer in some way. Use rcu_swap_protected() instead to safely reset the pointer. - Old value of s->psample_group is not released or deallocated in any way, which results resource leak. Use psample_group_put() on non-NULL value obtained with rcu_swap_protected(). - The function psample_group_put() that released reference to struct psample_group pointed by rcu-pointer s->psample_group doesn't respect rcu grace period when deallocating it. Extend struct psample_group with rcu head and use kfree_rcu when freeing it. Fixes: 5c5670fae430 ("net/sched: Introduce sample tc action") Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-28openvswitch: Clear the L4 portion of the key for "later" fragments.Justin Pettit
Only the first fragment in a datagram contains the L4 headers. When the Open vSwitch module parses a packet, it always sets the IP protocol field in the key, but can only set the L4 fields on the first fragment. The original behavior would not clear the L4 portion of the key, so garbage values would be sent in the key for "later" fragments. This patch clears the L4 fields in that circumstance to prevent sending those garbage values as part of the upcall. Signed-off-by: Justin Pettit <jpettit@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-28openvswitch: Properly set L4 keys on "later" IP fragmentsGreg Rose
When IP fragments are reassembled before being sent to conntrack, the key from the last fragment is used. Unless there are reordering issues, the last fragment received will not contain the L4 ports, so the key for the reassembled datagram won't contain them. This patch updates the key once we have a reassembled datagram. The handle_fragments() function works on L3 headers so we pull the L3/L4 flow key update code from key_extract into a new function 'key_extract_l3l4'. Then we add a another new function ovs_flow_key_update_l3l4() and export it so that it is accessible by handle_fragments() for conntrack packet reassembly. Co-authored-by: Justin Pettit <jpettit@ovn.org> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-28mld: fix memory leak in mld_del_delrec()Eric Dumazet
Similar to the fix done for IPv4 in commit e5b1c6c6277d ("igmp: fix memory leak in igmpv3_del_delrec()"), we need to make sure mca_tomb and mca_sources are not blindly overwritten. Using swap() then a call to ip6_mc_clear_src() will take care of the missing free. BUG: memory leak unreferenced object 0xffff888117d9db00 (size 64): comm "syz-executor247", pid 6918, jiffies 4294943989 (age 25.350s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 fe 88 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<000000005b463030>] kmemleak_alloc_recursive include/linux/kmemleak.h:43 [inline] [<000000005b463030>] slab_post_alloc_hook mm/slab.h:522 [inline] [<000000005b463030>] slab_alloc mm/slab.c:3319 [inline] [<000000005b463030>] kmem_cache_alloc_trace+0x145/0x2c0 mm/slab.c:3548 [<00000000939cbf94>] kmalloc include/linux/slab.h:552 [inline] [<00000000939cbf94>] kzalloc include/linux/slab.h:748 [inline] [<00000000939cbf94>] ip6_mc_add1_src net/ipv6/mcast.c:2236 [inline] [<00000000939cbf94>] ip6_mc_add_src+0x31f/0x420 net/ipv6/mcast.c:2356 [<00000000d8972221>] ip6_mc_source+0x4a8/0x600 net/ipv6/mcast.c:449 [<000000002b203d0d>] do_ipv6_setsockopt.isra.0+0x1b92/0x1dd0 net/ipv6/ipv6_sockglue.c:748 [<000000001f1e2d54>] ipv6_setsockopt+0x89/0xd0 net/ipv6/ipv6_sockglue.c:944 [<00000000c8f7bdf9>] udpv6_setsockopt+0x4e/0x90 net/ipv6/udp.c:1558 [<000000005a9a0c5e>] sock_common_setsockopt+0x38/0x50 net/core/sock.c:3139 [<00000000910b37b2>] __sys_setsockopt+0x10f/0x220 net/socket.c:2084 [<00000000e9108023>] __do_sys_setsockopt net/socket.c:2100 [inline] [<00000000e9108023>] __se_sys_setsockopt net/socket.c:2097 [inline] [<00000000e9108023>] __x64_sys_setsockopt+0x26/0x30 net/socket.c:2097 [<00000000f4818160>] do_syscall_64+0x76/0x1a0 arch/x86/entry/common.c:296 [<000000008d367e8f>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 1666d49e1d41 ("mld: do not remove mld souce list info when set link down") Fixes: 9c8bb163ae78 ("igmp, mld: Fix memory leak in igmpv3/mld_del_delrec()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-28net/sched: pfifo_fast: fix wrong dereference when qdisc is resetDavide Caratti
Now that 'TCQ_F_CPUSTATS' bit can be cleared, depending on the value of 'TCQ_F_NOLOCK' bit in the parent qdisc, we need to be sure that per-cpu counters are present when 'reset()' is called for pfifo_fast qdiscs. Otherwise, the following script: # tc q a dev lo handle 1: root htb default 100 # tc c a dev lo parent 1: classid 1:100 htb \ > rate 95Mbit ceil 100Mbit burst 64k [...] # tc f a dev lo parent 1: protocol arp basic classid 1:100 [...] # tc q a dev lo parent 1:100 handle 100: pfifo_fast [...] # tc q d dev lo root can generate the following splat: Unable to handle kernel paging request at virtual address dfff2c01bd148000 Mem abort info: ESR = 0x96000004 Exception class = DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x00000004 CM = 0, WnR = 0 [dfff2c01bd148000] address between user and kernel address ranges Internal error: Oops: 96000004 [#1] SMP [...] pstate: 80000005 (Nzcv daif -PAN -UAO) pc : pfifo_fast_reset+0x280/0x4d8 lr : pfifo_fast_reset+0x21c/0x4d8 sp : ffff800d09676fa0 x29: ffff800d09676fa0 x28: ffff200012ee22e4 x27: dfff200000000000 x26: 0000000000000000 x25: ffff800ca0799958 x24: ffff1001940f332b x23: 0000000000000007 x22: ffff200012ee1ab8 x21: 0000600de8a40000 x20: 0000000000000000 x19: ffff800ca0799900 x18: 0000000000000000 x17: 0000000000000002 x16: 0000000000000000 x15: 0000000000000000 x14: 0000000000000000 x13: 0000000000000000 x12: ffff1001b922e6e2 x11: 1ffff001b922e6e1 x10: 0000000000000000 x9 : 1ffff001b922e6e1 x8 : dfff200000000000 x7 : 0000000000000000 x6 : 0000000000000000 x5 : 1fffe400025dc45c x4 : 1fffe400025dc357 x3 : 00000c01bd148000 x2 : 0000600de8a40000 x1 : 0000000000000007 x0 : 0000600de8a40004 Call trace: pfifo_fast_reset+0x280/0x4d8 qdisc_reset+0x6c/0x370 htb_reset+0x150/0x3b8 [sch_htb] qdisc_reset+0x6c/0x370 dev_deactivate_queue.constprop.5+0xe0/0x1a8 dev_deactivate_many+0xd8/0x908 dev_deactivate+0xe4/0x190 qdisc_graft+0x88c/0xbd0 tc_get_qdisc+0x418/0x8a8 rtnetlink_rcv_msg+0x3a8/0xa78 netlink_rcv_skb+0x18c/0x328 rtnetlink_rcv+0x28/0x38 netlink_unicast+0x3c4/0x538 netlink_sendmsg+0x538/0x9a0 sock_sendmsg+0xac/0xf8 ___sys_sendmsg+0x53c/0x658 __sys_sendmsg+0xc8/0x140 __arm64_sys_sendmsg+0x74/0xa8 el0_svc_handler+0x164/0x468 el0_svc+0x10/0x14 Code: 910012a0 92400801 d343fc03 11000c21 (38fb6863) Fix this by testing the value of 'TCQ_F_CPUSTATS' bit in 'qdisc->flags', before dereferencing 'qdisc->cpu_qstats'. Changes since v1: - coding style improvements, thanks to Stefano Brivio Fixes: 8a53e616de29 ("net: sched: when clearing NOLOCK, clear TCQ_F_CPUSTATS, too") CC: Paolo Abeni <pabeni@redhat.com> Reported-by: Li Shuang <shuali@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-28libceph: don't call crypto_free_sync_skcipher() on a NULL tfmJia-Ju Bai
In set_secret(), key->tfm is assigned to NULL on line 55, and then ceph_crypto_key_destroy(key) is executed. ceph_crypto_key_destroy(key) crypto_free_sync_skcipher(key->tfm) crypto_free_skcipher(&tfm->base); This happens to work because crypto_sync_skcipher is a trivial wrapper around crypto_skcipher: &tfm->base is still 0 and crypto_free_skcipher() handles that. Let's not rely on the layout of crypto_sync_skcipher. This bug is found by a static analysis tool STCheck written by us. Fixes: 69d6302b65a8 ("libceph: Remove VLA usage of skcipher"). Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-08-27tcp: remove empty skb from write queue in error casesEric Dumazet
Vladimir Rutsky reported stuck TCP sessions after memory pressure events. Edge Trigger epoll() user would never receive an EPOLLOUT notification allowing them to retry a sendmsg(). Jason tested the case of sk_stream_alloc_skb() returning NULL, but there are other paths that could lead both sendmsg() and sendpage() to return -1 (EAGAIN), with an empty skb queued on the write queue. This patch makes sure we remove this empty skb so that Jason code can detect that the queue is empty, and call sk->sk_write_space(sk) accordingly. Fixes: ce5ec440994b ("tcp: ensure epoll edge trigger wakeup when write queue is empty") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jason Baron <jbaron@akamai.com> Reported-by: Vladimir Rutsky <rutsky@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-27net/rds: Fix info leak in rds6_inc_info_copy()Ka-Cheong Poon
The rds6_inc_info_copy() function has a couple struct members which are leaking stack information. The ->tos field should hold actual information and the ->flags field needs to be zeroed out. Fixes: 3eb450367d08 ("rds: add type of service(tos) infrastructure") Fixes: b7ff8b1036f0 ("rds: Extend RDS API for IPv6 support") Reported-by: 黄ID蝴蝶 <butterflyhuangxx@gmail.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Ka-Cheong Poon <ka-cheong.poon@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-27sctp: allow users to set ep ecn flag by sockoptXin Long
SCTP_ECN_SUPPORTED sockopt will be added to allow users to change ep ecn flag, and it's similar with other feature flags. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-27sctp: allow users to set netns ecn flag with sysctlXin Long
sysctl net.sctp.ecn_enable is added in this patch. It will allow users to change the default sctp ecn flag, net.sctp.ecn_enable. This feature was also required on this thread: http://lkml.iu.edu/hypermail/linux/kernel/0812.1/01858.html Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-27sctp: make ecn flag per netns and endpointXin Long
This patch is to add ecn flag for both netns_sctp and sctp_endpoint, net->sctp.ecn_enable is set 1 by default, and ep->ecn_enable will be initialized with net->sctp.ecn_enable. asoc->peer.ecn_capable will be set during negotiation only when ep->ecn_enable is set on both sides. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-27net: fix skb use after free in netpollFeng Sun
After commit baeababb5b85d5c4e6c917efe2a1504179438d3b ("tun: return NET_XMIT_DROP for dropped packets"), when tun_net_xmit drop packets, it will free skb and return NET_XMIT_DROP, netpoll_send_skb_on_dev will run into following use after free cases: 1. retry netpoll_start_xmit with freed skb; 2. queue freed skb in npinfo->txq. queue_process will also run into use after free case. hit netpoll_send_skb_on_dev first case with following kernel log: [ 117.864773] kernel BUG at mm/slub.c:306! [ 117.864773] invalid opcode: 0000 [#1] SMP PTI [ 117.864774] CPU: 3 PID: 2627 Comm: loop_printmsg Kdump: loaded Tainted: P OE 5.3.0-050300rc5-generic #201908182231 [ 117.864775] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 [ 117.864775] RIP: 0010:kmem_cache_free+0x28d/0x2b0 [ 117.864781] Call Trace: [ 117.864781] ? tun_net_xmit+0x21c/0x460 [ 117.864781] kfree_skbmem+0x4e/0x60 [ 117.864782] kfree_skb+0x3a/0xa0 [ 117.864782] tun_net_xmit+0x21c/0x460 [ 117.864782] netpoll_start_xmit+0x11d/0x1b0 [ 117.864788] netpoll_send_skb_on_dev+0x1b8/0x200 [ 117.864789] __br_forward+0x1b9/0x1e0 [bridge] [ 117.864789] ? skb_clone+0x53/0xd0 [ 117.864790] ? __skb_clone+0x2e/0x120 [ 117.864790] deliver_clone+0x37/0x50 [bridge] [ 117.864790] maybe_deliver+0x89/0xc0 [bridge] [ 117.864791] br_flood+0x6c/0x130 [bridge] [ 117.864791] br_dev_xmit+0x315/0x3c0 [bridge] [ 117.864792] netpoll_start_xmit+0x11d/0x1b0 [ 117.864792] netpoll_send_skb_on_dev+0x1b8/0x200 [ 117.864792] netpoll_send_udp+0x2c6/0x3e8 [ 117.864793] write_msg+0xd9/0xf0 [netconsole] [ 117.864793] console_unlock+0x386/0x4e0 [ 117.864793] vprintk_emit+0x17e/0x280 [ 117.864794] vprintk_default+0x29/0x50 [ 117.864794] vprintk_func+0x4c/0xbc [ 117.864794] printk+0x58/0x6f [ 117.864795] loop_fun+0x24/0x41 [printmsg_loop] [ 117.864795] kthread+0x104/0x140 [ 117.864795] ? 0xffffffffc05b1000 [ 117.864796] ? kthread_park+0x80/0x80 [ 117.864796] ret_from_fork+0x35/0x40 Signed-off-by: Feng Sun <loyou85@gmail.com> Signed-off-by: Xiaojun Zhao <xiaojunzhao141@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-27net: dsa: Advertise the VLAN offload netdev ability only if switch supports itVladimir Oltean
When adding a VLAN sub-interface on a DSA slave port, the 8021q core checks NETIF_F_HW_VLAN_CTAG_FILTER and, if the netdev is capable of filtering, calls .ndo_vlan_rx_add_vid or .ndo_vlan_rx_kill_vid to configure the VLAN offloading. DSA sets this up counter-intuitively: it always advertises this netdev feature, but the underlying driver may not actually support VLAN table manipulation. In that case, the DSA core is forced to ignore the error, because not being able to offload the VLAN is still fine - and should result in the creation of a non-accelerated VLAN sub-interface. Change this so that the netdev feature is only advertised for switch drivers that support VLAN manipulation, instead of checking for -EOPNOTSUPP at runtime. Suggested-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-27net: dsa: tag_8021q: Future-proof the reserved fields in the custom VIDVladimir Oltean
After witnessing the discussion in https://lkml.org/lkml/2019/8/14/151 w.r.t. ioctl extensibility, it became clear that such an issue might prevent that the 3 RSV bits inside the DSA 802.1Q tag might also suffer the same fate and be useless for further extension. So clearly specify that the reserved bits should currently be transmitted as zero and ignored on receive. The DSA tagger already does this (and has always did), and is the only known user so far (no Wireshark dissection plugin, etc). So there should be no incompatibility to speak of. Fixes: 0471dd429cea ("net: dsa: tag_8021q: Create a stable binary format") Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-27net: dsa: clear VLAN PVID flag for CPU portVivien Didelot
When the bridge offloads a VLAN on a slave port, we also need to program its dedicated CPU port as a member of the VLAN. Drivers may handle the CPU port's membership as they want. For example, Marvell as a special "Unmodified" mode to pass frames as is through such ports. Even though DSA expects the drivers to handle the CPU port membership, it does not make sense to program user VLANs as PVID on the CPU port. This patch clears this flag before programming the CPU port. Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Suggested-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-27net: dsa: program VLAN on CPU port from slaveVivien Didelot
DSA currently programs a VLAN on the CPU port implicitly after the related notifier is received by a switch. While we still need to do this transparent programmation of the DSA links in the fabric, programming the CPU port this way may cause problems in some corners such as the tag_8021q driver. Because the dedicated CPU port is specific to a slave, make their programmation explicit a few layers up, in the slave code. Note that technically, DSA links have a dedicated CPU port as well, but since they are only used as conduit between interconnected switches of a fabric, programming them transparently this way is what we want. Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-27net: dsa: check bridge VLAN in slave operationsVivien Didelot
The bridge VLANs are not offloaded by dsa_port_vlan_* if the port is not bridged or if its bridge is not VLAN aware. This is a good thing but other corners of DSA, such as the tag_8021q driver, may need to program VLANs regardless the bridge state. And also because bridge_dev is specific to user ports anyway, move these checks were it belongs, one layer up in the slave code. Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Suggested-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-27net: dsa: add slave VLAN helpersVivien Didelot
Add dsa_slave_vlan_add and dsa_slave_vlan_del helpers to handle SWITCHDEV_OBJ_ID_PORT_VLAN switchdev objects. Also copy the switchdev_obj_port_vlan structure on add since we will modify it in future patches. Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-27net: dsa: do not skip -EOPNOTSUPP in dsa_port_vid_addVivien Didelot
Currently dsa_port_vid_add returns 0 if the switch returns -EOPNOTSUPP. This function is used in the tag_8021q.c code to offload the PVID of ports, which would simply not work if .port_vlan_add is not supported by the underlying switch. Do not skip -EOPNOTSUPP in dsa_port_vid_add but only when necessary, that is to say in dsa_slave_vlan_rx_add_vid. Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-27net: dsa: remove bitmap operationsVivien Didelot
The bitmap operations were introduced to simplify the switch drivers in the future, since most of them could implement the common VLAN and MDB operations (add, del, dump) with simple functions taking all target ports at once, and thus limiting the number of hardware accesses. Programming an MDB or VLAN this way in a single operation would clearly simplify the drivers a lot but would require a new get-set interface in DSA. The usage of such bitmap from the stack also raised concerned in the past, leading to the dynamic allocation of a new ds->_bitmap member in the dsa_switch structure. So let's get rid of them for now. This commit nicely wraps the ds->ops->port_{mdb,vlan}_{prepare,add} switch operations into new dsa_switch_{mdb,vlan}_{prepare,add} variants not using any bitmap argument anymore. New dsa_switch_{mdb,vlan}_match helpers have been introduced to make clear which local port of a switch must be programmed with the target object. While the targeted user port is an obvious candidate, the DSA links must also be programmed, as well as the CPU port for VLANs. While at it, also remove local variables that are only used once. Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-27net_sched: fix a NULL pointer deref in ipt actionCong Wang
The net pointer in struct xt_tgdtor_param is not explicitly initialized therefore is still NULL when dereferencing it. So we have to find a way to pass the correct net pointer to ipt_destroy_target(). The best way I find is just saving the net pointer inside the per netns struct tcf_idrinfo, which could make this patch smaller. Fixes: 0c66dc1ea3f0 ("netfilter: conntrack: register hooks in netns when needed by ruleset") Reported-and-tested-by: itugrok@yahoo.com Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
Minor conflict in r8169, bug fix had two versions in net and net-next, take the net-next hunks. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-27Merge tag 'nfs-for-5.3-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client bugfixes from Trond Myklebust: "Highlights include: Stable fixes: - Fix a page lock leak in nfs_pageio_resend() - Ensure O_DIRECT reports an error if the bytes read/written is 0 - Don't handle errors if the bind/connect succeeded - Revert "NFSv4/flexfiles: Abort I/O early if the layout segment was invalidat ed" Bugfixes: - Don't refresh attributes with mounted-on-file information - Fix return values for nfs4_file_open() and nfs_finish_open() - Fix pnfs layoutstats reporting of I/O errors - Don't use soft RPC calls for pNFS/flexfiles I/O, and don't abort for soft I/O errors when the user specifies a hard mount. - Various fixes to the error handling in sunrpc - Don't report writepage()/writepages() errors twice" * tag 'nfs-for-5.3-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFS: remove set but not used variable 'mapping' NFSv2: Fix write regression NFSv2: Fix eof handling NFS: Fix writepage(s) error handling to not report errors twice NFS: Fix spurious EIO read errors pNFS/flexfiles: Don't time out requests on hard mounts SUNRPC: Handle connection breakages correctly in call_status() Revert "NFSv4/flexfiles: Abort I/O early if the layout segment was invalidated" SUNRPC: Handle EADDRINUSE and ENOBUFS correctly pNFS/flexfiles: Turn off soft RPC calls SUNRPC: Don't handle errors if the bind/connect succeeded NFS: On fatal writeback errors, we need to call nfs_inode_remove_request() NFS: Fix initialisation of I/O result struct in nfs_pgio_rpcsetup NFS: Ensure O_DIRECT reports an error if the bytes read/written is 0 NFSv4/pnfs: Fix a page lock leak in nfs_pageio_resend() NFSv4: Fix return value in nfs_finish_open() NFSv4: Fix return values for nfs4_file_open() NFS: Don't refresh attributes with mounted-on-file information
2019-08-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds
Pull networking fixes from David Miller: 1) Use 32-bit index for tails calls in s390 bpf JIT, from Ilya Leoshkevich. 2) Fix missed EPOLLOUT events in TCP, from Eric Dumazet. Same fix for SMC from Jason Baron. 3) ipv6_mc_may_pull() should return 0 for malformed packets, not -EINVAL. From Stefano Brivio. 4) Don't forget to unpin umem xdp pages in error path of xdp_umem_reg(). From Ivan Khoronzhuk. 5) Fix sta object leak in mac80211, from Johannes Berg. 6) Fix regression by not configuring PHYLINK on CPU port of bcm_sf2 switches. From Florian Fainelli. 7) Revert DMA sync removal from r8169 which was causing regressions on some MIPS Loongson platforms. From Heiner Kallweit. 8) Use after free in flow dissector, from Jakub Sitnicki. 9) Fix NULL derefs of net devices during ICMP processing across collect_md tunnels, from Hangbin Liu. 10) proto_register() memory leaks, from Zhang Lin. 11) Set NLM_F_MULTI flag in multipart netlink messages consistently, from John Fastabend. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (66 commits) r8152: Set memory to all 0xFFs on failed reg reads openvswitch: Fix conntrack cache with timeout ipv4: mpls: fix mpls_xmit for iptunnel nexthop: Fix nexthop_num_path for blackhole nexthops net: rds: add service level support in rds-info net: route dump netlink NLM_F_MULTI flag missing s390/qeth: reject oversized SNMP requests sock: fix potential memory leak in proto_register() MAINTAINERS: Add phylink keyword to SFF/SFP/SFP+ MODULE SUPPORT xfrm/xfrm_policy: fix dst dev null pointer dereference in collect_md mode ipv4/icmp: fix rt dst dev null pointer dereference openvswitch: Fix log message in ovs conntrack bpf: allow narrow loads of some sk_reuseport_md fields with offset > 0 bpf: fix use after free in prog symbol exposure bpf: fix precision tracking in presence of bpf2bpf calls flow_dissector: Fix potential use-after-free on BPF_PROG_DETACH Revert "r8169: remove not needed call to dma_sync_single_for_device" ipv6: propagate ipv6_add_dev's error returns out of ipv6_find_idev net/ncsi: Fix the payload copying for the request coming from Netlink qed: Add cleanup in qed_slowpath_start() ...
2019-08-27netfilter: not mark a spinlock as __read_mostlyLi RongQing
when spinlock is locked/unlocked, its elements will be changed, so marking it as __read_mostly is not suitable. and remove a duplicate definition of nf_conntrack_locks_all_lock strange that compiler does not complain. Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-08-27netfilter: conntrack: make sysctls per-namespace againFlorian Westphal
When I merged the extension sysctl tables with the main one I forgot to reset them on netns creation. They currently read/write init_net settings. Fixes: d912dec12428 ("netfilter: conntrack: merge acct and helper sysctl table with main one") Fixes: cb2833ed0044 ("netfilter: conntrack: merge ecache and timestamp sysctl tables with main one") Reported-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-08-27netfilter: nft_dynset: support for element deletionAnder Juaristi
This patch implements the delete operation from the ruleset. It implements a new delete() function in nft_set_rhash. It is simpler to use than the already existing remove(), because it only takes the set and the key as arguments, whereas remove() expects a full nft_set_elem structure. Signed-off-by: Ander Juaristi <a@juaristi.eus> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-08-27netfilter: nf_conntrack_ftp: Fix debug outputThomas Jarosch
The find_pattern() debug output was printing the 'skip' character. This can be a NULL-byte and messes up further pr_debug() output. Output without the fix: kernel: nf_conntrack_ftp: Pattern matches! kernel: nf_conntrack_ftp: Skipped up to `<7>nf_conntrack_ftp: find_pattern `PORT': dlen = 8 kernel: nf_conntrack_ftp: find_pattern `EPRT': dlen = 8 Output with the fix: kernel: nf_conntrack_ftp: Pattern matches! kernel: nf_conntrack_ftp: Skipped up to 0x0 delimiter! kernel: nf_conntrack_ftp: Match succeeded! kernel: nf_conntrack_ftp: conntrack_ftp: match `172,17,0,100,200,207' (20 bytes at 4150681645) kernel: nf_conntrack_ftp: find_pattern `PORT': dlen = 8 Signed-off-by: Thomas Jarosch <thomas.jarosch@intra2net.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-08-27netfilter: xt_physdev: Fix spurious error message in physdev_mt_checkTodd Seidelmann
Simplify the check in physdev_mt_check() to emit an error message only when passed an invalid chain (ie, NF_INET_LOCAL_OUT). This avoids cluttering up the log with errors against valid rules. For large/heavily modified rulesets, current behavior can quickly overwhelm the ring buffer, because this function gets called on every change, regardless of the rule that was changed. Signed-off-by: Todd Seidelmann <tseidelmann@linode.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-08-27rxrpc: Use skb_unshare() rather than skb_cow_data()David Howells
The in-place decryption routines in AF_RXRPC's rxkad security module currently call skb_cow_data() to make sure the data isn't shared and that the skb can be written over. This has a problem, however, as the softirq handler may be still holding a ref or the Rx ring may be holding multiple refs when skb_cow_data() is called in rxkad_verify_packet() - and so skb_shared() returns true and __pskb_pull_tail() dislikes that. If this occurs, something like the following report will be generated. kernel BUG at net/core/skbuff.c:1463! ... RIP: 0010:pskb_expand_head+0x253/0x2b0 ... Call Trace: __pskb_pull_tail+0x49/0x460 skb_cow_data+0x6f/0x300 rxkad_verify_packet+0x18b/0xb10 [rxrpc] rxrpc_recvmsg_data.isra.11+0x4a8/0xa10 [rxrpc] rxrpc_kernel_recv_data+0x126/0x240 [rxrpc] afs_extract_data+0x51/0x2d0 [kafs] afs_deliver_fs_fetch_data+0x188/0x400 [kafs] afs_deliver_to_call+0xac/0x430 [kafs] afs_wait_for_call_to_complete+0x22f/0x3d0 [kafs] afs_make_call+0x282/0x3f0 [kafs] afs_fs_fetch_data+0x164/0x300 [kafs] afs_fetch_data+0x54/0x130 [kafs] afs_readpages+0x20d/0x340 [kafs] read_pages+0x66/0x180 __do_page_cache_readahead+0x188/0x1a0 ondemand_readahead+0x17d/0x2e0 generic_file_read_iter+0x740/0xc10 __vfs_read+0x145/0x1a0 vfs_read+0x8c/0x140 ksys_read+0x4a/0xb0 do_syscall_64+0x43/0xf0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fix this by using skb_unshare() instead in the input path for DATA packets that have a security index != 0. Non-DATA packets don't need in-place encryption and neither do unencrypted DATA packets. Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code") Reported-by: Julian Wollrath <jwollrath@web.de> Signed-off-by: David Howells <dhowells@redhat.com>
2019-08-27rxrpc: Use the tx-phase skb flag to simplify tracingDavid Howells
Use the previously-added transmit-phase skbuff private flag to simplify the socket buffer tracing a bit. Which phase the skbuff comes from can now be divined from the skb rather than having to be guessed from the call state. We can also reduce the number of rxrpc_skb_trace values by eliminating the difference between Tx and Rx in the symbols. Signed-off-by: David Howells <dhowells@redhat.com>
2019-08-27rxrpc: Add a private skb flag to indicate transmission-phase skbsDavid Howells
Add a flag in the private data on an skbuff to indicate that this is a transmission-phase buffer rather than a receive-phase buffer. Signed-off-by: David Howells <dhowells@redhat.com>
2019-08-27rxrpc: Abstract out rxtx ring cleanupDavid Howells
Abstract out rxtx ring cleanup into its own function from its two callers. This makes it easier to apply the same changes to both. Signed-off-by: David Howells <dhowells@redhat.com>
2019-08-27rxrpc: Pass the input handler's data skb reference to the Rx ringDavid Howells
Pass the reference held on a DATA skb in the rxrpc input handler into the Rx ring rather than getting an additional ref for this and then dropping the original ref at the end. Signed-off-by: David Howells <dhowells@redhat.com>
2019-08-27rxrpc: Use info in skbuff instead of reparsing a jumbo packetDavid Howells
Use the information now cached in the skbuff private data to avoid the need to reparse a jumbo packet. We can find all the subpackets by dead reckoning, so it's only necessary to note how many there are, whether the last one is flagged as LAST_PACKET and whether any have the REQUEST_ACK flag set. This is necessary as once recvmsg() can see the packet, it can start modifying it, such as doing in-place decryption. Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code") Signed-off-by: David Howells <dhowells@redhat.com>
2019-08-27rxrpc: Improve jumbo packet countingDavid Howells
Improve the information stored about jumbo packets so that we don't need to reparse them so much later. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
2019-08-26net: sched: flower: don't take rtnl lock for cls hw offloads APIVlad Buslov
Don't manually take rtnl lock in flower classifier before calling cls hardware offloads API. Instead, pass rtnl lock status via 'rtnl_held' parameter. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-26net: sched: copy tunnel info when setting flow_action entry->tunnelVlad Buslov
In order to remove dependency on rtnl lock, modify tc_setup_flow_action() to copy tunnel info, instead of just saving pointer to tunnel_key action tunnel info. This is necessary to prevent concurrent action overwrite from releasing tunnel info while it is being used by rtnl-unlocked driver. Implement helper tcf_tunnel_info_copy() that is used to copy tunnel info with all its options to dynamically allocated memory block. Modify tc_cleanup_flow_action() to free dynamically allocated tunnel info. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-26net: sched: take reference to action dev before calling offloadsVlad Buslov
In order to remove dependency on rtnl lock when calling hardware offload API, take reference to action mirred dev when initializing flow_action structure in tc_setup_flow_action(). Implement function tc_cleanup_flow_action(), use it to release the device after hardware offload API is done using it. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-26net: sched: take rtnl lock in tc_setup_flow_action()Vlad Buslov
In order to allow using new flow_action infrastructure from unlocked classifiers, modify tc_setup_flow_action() to accept new 'rtnl_held' argument. Take rtnl lock before accessing tc_action data. This is necessary to protect from concurrent action replace. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-26net: sched: conditionally obtain rtnl lock in cls hw offloads APIVlad Buslov
In order to remove dependency on rtnl lock from offloads code of classifiers, take rtnl lock conditionally before executing driver callbacks. Only obtain rtnl lock if block is bound to devices that require it. Block bind/unbind code is rtnl-locked and obtains block->cb_lock while holding rtnl lock. Obtain locks in same order in tc_setup_cb_*() functions to prevent deadlock. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>