summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2018-12-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Off by one in netlink parsing of mac802154_hwsim, from Alexander Aring. 2) nf_tables RCU usage fix from Taehee Yoo. 3) Flow dissector needs nhoff and thoff clamping, from Stanislav Fomichev. 4) Missing sin6_flowinfo initialization in SCTP, from Xin Long. 5) Spectrev1 in ipmr and ip6mr, from Gustavo A. R. Silva. 6) Fix r8169 crash when DEBUG_SHIRQ is enabled, from Heiner Kallweit. 7) Fix SKB leak in rtlwifi, from Larry Finger. 8) Fix state pruning in bpf verifier, from Jakub Kicinski. 9) Don't handle completely duplicate fragments as overlapping, from Michal Kubecek. 10) Fix memory corruption with macb and 64-bit DMA, from Anssi Hannula. 11) Fix TCP fallback socket release in smc, from Myungho Jung. 12) gro_cells_destroy needs to napi_disable, from Lorenzo Bianconi. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (130 commits) rds: Fix warning. neighbor: NTF_PROXY is a valid ndm_flag for a dump request net: mvpp2: fix the phylink mode validation net/sched: cls_flower: Remove old entries from rhashtable net/tls: allocate tls context using GFP_ATOMIC iptunnel: make TUNNEL_FLAGS available in uapi gro_cell: add napi_disable in gro_cells_destroy lan743x: Remove MAC Reset from initialization net/mlx5e: Remove the false indication of software timestamping support net/mlx5: Typo fix in del_sw_hw_rule net/mlx5e: RX, Fix wrong early return in receive queue poll ipv6: explicitly initialize udp6_addr in udp_sock_create6() bnxt_en: Fix ethtool self-test loopback. net/rds: remove user triggered WARN_ON in rds_sendmsg net/rds: fix warn in rds_message_alloc_sgs ath10k: skip sending quiet mode cmd for WCN3990 mac80211: free skb fraglist before freeing the skb nl80211: fix memory leak if validate_pae_over_nl80211() fails net/smc: fix TCP fallback socket release vxge: ensure data0 is initialized in when fetching firmware version information ...
2018-12-19rds: Fix warning.David S. Miller
>> net/rds/send.c:1109:42: warning: Using plain integer as NULL pointer Fixes: ea010070d0a7 ("net/rds: fix warn in rds_message_alloc_sgs") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-19neighbor: NTF_PROXY is a valid ndm_flag for a dump requestDavid Ahern
When dumping proxy entries the dump request has NTF_PROXY set in ndm_flags. strict mode checking needs to be updated to allow this flag. Fixes: 51183d233b5a ("net/neighbor: Update neigh_dump_info for strict data checking") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-19neighbor: Initialize protocol when new pneigh_entry are createdDavid Ahern
pneigh_lookup uses kmalloc versus kzalloc when new entries are allocated. Given that the newly added protocol field needs to be initialized. Fixes: df9b0e30d44c ("neighbor: Add protocol attribute") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-19net/sched: cls_flower: Remove old entries from rhashtableRoi Dayan
When replacing a rule we add the new rule to the rhashtable but only remove the old if not in skip_sw. This commit fix this and remove the old rule anyway. Fixes: 35cc3cefc4de ("net/sched: cls_flower: Reject duplicated rules also under skip_sw") Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-19net/tls: allocate tls context using GFP_ATOMICGanesh Goudar
create_ctx can be called from atomic context, hence use GFP_ATOMIC instead of GFP_KERNEL. [ 395.962599] BUG: sleeping function called from invalid context at mm/slab.h:421 [ 395.979896] in_atomic(): 1, irqs_disabled(): 0, pid: 16254, name: openssl [ 395.996564] 2 locks held by openssl/16254: [ 396.010492] #0: 00000000347acb52 (sk_lock-AF_INET){+.+.}, at: do_tcp_setsockopt.isra.44+0x13b/0x9a0 [ 396.029838] #1: 000000006c9552b5 (device_spinlock){+...}, at: tls_init+0x1d/0x280 [ 396.047675] CPU: 5 PID: 16254 Comm: openssl Tainted: G O 4.20.0-rc6+ #25 [ 396.066019] Hardware name: Supermicro X10SRA-F/X10SRA-F, BIOS 2.0c 09/25/2017 [ 396.083537] Call Trace: [ 396.096265] dump_stack+0x5e/0x8b [ 396.109876] ___might_sleep+0x216/0x250 [ 396.123940] kmem_cache_alloc_trace+0x1b0/0x240 [ 396.138800] create_ctx+0x1f/0x60 [ 396.152504] tls_init+0xbd/0x280 [ 396.166135] tcp_set_ulp+0x191/0x2d0 [ 396.180035] ? tcp_set_ulp+0x2c/0x2d0 [ 396.193960] do_tcp_setsockopt.isra.44+0x148/0x9a0 [ 396.209013] __sys_setsockopt+0x7c/0xe0 [ 396.223054] __x64_sys_setsockopt+0x20/0x30 [ 396.237378] do_syscall_64+0x4a/0x180 [ 396.251200] entry_SYSCALL_64_after_hwframe+0x49/0xbe Fixes: df9d4a178022 ("net/tls: sleeping function from invalid context") Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-19gro_cell: add napi_disable in gro_cells_destroyLorenzo Bianconi
Add napi_disable routine in gro_cells_destroy since starting from commit c42858eaf492 ("gro_cells: remove spinlock protecting receive queues") gro_cell_poll and gro_cells_destroy can run concurrently on napi_skbs list producing a kernel Oops if the tunnel interface is removed while gro_cell_poll is running. The following Oops has been triggered removing a vxlan device while the interface is receiving traffic [ 5628.948853] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 [ 5628.949981] PGD 0 P4D 0 [ 5628.950308] Oops: 0002 [#1] SMP PTI [ 5628.950748] CPU: 0 PID: 9 Comm: ksoftirqd/0 Not tainted 4.20.0-rc6+ #41 [ 5628.952940] RIP: 0010:gro_cell_poll+0x49/0x80 [ 5628.955615] RSP: 0018:ffffc9000004fdd8 EFLAGS: 00010202 [ 5628.956250] RAX: 0000000000000000 RBX: ffffe8ffffc08150 RCX: 0000000000000000 [ 5628.957102] RDX: 0000000000000000 RSI: ffff88802356bf00 RDI: ffffe8ffffc08150 [ 5628.957940] RBP: 0000000000000026 R08: 0000000000000000 R09: 0000000000000000 [ 5628.958803] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000040 [ 5628.959661] R13: ffffe8ffffc08100 R14: 0000000000000000 R15: 0000000000000040 [ 5628.960682] FS: 0000000000000000(0000) GS:ffff88803ea00000(0000) knlGS:0000000000000000 [ 5628.961616] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 5628.962359] CR2: 0000000000000008 CR3: 000000000221c000 CR4: 00000000000006b0 [ 5628.963188] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 5628.964034] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 5628.964871] Call Trace: [ 5628.965179] net_rx_action+0xf0/0x380 [ 5628.965637] __do_softirq+0xc7/0x431 [ 5628.966510] run_ksoftirqd+0x24/0x30 [ 5628.966957] smpboot_thread_fn+0xc5/0x160 [ 5628.967436] kthread+0x113/0x130 [ 5628.968283] ret_from_fork+0x3a/0x50 [ 5628.968721] Modules linked in: [ 5628.969099] CR2: 0000000000000008 [ 5628.969510] ---[ end trace 9d9dedc7181661fe ]--- [ 5628.970073] RIP: 0010:gro_cell_poll+0x49/0x80 [ 5628.972965] RSP: 0018:ffffc9000004fdd8 EFLAGS: 00010202 [ 5628.973611] RAX: 0000000000000000 RBX: ffffe8ffffc08150 RCX: 0000000000000000 [ 5628.974504] RDX: 0000000000000000 RSI: ffff88802356bf00 RDI: ffffe8ffffc08150 [ 5628.975462] RBP: 0000000000000026 R08: 0000000000000000 R09: 0000000000000000 [ 5628.976413] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000040 [ 5628.977375] R13: ffffe8ffffc08100 R14: 0000000000000000 R15: 0000000000000040 [ 5628.978296] FS: 0000000000000000(0000) GS:ffff88803ea00000(0000) knlGS:0000000000000000 [ 5628.979327] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 5628.980044] CR2: 0000000000000008 CR3: 000000000221c000 CR4: 00000000000006b0 [ 5628.980929] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 5628.981736] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 5628.982409] Kernel panic - not syncing: Fatal exception in interrupt [ 5628.983307] Kernel Offset: disabled Fixes: c42858eaf492 ("gro_cells: remove spinlock protecting receive queues") Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-19neighbour: register rtnl doit handlerRoopa Prabhu
this patch registers neigh doit handler. The doit handler returns a neigh entry given dst and dev. This is similar to route and fdb doit (get) handlers. Also moves nda_policy declaration from rtnetlink.c to neighbour.c Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Reviewed-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-19xsk: simplify AF_XDP socket teardownBjörn Töpel
Prior this commit, when the struct socket object was being released, the UMEM did not have its reference count decreased. Instead, this was done in the struct sock sk_destruct function. There is no reason to keep the UMEM reference around when the socket is being orphaned, so in this patch the xdp_put_mem is called in the xsk_release function. This results in that the xsk_destruct function can be removed! Note that, it still holds that a struct xsk_sock reference might still linger in the XSKMAP after the UMEM is released, e.g. if a user does not clear the XSKMAP prior to closing the process. This sock will be in a "released" zombie like state, until the XSKMAP is removed. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-12-19ipv6: explicitly initialize udp6_addr in udp_sock_create6()Cong Wang
syzbot reported the use of uninitialized udp6_addr::sin6_scope_id. We can just set ::sin6_scope_id to zero, as tunnels are unlikely to use an IPv6 address that needs a scope id and there is no interface to bind in this context. For net-next, it looks different as we have cfg->bind_ifindex there so we can probably call ipv6_iface_scope_id(). Same for ::sin6_flowinfo, tunnels don't use it. Fixes: 8024e02879dd ("udp: Add udp_sock_create for UDP tunnels to open listener socket") Reported-by: syzbot+c56449ed3652e6720f30@syzkaller.appspotmail.com Cc: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-19tipc: fix uninitialized value for broadcast retransmissionHoang Le
When sending broadcast message on high load system, there are a lot of unnecessary packets restranmission. That issue was caused by missing in initial criteria for retransmission. To prevent this happen, just initialize this criteria for retransmission in next 10 milliseconds. Fixes: 31c4f4cc32f7 ("tipc: improve broadcast retransmission algorithm") Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-19tipc: add trace_events for tipc bearerTuong Lien
The commit adds the new trace_event for TIPC bearer, L2 device event: trace_tipc_l2_device_event() Also, it puts the trace at the tipc_l2_device_event() function, then the device/bearer events and related info can be traced out during runtime when needed. Acked-by: Ying Xue <ying.xue@windriver.com> Tested-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-19tipc: add trace_events for tipc nodeTuong Lien
The commit adds the new trace_events for TIPC node object: trace_tipc_node_create() trace_tipc_node_delete() trace_tipc_node_lost_contact() trace_tipc_node_timeout() trace_tipc_node_link_up() trace_tipc_node_link_down() trace_tipc_node_reset_links() trace_tipc_node_fsm_evt() trace_tipc_node_check_state() Also, enables the traces for the following cases: - When a node is created/deleted; - When a node contact is lost; - When a node timer is timed out; - When a node link is up/down; - When all node links are reset; - When node state is changed; - When a skb comes and node state needs to be checked/updated. Acked-by: Ying Xue <ying.xue@windriver.com> Tested-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-19tipc: add trace_events for tipc socketTuong Lien
The commit adds the new trace_events for TIPC socket object: trace_tipc_sk_create() trace_tipc_sk_poll() trace_tipc_sk_sendmsg() trace_tipc_sk_sendmcast() trace_tipc_sk_sendstream() trace_tipc_sk_filter_rcv() trace_tipc_sk_advance_rx() trace_tipc_sk_rej_msg() trace_tipc_sk_drop_msg() trace_tipc_sk_release() trace_tipc_sk_shutdown() trace_tipc_sk_overlimit1() trace_tipc_sk_overlimit2() Also, enables the traces for the following cases: - When user creates a TIPC socket; - When user calls poll() on TIPC socket; - When user sends a dgram/mcast/stream message. - When a message is put into the socket 'sk_receive_queue'; - When a message is released from the socket 'sk_receive_queue'; - When a message is rejected (e.g. due to no port, invalid, etc.); - When a message is dropped (e.g. due to wrong message type); - When socket is released; - When socket is shutdown; - When socket rcvq's allocation is overlimit (> 90%); - When socket rcvq + bklq's allocation is overlimit (> 90%); - When the 'TIPC_ERR_OVERLOAD/2' issue happens; Note: a) All the socket traces are designed to be able to trace on a specific socket by either using the 'event filtering' feature on a known socket 'portid' value or the sysctl file: /proc/sys/net/tipc/sk_filter The file determines a 'tuple' for what socket should be traced: (portid, sock type, name type, name lower, name upper) where: + 'portid' is the socket portid generated at socket creating, can be found in the trace outputs or the 'tipc socket list' command printouts; + 'sock type' is the socket type (1 = SOCK_TREAM, ...); + 'name type', 'name lower' and 'name upper' are the service name being connected to or published by the socket. Value '0' means 'ANY', the default tuple value is (0, 0, 0, 0, 0) i.e. the traces happen for every sockets with no filter. b) The 'tipc_sk_overlimit1/2' event is also a conditional trace_event which happens when the socket receive queue (and backlog queue) is about to be overloaded, when the queue allocation is > 90%. Then, when the trace is enabled, the last skbs leading to the TIPC_ERR_OVERLOAD/2 issue can be traced. The trace event is designed as an 'upper watermark' notification that the other traces (e.g. 'tipc_sk_advance_rx' vs 'tipc_sk_filter_rcv') or actions can be triggerred in the meanwhile to see what is going on with the socket queue. In addition, the 'trace_tipc_sk_dump()' is also placed at the 'TIPC_ERR_OVERLOAD/2' case, so the socket and last skb can be dumped for post-analysis. Acked-by: Ying Xue <ying.xue@windriver.com> Tested-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-19tipc: add trace_events for tipc linkTuong Lien
The commit adds the new trace_events for TIPC link object: trace_tipc_link_timeout() trace_tipc_link_fsm() trace_tipc_link_reset() trace_tipc_link_too_silent() trace_tipc_link_retrans() trace_tipc_link_bc_ack() trace_tipc_link_conges() And the traces for PROTOCOL messages at building and receiving: trace_tipc_proto_build() trace_tipc_proto_rcv() Note: a) The 'tipc_link_too_silent' event will only happen when the 'silent_intv_cnt' is about to reach the 'abort_limit' value (and the event is enabled). The benefit for this kind of event is that we can get an early indication about TIPC link loss issue due to timeout, then can do some necessary actions for troubleshooting. For example: To trigger the 'tipc_proto_rcv' when the 'too_silent' event occurs: echo 'enable_event:tipc:tipc_proto_rcv' > \ events/tipc/tipc_link_too_silent/trigger And disable it when TIPC link is reset: echo 'disable_event:tipc:tipc_proto_rcv' > \ events/tipc/tipc_link_reset/trigger b) The 'tipc_link_retrans' or 'tipc_link_bc_ack' event is useful to trace TIPC retransmission issues. In addition, the commit adds the 'trace_tipc_list/link_dump()' at the 'retransmission failure' case. Then, if the issue occurs, the link 'transmq' along with the link data can be dumped for post-analysis. These dump events should be enabled by default since it will only take effect when the failure happens. The same approach is also applied for the faulty case that the validation of protocol message is failed. Acked-by: Ying Xue <ying.xue@windriver.com> Tested-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-19tipc: enable tracepoints in tipcTuong Lien
As for the sake of debugging/tracing, the commit enables tracepoints in TIPC along with some general trace_events as shown below. It also defines some 'tipc_*_dump()' functions that allow to dump TIPC object data whenever needed, that is, for general debug purposes, ie. not just for the trace_events. The following trace_events are now available: - trace_tipc_skb_dump(): allows to trace and dump TIPC msg & skb data, e.g. message type, user, droppable, skb truesize, cloned skb, etc. - trace_tipc_list_dump(): allows to trace and dump any TIPC buffers or queues, e.g. TIPC link transmq, socket receive queue, etc. - trace_tipc_sk_dump(): allows to trace and dump TIPC socket data, e.g. sk state, sk type, connection type, rmem_alloc, socket queues, etc. - trace_tipc_link_dump(): allows to trace and dump TIPC link data, e.g. link state, silent_intv_cnt, gap, bc_gap, link queues, etc. - trace_tipc_node_dump(): allows to trace and dump TIPC node data, e.g. node state, active links, capabilities, link entries, etc. How to use: Put the trace functions at any places where we want to dump TIPC data or events. Note: a) The dump functions will generate raw data only, that is, to offload the trace event's processing, it can require a tool or script to parse the data but this should be simple. b) The trace_tipc_*_dump() should be reserved for a failure cases only (e.g. the retransmission failure case) or where we do not expect to happen too often, then we can consider enabling these events by default since they will almost not take any effects under normal conditions, but once the rare condition or failure occurs, we get the dumped data fully for post-analysis. For other trace purposes, we can reuse these trace classes as template but different events. c) A trace_event is only effective when we enable it. To enable the TIPC trace_events, echo 1 to 'enable' files in the events/tipc/ directory in the 'debugfs' file system. Normally, they are located at: /sys/kernel/debug/tracing/events/tipc/ For example: To enable the tipc_link_dump event: echo 1 > /sys/kernel/debug/tracing/events/tipc/tipc_link_dump/enable To enable all the TIPC trace_events: echo 1 > /sys/kernel/debug/tracing/events/tipc/enable To collect the trace data: cat trace or cat trace_pipe > /trace.out & To disable all the TIPC trace_events: echo 0 > /sys/kernel/debug/tracing/events/tipc/enable To clear the trace buffer: echo > trace d) Like the other trace_events, the feature like 'filter' or 'trigger' is also usable for the tipc trace_events. For more details, have a look at: Documentation/trace/ftrace.txt MAINTAINERS | add two new files 'trace.h' & 'trace.c' in tipc Acked-by: Ying Xue <ying.xue@windriver.com> Tested-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-19net: switch secpath to use skb extension infrastructureFlorian Westphal
Remove skb->sp and allocate secpath storage via extension infrastructure. This also reduces sk_buff by 8 bytes on x86_64. Total size of allyesconfig kernel is reduced slightly, as there is less inlined code (one conditional atomic op instead of two on skb_clone). No differences in throughput in following ipsec performance tests: - transport mode with aes on 10GB link - tunnel mode between two network namespaces with aes and null cipher Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-19xfrm: prefer secpath_set over secpath_dupFlorian Westphal
secpath_set is a wrapper for secpath_dup that will not perform an allocation if the secpath attached to the skb has a reference count of one, i.e., it doesn't need to be COW'ed. Also, secpath_dup doesn't attach the secpath to the skb, it leaves this to the caller. Use secpath_set in places that immediately assign the return value to skb. This allows to remove skb->sp without touching these spots again. secpath_dup can eventually be removed in followup patch. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-19xfrm: use secpath_exist where applicableFlorian Westphal
Will reduce noise when skb->sp is removed later in this series. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-19net: use skb_sec_path helper in more placesFlorian Westphal
skb_sec_path gains 'const' qualifier to avoid xt_policy.c: 'skb_sec_path' discards 'const' qualifier from pointer target type same reasoning as previous conversions: Won't need to touch these spots anymore when skb->sp is removed. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-19net: move secpath_exist helper to sk_buff.hFlorian Westphal
Future patch will remove skb->sp pointer. To reduce noise in those patches, move existing helper to sk_buff and use it in more places to ease skb->sp replacement later. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-19xfrm: change secpath_set to return secpath struct, not error valueFlorian Westphal
It can only return 0 (success) or -ENOMEM. Change return value to a pointer to secpath struct. This avoids direct access to skb->sp: err = secpath_set(skb); if (!err) .. skb->sp-> ... Becomes: sp = secpath_set(skb) if (!sp) .. sp-> .. This reduces noise in followup patch which is going to remove skb->sp. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-19net: convert bridge_nf to use skb extension infrastructureFlorian Westphal
This converts the bridge netfilter (calling iptables hooks from bridge) facility to use the extension infrastructure. The bridge_nf specific hooks in skb clone and free paths are removed, they have been replaced by the skb_ext hooks that do the same as the bridge nf allocations hooks did. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-19sk_buff: add skb extension infrastructureFlorian Westphal
This adds an optional extension infrastructure, with ispec (xfrm) and bridge netfilter as first users. objdiff shows no changes if kernel is built without xfrm and br_netfilter support. The third (planned future) user is Multipath TCP which is still out-of-tree. MPTCP needs to map logical mptcp sequence numbers to the tcp sequence numbers used by individual subflows. This DSS mapping is read/written from tcp option space on receive and written to tcp option space on transmitted tcp packets that are part of and MPTCP connection. Extending skb_shared_info or adding a private data field to skb fclones doesn't work for incoming skb, so a different DSS propagation method would be required for the receive side. mptcp has same requirements as secpath/bridge netfilter: 1. extension memory is released when the sk_buff is free'd. 2. data is shared after cloning an skb (clone inherits extension) 3. adding extension to an skb will COW the extension buffer if needed. The "MPTCP upstreaming" effort adds SKB_EXT_MPTCP extension to store the mapping for tx and rx processing. Two new members are added to sk_buff: 1. 'active_extensions' byte (filling a hole), telling which extensions are available for this skb. This has two purposes. a) avoids the need to initialize the pointer. b) allows to "delete" an extension by clearing its bit value in ->active_extensions. While it would be possible to store the active_extensions byte in the extension struct instead of sk_buff, there is one problem with this: When an extension has to be disabled, we can always clear the bit in skb->active_extensions. But in case it would be stored in the extension buffer itself, we might have to COW it first, if we are dealing with a cloned skb. On kmalloc failure we would be unable to turn an extension off. 2. extension pointer, located at the end of the sk_buff. If the active_extensions byte is 0, the pointer is undefined, it is not initialized on skb allocation. This adds extra code to skb clone and free paths (to deal with refcount/free of extension area) but this replaces similar code that manages skb->nf_bridge and skb->sp structs in the followup patches of the series. It is possible to add support for extensions that are not preseved on clones/copies. To do this, it would be needed to define a bitmask of all extensions that need copy/cow semantics, and change __skb_ext_copy() to check ->active_extensions & SKB_EXT_PRESERVE_ON_CLONE, then just set ->active_extensions to 0 on the new clone. This isn't done here because all extensions that get added here need the copy/cow semantics. v2: Allocate entire extension space using kmem_cache. Upside is that this allows better tracking of used memory, downside is that we will allocate more space than strictly needed in most cases (its unlikely that all extensions are active/needed at same time for same skb). The allocated memory (except the small extension header) is not cleared, so no additonal overhead aside from memory usage. Avoid atomic_dec_and_test operation on skb_ext_put() by using similar trick as kfree_skbmem() does with fclone_ref: If recount is 1, there is no concurrent user and we can free right away. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-19netfilter: avoid using skb->nf_bridge directlyFlorian Westphal
This pointer is going to be removed soon, so use the existing helpers in more places to avoid noise when the removal happens. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-19SUNRPC discard cr_uid from struct rpc_cred.NeilBrown
Just use ->cr_cred->fsuid directly. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-12-19SUNRPC: simplify auth_unix.NeilBrown
1/ discard 'struct unx_cred'. We don't need any data that is not already in 'struct rpc_cred'. 2/ Don't keep these creds in a hash table. When a credential is needed, simply allocate it. When not needed, discard it. This can easily be faster than performing a lookup on a shared hash table. As the lookup can happen during write-out, use a mempool to ensure forward progress. This means that we cannot compare two credentials for equality by comparing the pointers, but we never do that anyway. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-12-19SUNRPC: remove crbind rpc_cred operationNeilBrown
This now always just does get_rpccred(), so we don't need an operation pointer to know to do that. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-12-19SUNRPC: remove generic cred code.NeilBrown
This is no longer used. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-12-19NFS/NFSD/SUNRPC: replace generic creds with 'struct cred'.NeilBrown
SUNRPC has two sorts of credentials, both of which appear as "struct rpc_cred". There are "generic credentials" which are supplied by clients such as NFS and passed in 'struct rpc_message' to indicate which user should be used to authorize the request, and there are low-level credentials such as AUTH_NULL, AUTH_UNIX, AUTH_GSS which describe the credential to be sent over the wires. This patch replaces all the generic credentials by 'struct cred' pointers - the credential structure used throughout Linux. For machine credentials, there is a special 'struct cred *' pointer which is statically allocated and recognized where needed as having a special meaning. A look-up of a low-level cred will map this to a machine credential. Signed-off-by: NeilBrown <neilb@suse.com> Acked-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-12-19SUNRPC: remove RPCAUTH_AUTH_NO_CRKEY_TIMEOUTNeilBrown
This is no longer used. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-12-19NFS: move credential expiry tracking out of SUNRPC into NFS.NeilBrown
NFS needs to know when a credential is about to expire so that it can modify write-back behaviour to finish the write inside the expiry time. It currently uses functions in SUNRPC code which make use of a fairly complex callback scheme and flags in the generic credientials. As I am working to discard the generic credentials, this has to change. This patch moves the logic into NFS, in part by finding and caching the low-level credential in the open_context. We then make direct cred-api calls on that. This makes the code much simpler and removes a dependency on generic rpc credentials. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-12-19SUNRPC: add side channel to use non-generic cred for rpc call.NeilBrown
The credential passed in rpc_message.rpc_cred is always a generic credential except in one instance. When gss_destroying_context() calls rpc_call_null(), it passes a specific credential that it needs to destroy. In this case the RPC acts *on* the credential rather than being authorized by it. This special case deserves explicit support and providing that will mean that rpc_message.rpc_cred is *always* generic, allowing some optimizations. So add "tk_op_cred" to rpc_task and "rpc_op_cred" to the setup data. Use this to pass the cred down from rpc_call_null(), and have rpcauth_bindcred() notice it and bind it in place. Credit to kernel test robot <fengguang.wu@intel.com> for finding a bug in earlier version of this patch. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-12-19SUNRPC: introduce RPC_TASK_NULLCREDS to request auth_noneNeilBrown
In almost all cases the credential stored in rpc_message.rpc_cred is a "generic" credential. One of the two expections is when an AUTH_NULL credential is used such as for RPC ping requests. To improve consistency, don't pass an explicit credential in these cases, but instead pass NULL and set a task flag, similar to RPC_TASK_ROOTCREDS, which requests that NULL credentials be used by default. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-12-19NFS/SUNRPC: don't lookup machine credential until rpcauth_bindcred().NeilBrown
When NFS creates a machine credential, it is a "generic" credential, not tied to any auth protocol, and is really just a container for the princpal name. This doesn't get linked to a genuine credential until rpcauth_bindcred() is called. The lookup always succeeds, so various places that test if the machine credential is NULL, are pointless. As a step towards getting rid of generic credentials, this patch gets rid of generic machine credentials. The nfs_client and rpc_client just hold a pointer to a constant principal name. When a machine credential is wanted, a special static 'struct rpc_cred' pointer is used. rpcauth_bindcred() recognizes this, finds the principal from the client, and binds the correct credential. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-12-19SUNRPC: remove machine_cred field from struct auth_credNeilBrown
The cred is a machine_cred iff ->principal is set, so there is no need for the extra flag. There is one case which deserves some explanation. nfs4_root_machine_cred() calls rpc_lookup_machine_cred() with a NULL principal name which results in not getting a machine credential, but getting a root credential instead. This appears to be what is expected of the caller, and is clearly the result provided by both auth_unix and auth_gss which already ignore the flag. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-12-19SUNRPC: remove uid and gid from struct auth_credNeilBrown
Use cred->fsuid and cred->fsgid instead. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-12-19SUNRPC: remove groupinfo from struct auth_cred.NeilBrown
We can use cred->groupinfo (from the 'struct cred') instead. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-12-19SUNRPC: add 'struct cred *' to auth_cred and rpc_credNeilBrown
The SUNRPC credential framework was put together before Linux has 'struct cred'. Now that we have it, it makes sense to use it. This first step just includes a suitable 'struct cred *' pointer in every 'struct auth_cred' and almost every 'struct rpc_cred'. The rpc_cred used for auth_null has a NULL 'struct cred *' as nothing else really makes sense. For rpc_cred, the pointer is reference counted. For auth_cred it isn't. struct auth_cred are either allocated on the stack, in which case the thread owns a reference to the auth, or are part of 'struct generic_cred' in which case gc_base owns the reference, and "acred" shares it. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-12-19SUNRPC: allow /proc entries without CONFIG_SUNRPC_DEBUGBen Dooks
If we want /proc/sys/sunrpc the current kernel also drags in other debug features which we don't really want. Instead, we should always show the following entries: /proc/sys/sunrpc/udp_slot_table_entries /proc/sys/sunrpc/tcp_slot_table_entries /proc/sys/sunrpc/tcp_max_slot_table_entries /proc/sys/sunrpc/min_resvport /proc/sys/sunrpc/max_resvport /proc/sys/sunrpc/tcp_fin_timeout Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> Signed-off-by: Thomas Preston <thomas.preston@codethink.co.uk> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-12-19net/rds: remove user triggered WARN_ON in rds_sendmsgshamir rabinovitch
per comment from Leon in rdma mailing list https://lkml.org/lkml/2018/10/31/312 : Please don't forget to remove user triggered WARN_ON. https://lwn.net/Articles/769365/ "Greg Kroah-Hartman raised the problem of core kernel API code that will use WARN_ON_ONCE() to complain about bad usage; that will not generate the desired result if WARN_ON_ONCE() is configured to crash the machine. He was told that the code should just call pr_warn() instead, and that the called function should return an error in such situations. It was generally agreed that any WARN_ON() or WARN_ON_ONCE() calls that can be triggered from user space need to be fixed." in addition harden rds_sendmsg to detect and overcome issues with invalid sg count and fail the sendmsg. Suggested-by: Leon Romanovsky <leon@kernel.org> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: shamir rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-19net/rds: fix warn in rds_message_alloc_sgsshamir rabinovitch
redundant copy_from_user in rds_sendmsg system call expose rds to issue where rds_rdma_extra_size walk the rds iovec and and calculate the number pf pages (sgs) it need to add to the tail of rds message and later rds_cmsg_rdma_args copy the rds iovec again and re calculate the same number and get different result causing WARN_ON in rds_message_alloc_sgs. fix this by doing the copy_from_user only once per rds_sendmsg system call. When issue occur the below dump is seen: WARNING: CPU: 0 PID: 19789 at net/rds/message.c:316 rds_message_alloc_sgs+0x10c/0x160 net/rds/message.c:316 Kernel panic - not syncing: panic_on_warn set ... CPU: 0 PID: 19789 Comm: syz-executor827 Not tainted 4.19.0-next-20181030+ #101 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x244/0x39d lib/dump_stack.c:113 panic+0x2ad/0x55c kernel/panic.c:188 __warn.cold.8+0x20/0x45 kernel/panic.c:540 report_bug+0x254/0x2d0 lib/bug.c:186 fixup_bug arch/x86/kernel/traps.c:178 [inline] do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271 do_invalid_op+0x36/0x40 arch/x86/kernel/traps.c:290 invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:969 RIP: 0010:rds_message_alloc_sgs+0x10c/0x160 net/rds/message.c:316 Code: c0 74 04 3c 03 7e 6c 44 01 ab 78 01 00 00 e8 2b 9e 35 fa 4c 89 e0 48 83 c4 08 5b 41 5c 41 5d 41 5e 41 5f 5d c3 e8 14 9e 35 fa <0f> 0b 31 ff 44 89 ee e8 18 9f 35 fa 45 85 ed 75 1b e8 fe 9d 35 fa RSP: 0018:ffff8801c51b7460 EFLAGS: 00010293 RAX: ffff8801bc412080 RBX: ffff8801d7bf4040 RCX: ffffffff8749c9e6 RDX: 0000000000000000 RSI: ffffffff8749ca5c RDI: 0000000000000004 RBP: ffff8801c51b7490 R08: ffff8801bc412080 R09: ffffed003b5c5b67 R10: ffffed003b5c5b67 R11: ffff8801dae2db3b R12: 0000000000000000 R13: 000000000007165c R14: 000000000007165c R15: 0000000000000005 rds_cmsg_rdma_args+0x82d/0x1510 net/rds/rdma.c:623 rds_cmsg_send net/rds/send.c:971 [inline] rds_sendmsg+0x19a2/0x3180 net/rds/send.c:1273 sock_sendmsg_nosec net/socket.c:622 [inline] sock_sendmsg+0xd5/0x120 net/socket.c:632 ___sys_sendmsg+0x7fd/0x930 net/socket.c:2117 __sys_sendmsg+0x11d/0x280 net/socket.c:2155 __do_sys_sendmsg net/socket.c:2164 [inline] __se_sys_sendmsg net/socket.c:2162 [inline] __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2162 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x44a859 Code: e8 dc e6 ff ff 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 6b cb fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007f1d4710ada8 EFLAGS: 00000297 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00000000006dcc28 RCX: 000000000044a859 RDX: 0000000000000000 RSI: 0000000020001600 RDI: 0000000000000003 RBP: 00000000006dcc20 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000297 R12: 00000000006dcc2c R13: 646e732f7665642f R14: 00007f1d4710b9c0 R15: 00000000006dcd2c Kernel Offset: disabled Rebooting in 86400 seconds.. Reported-by: syzbot+26de17458aeda9d305d8@syzkaller.appspotmail.com Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: shamir rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-19Merge branch 'for-upstream' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2018-12-19 Here's the main bluetooth-next pull request for 4.21: - Multiple fixes & improvements for Broadcom-based controllers - New USB ID for an Intel controller - Support for new Broadcom controller variants - Use DEFINE_SHOW_ATTRIBUTE to simplify debugfs code - Eliminate confusing "last event is not cmd complete" warning message - Added vendor suspend/resume support for H:5 (3-Wire UART) controllers - Various other smaller improvements & fixes Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-19Merge tag 'mac80211-next-for-davem-2018-12-19' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== This time we have too many changes to list, highlights: * virt_wifi - wireless control simulation on top of another network interface * hwsim configurability to test capabilities similar to real hardware * various mesh improvements * various radiotap vendor data fixes in mac80211 * finally the nl_set_extack_cookie_u64() we talked about previously, used for * peer measurement APIs, right now only with FTM (flight time measurement) for location * made nl80211 radio/interface announcements more complete * various new HE (802.11ax) things: updates, TWT support, ... ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-19Merge tag 'mac80211-for-davem-2018-12-19' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== Just three fixes: * fix a memory leak in an error path * fix TXQs in interface teardown * free fraglist if we used it internally before returning SKB ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-19Bluetooth: Fix unnecessary error message for HCI request completionJohan Hedberg
In case a command which completes in Command Status was sent using the hci_cmd_send-family of APIs there would be a misleading error in the hci_get_cmd_complete function, since the code would be trying to fetch the Command Complete parameters when there are none. Avoid the misleading error and silently bail out from the function in case the received event is a command status. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Acked-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2018-12-19xfrm6_tunnel: Fix spi check in __xfrm6_tunnel_alloc_spiYueHaibing
gcc warn this: net/ipv6/xfrm6_tunnel.c:143 __xfrm6_tunnel_alloc_spi() warn: always true condition '(spi <= 4294967295) => (0-u32max <= u32max)' 'spi' is u32, which always not greater than XFRM6_TUNNEL_SPI_MAX because of wrap around. So the second forloop will never reach. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-12-19xfrm: policy: remove set but not used variable 'priority'YueHaibing
Fixes gcc '-Wunused-but-set-variable' warning: net/xfrm/xfrm_policy.c: In function 'xfrm_policy_lookup_bytype': net/xfrm/xfrm_policy.c:2079:6: warning: variable 'priority' set but not used [-Wunused-but-set-variable] It not used since commit 6be3b0db6db8 ("xfrm: policy: add inexact policy search tree infrastructure") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-12-19mac80211: Properly access radiotap vendor dataIlan Peer
The radiotap vendor data might be placed after some other radiotap elements, and thus when accessing it, need to access the correct offset in the skb data. Fix the code accordingly. Signed-off-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-12-19cfg80211: fix ieee80211_get_vht_max_nss()Johannes Berg
Fix two bugs in ieee80211_get_vht_max_nss(): * the spec says we should round down (reported by Nissim) * there's a double condition, the first one is wrong, supp_width == 0 / ext_nss_bw == 2 is valid in 80+80 (found by smatch) Fixes: b0aa75f0b1b2 ("ieee80211: add new VHT capability fields/parsing") Reported-by: Nissim Bendanan <nissimx.bendanan@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>