path: root/net
Age | Commit message | Author
2018-03-23 | net: bridge: fix direct access to bridge vlan_enabled and use helper | Nikolay Aleksandrov
We need to use br_vlan_enabled() helper otherwise we'll break builds without bridge vlans: net/bridge//br_if.c: In function ‘br_mtu’: net/bridge//br_if.c:458:8: error: ‘const struct net_bridge’ has no member named ‘vlan_enabled’ if (br->vlan_enabled) ^ net/bridge//br_if.c:462:1: warning: control reaches end of non-void function [-Wreturn-type] } ^ scripts/Makefile.build:324: recipe for target 'net/bridge//br_if.o' failed Fixes: 419d14af9e07 ("bridge: Allow max MTU when multiple VLANs present") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23 | tls: RX path for ktls | Dave Watson
Add rx path for tls software implementation. recvmsg, splice_read, and poll are implemented. An additional sockopt TLS_RX is added, with the same interface as TLS_TX. Either TLS_RX or TLS_TX may be provided separately, or together (with two different setsockopt calls with appropriate keys). Control messages are passed via CMSG in a similar way to transmit. If no cmsg buffer is passed, then only application data records will be passed to userspace, and EIO is returned for other types of alerts. EBADMSG is passed for decryption errors, and EMSGSIZE is passed for framing too big, and EBADMSG for framing too small (matching openssl semantics). EINVAL is returned for TLS versions that do not match the original setsockopt call. All are unrecoverable. strparser is used to parse TLS framing. Decryption is done directly into userspace buffers if they are large enough to support it, otherwise skb_cow_data is called (similar to ipsec), and buffers are decrypted in place and copied. splice_read always decrypts in place, since no buffers are provided to decrypt into. sk_poll is overridden, and only returns POLLIN if a full TLS message is received. Otherwise we wait for strparser to finish reading a full frame. Actual decryption is only done during recvmsg or splice_read calls. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
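The error mapping described in this message can be illustrated with a small userspace sketch. It is not the kernel code: the function name check_tls_framing, the caller-supplied length bounds, and the order of the checks are assumptions made for illustration. Only the errno values and the 5-byte TLS record header layout (type, version, length) are taken from the text above and the TLS record format.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: validate a 5-byte TLS record header.
 * hdr[0] = content type, hdr[1..2] = version, hdr[3..4] = record length.
 * min_len and max_len are supplied by the caller (assumed bounds).
 * Returns 0 on success or a negative errno as described in the commit.
 */
int check_tls_framing(const uint8_t hdr[5], uint16_t expected_version,
                      size_t min_len, size_t max_len)
{
    uint16_t version = (uint16_t)((hdr[1] << 8) | hdr[2]);
    size_t len = ((size_t)hdr[3] << 8) | hdr[4];

    assert(min_len <= max_len);

    if (len > max_len)
        return -EMSGSIZE;   /* framing too big */
    if (len < min_len)
        return -EBADMSG;    /* framing too small */
    if (version != expected_version)
        return -EINVAL;     /* version differs from the configured one */
    return 0;
}
```

A caller would pass the version given at setsockopt time as expected_version and treat any non-zero result as unrecoverable, as the message states.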
2018-03-23 | tls: Refactor variable names | Dave Watson
Several config variables are prefixed with tx, drop the prefix since these will be used for both tx and rx. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23 | tls: Pass error code explicitly to tls_err_abort | Dave Watson
Pass EBADMSG explicitly to tls_err_abort. Receive path will pass additional codes - EMSGSIZE if framing is larger than max TLS record size, EINVAL if TLS version mismatch. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23 | tls: Move cipher info to a separate struct | Dave Watson
Separate tx crypto parameters to a separate cipher_context struct. The same parameters will be used for rx using the same struct. tls_advance_record_sn is modified to only take the cipher info. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23 | tls: Generalize zerocopy_from_iter | Dave Watson
Refactor zerocopy_from_iter to take arguments for pages and size, such that it can be used for both tx and rx. RX will also support zerocopy direct to output iter, as long as the full message can be copied at once (a large enough userspace buffer was provided). Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23 | bridge: Allow max MTU when multiple VLANs present | Chas Williams
If the bridge is allowing multiple VLANs, some VLANs may have different MTUs. Instead of choosing the minimum MTU for the bridge interface, choose the maximum MTU of the bridge members. With this the user only needs to set a larger MTU on the member ports that are participating in the large MTU VLANS. Signed-off-by: Chas Williams <3chas3@gmail.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
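The selection rule described above can be sketched in userspace C. This is an illustrative model, not the kernel's br_mtu(): the function name pick_bridge_mtu, its parameters, and the 1500-byte fallback for a bridge with no ports are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of the MTU choice for a bridge.
 * With VLANs enabled, take the largest member MTU; otherwise the smallest.
 */
unsigned int pick_bridge_mtu(const unsigned int *port_mtu, size_t n,
                             bool vlan_enabled)
{
    unsigned int mtu;
    size_t i;

    assert(port_mtu != NULL || n == 0);

    if (n == 0)
        return 1500;    /* assumed default when there are no member ports */

    mtu = port_mtu[0];
    for (i = 1; i < n; i++) {
        if (vlan_enabled ? port_mtu[i] > mtu : port_mtu[i] < mtu)
            mtu = port_mtu[i];
    }
    return mtu;
}
```

With member MTUs of 1500, 9000 and 1400, this model yields 9000 when VLANs are enabled and 1400 otherwise, which matches the behaviour the message describes: only the ports in the large-MTU VLANs need a larger MTU.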
2018-03-23 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | David S. Miller
Fun set of conflict resolutions here... For the mac80211 stuff, these were fortunately just parallel adds. Trivially resolved. In drivers/net/phy/phy.c we had a bug fix in 'net' that moved the function phy_disable_interrupts() earlier in the file, whilst in 'net-next' the phy_error() call from this function was removed. In net/ipv4/xfrm4_policy.c, David Ahern's changes to remove the 'rt_table_id' member of rtable collided with a bug fix in 'net' that added a new struct member "rt_mtu_locked" which needs to be copied over here. The mlxsw driver conflict consisted of net-next separating the span code and definitions into separate files, whilst a 'net' bug fix made some changes to that moved code. The mlx5 infiniband conflict resolution was quite non-trivial, the RDMA tree's merge commit was used as a guide here, and here are their notes: ==================== Due to bug fixes found by the syzkaller bot and taken into the for-rc branch after development for the 4.17 merge window had already started being taken into the for-next branch, there were fairly non-trivial merge issues that would need to be resolved between the for-rc branch and the for-next branch. This merge resolves those conflicts and provides a unified base upon which ongoing development for 4.17 can be based. Conflicts: drivers/infiniband/hw/mlx5/main.c - Commit 42cea83f9524 (IB/mlx5: Fix cleanup order on unload) added to for-rc and commit b5ca15ad7e61 (IB/mlx5: Add proper representors support) add as part of the devel cycle both needed to modify the init/de-init functions used by mlx5. To support the new representors, the new functions added by the cleanup patch needed to be made non-static, and the init/de-init list added by the representors patch needed to be modified to match the init/de-init list changes made by the cleanup patch. 
Updates: drivers/infiniband/hw/mlx5/mlx5_ib.h - Update function prototypes added by representors patch to reflect new function names as changed by cleanup patch drivers/infiniband/hw/mlx5/ib_rep.c - Update init/de-init stage list to match new order from cleanup patch ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23 | mac80211: notify driver for change in multicast rates | Pradeep Kumar Chitrapu
With drivers implementing rate control in driver or firmware rate_control_send_low() may not get called, and thus the driver needs to know about changes in the multicast rate. Add and use a new BSS change flag for this. Signed-off-by: Pradeep Kumar Chitrapu <pradeepc@codeaurora.org> [rewrite commit message] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-03-23 | xfrm: Fix transport mode skb control buffer usage. | Steffen Klassert
A recent commit introduced a new struct xfrm_trans_cb that is used with the sk_buff control buffer. Unfortunately it placed the structure in front of the control buffer and overlooked that the IPv4/IPv6 control buffer is still needed for some layer 4 protocols. As a result the IPv4/IPv6 control buffer is overwritten with this structure. Fix this by setting an appropriate header in front of the structure. Fixes acf568ee859f ("xfrm: Reinject transport-mode packets ...") Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-03-22 | net: Replace ip_ra_lock with per-net mutex | Kirill Tkhai
Since ra_chain is per-net, we may use per-net mutexes to protect them in ip_ra_control(). This improves scalability. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22 | net: Make ip_ra_chain per struct net | Kirill Tkhai
This is an optimization that makes ip_call_ra_chain() iterate over fewer sockets to find the sockets it's looking for. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22 | net: Revert "ipv4: fix a deadlock in ip_ra_control" | Kirill Tkhai
This reverts commit 1215e51edad1. Since raw_close() is used on every RAW socket destruction, the changes made by 1215e51edad1 scale badly. This is clearly seen in an endless unshare(CLONE_NEWNET) test, where the cleanup_net() kwork spends a lot of time waiting for the rtnl_lock() introduced by this commit. The previous patch moved IP_ROUTER_ALERT out of rtnl_lock(), so we revert this patch. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22 | net: Move IP_ROUTER_ALERT out of lock_sock(sk) | Kirill Tkhai
ip_ra_control() does not need sk_lock. Who are the other users of ip_ra_chain? ip_mroute_setsockopt() doesn't take sk_lock, while parallel IP_ROUTER_ALERT syscalls are synchronized by ip_ra_lock. So, we may move this command out of sk_lock. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22 | net: Revert "ipv4: get rid of ip_ra_lock" | Kirill Tkhai
This reverts commit ba3f571d5dde. The commit was made after 1215e51edad1 "ipv4: fix a deadlock in ip_ra_control", and killed ip_ra_lock, which became useless once rtnl_lock() was used to destroy every raw ipv4 socket. This scales very badly, and the next patch in the series reverts 1215e51edad1. ip_ra_lock will be used again. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22 | gre: fix TUNNEL_SEQ bit check on sequence numbering | Colin Ian King
The current logic of flags | TUNNEL_SEQ is always non-zero and hence sequence numbers are always incremented no matter the setting of the TUNNEL_SEQ bit. Fix this by using & instead of |. Detected by CoverityScan, CID#1466039 ("Operands don't affect result") Fixes: 77a5196a804e ("gre: add sequence number for collect md mode.") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
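The difference between the two operators is easy to demonstrate in isolation. The following sketch is not the gre code: DEMO_TUNNEL_SEQ is a placeholder bit value and both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder for the TUNNEL_SEQ bit; the value is illustrative only. */
#define DEMO_TUNNEL_SEQ 0x08u

static_assert(DEMO_TUNNEL_SEQ != 0, "the flag must be a non-zero bit");

/* Buggy form: OR-ing in a non-zero constant is never zero. */
bool seq_enabled_buggy(uint16_t flags)
{
    return (flags | DEMO_TUNNEL_SEQ) != 0;
}

/* Fixed form: AND tests whether the bit is actually set in flags. */
bool seq_enabled_fixed(uint16_t flags)
{
    return (flags & DEMO_TUNNEL_SEQ) != 0;
}
```

seq_enabled_buggy() returns true even for flags == 0, which is why sequence numbers were incremented regardless of the setting; seq_enabled_fixed() returns true only when the bit is present.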
2018-03-22 | tipc: step sk->sk_drops when rcv buffer is full | GhantaKrishnamurthy MohanKrishna
Currently when tipc is unable to queue a received message on a socket, the message is rejected back to the sender with error TIPC_ERR_OVERLOAD. However, the application on this socket has no knowledge about these discards. In this commit, we try to step the sk_drops counter when tipc is unable to queue a received message. Export sk_drops using tipc socket diagnostics. Acked-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: GhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22 | tipc: implement socket diagnostics for AF_TIPC | GhantaKrishnamurthy MohanKrishna
This commit adds socket diagnostics capability for AF_TIPC in netlink family NETLINK_SOCK_DIAG in a new kernel module (diag.ko).
The following are key design considerations:
- config TIPC_DIAG has default y, like INET_DIAG.
- only requests with flag NLM_F_DUMP are supported (dump all).
- tipc_sock_diag_req message is introduced to send filter parameters.
- the response attributes are of TLV, some nested.
To avoid exposing data structures between diag and tipc modules and avoid code duplication, the following additions are required:
- export tipc_nl_sk_walk function to reuse socket iterator.
- export tipc_sk_fill_sock_diag to fill the tipc diag attributes.
- create a sock_diag response message in __tipc_add_sock_diag defined in diag.c and use the above exported tipc_sk_fill_sock_diag to fill response.
Acked-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: GhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22 | tipc: modify socket iterator for sock_diag | GhantaKrishnamurthy MohanKrishna
The current socket iterator function tipc_nl_sk_dump handles socket locks and calls __tipc_nl_add_sk for each socket. To reuse this logic in the sock_diag implementation, we make minor modifications to make these functions generic, as described below. In this commit, we add two new functions, __tipc_nl_sk_walk and __tipc_nl_add_sk_info, and modify tipc_nl_sk_dump and __tipc_nl_add_sk accordingly.
In __tipc_nl_sk_walk we:
1. acquire and release socket locks
2. for each socket, execute the specified callback function
In __tipc_nl_add_sk we:
- Move the netlink attribute insertion to __tipc_nl_add_sk_info.
tipc_nl_sk_dump calls tipc_nl_sk_walk with __tipc_nl_add_sk as argument. sock_diag will use these generic functions in a later commit. There is no functional change in this commit.
Acked-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: GhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22 | Merge tag 'mac80211-for-davem-2018-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 | David S. Miller
Johannes Berg says:
====================
Two more fixes (in three patches):
* ath9k_htc doesn't like QoS NDP frames, use regular ones
* hwsim: set up wmediumd for radios created later
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22 | devlink: Remove top_hierarchy arg to devlink_resource_register | David Ahern
top_hierarchy arg can be determined by comparing parent_resource_id to DEVLINK_RESOURCE_ID_PARENT_TOP so it does not need to be a separate argument. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22 | net/ipv6: Handle onlink flag with multipath routes | David Ahern
For multipath routes the ONLINK flag can be specified per nexthop in rtnh_flags or globally in rtm_flags. Update ip6_route_multipath_add to consider the ONLINK setting coming from rtnh_flags. Each loop over nexthops the config for the sibling route is initialized to the global config and then per nexthop settings overlayed. The flag is 'or'ed into fib6_config to handle the ONLINK flag coming from either rtm_flags or rtnh_flags. Fixes: fc1e64e1092f ("net/ipv6: Add support for onlink flag") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
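The per-nexthop overlay described above can be modelled as follows. This is a sketch under stated assumptions, not ip6_route_multipath_add(): struct demo_route_cfg, DEMO_F_ONLINK and demo_build_sibling_cfgs are hypothetical names, and the flag value is illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the ONLINK bit; the value is illustrative only. */
#define DEMO_F_ONLINK 0x4u

/* Hypothetical, reduced route configuration holding only the flags. */
struct demo_route_cfg {
    unsigned int flags;
};

/*
 * For each nexthop, start from the global config and then OR in the
 * ONLINK bit if that nexthop requested it, so the flag is honoured
 * whether it was set globally or per nexthop.
 */
void demo_build_sibling_cfgs(const struct demo_route_cfg *global,
                             const unsigned int *nh_flags, size_t n,
                             struct demo_route_cfg *out)
{
    size_t i;

    assert(global != NULL && out != NULL);
    assert(nh_flags != NULL || n == 0);

    for (i = 0; i < n; i++) {
        out[i] = *global;                            /* reset to global config */
        out[i].flags |= nh_flags[i] & DEMO_F_ONLINK; /* per-nexthop overlay */
    }
}
```

Re-initialising from the global config on every iteration keeps a flag set on one nexthop from leaking into the next sibling.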
2018-03-22 | ipv6: sr: fix NULL pointer dereference when setting encap source address | David Lebrun
When using seg6 in encap mode, we call ipv6_dev_get_saddr() to set the source address of the outer IPv6 header, in case none was specified. Using skb->dev can lead to BUG() when it is in an inconsistent state. This patch uses the net_device attached to the skb's dst instead. [940807.667429] BUG: unable to handle kernel NULL pointer dereference at 000000000000047c [940807.762427] IP: ipv6_dev_get_saddr+0x8b/0x1d0 [940807.815725] PGD 0 P4D 0 [940807.847173] Oops: 0000 [#1] SMP PTI [940807.890073] Modules linked in: [940807.927765] CPU: 6 PID: 0 Comm: swapper/6 Tainted: G W 4.16.0-rc1-seg6bpf+ #2 [940808.028988] Hardware name: HP ProLiant DL120 G6/ProLiant DL120 G6, BIOS O26 09/06/2010 [940808.128128] RIP: 0010:ipv6_dev_get_saddr+0x8b/0x1d0 [940808.187667] RSP: 0018:ffff88043fd836b0 EFLAGS: 00010206 [940808.251366] RAX: 0000000000000005 RBX: ffff88042cb1c860 RCX: 00000000000000fe [940808.338025] RDX: 00000000000002c0 RSI: ffff88042cb1c860 RDI: 0000000000004500 [940808.424683] RBP: ffff88043fd83740 R08: 0000000000000000 R09: ffffffffffffffff [940808.511342] R10: 0000000000000040 R11: 0000000000000000 R12: ffff88042cb1c850 [940808.598012] R13: ffffffff8208e380 R14: ffff88042ac8da00 R15: 0000000000000002 [940808.684675] FS: 0000000000000000(0000) GS:ffff88043fd80000(0000) knlGS:0000000000000000 [940808.783036] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [940808.852975] CR2: 000000000000047c CR3: 00000004255fe000 CR4: 00000000000006e0 [940808.939634] Call Trace: [940808.970041] <IRQ> [940808.995250] ? ip6t_do_table+0x265/0x640 [940809.043341] seg6_do_srh_encap+0x28f/0x300 [940809.093516] ? seg6_do_srh+0x1a0/0x210 [940809.139528] seg6_do_srh+0x1a0/0x210 [940809.183462] seg6_output+0x28/0x1e0 [940809.226358] lwtunnel_output+0x3f/0x70 [940809.272370] ip6_xmit+0x2b8/0x530 [940809.313185] ? ac6_proc_exit+0x20/0x20 [940809.359197] inet6_csk_xmit+0x7d/0xc0 [940809.404173] tcp_transmit_skb+0x548/0x9a0 [940809.453304] __tcp_retransmit_skb+0x1a8/0x7a0 [940809.506603] ? 
ip6_default_advmss+0x40/0x40 [940809.557824] ? tcp_current_mss+0x24/0x90 [940809.605925] tcp_retransmit_skb+0xd/0x80 [940809.654016] tcp_xmit_retransmit_queue.part.17+0xf9/0x210 [940809.719797] tcp_ack+0xa47/0x1110 [940809.760612] tcp_rcv_established+0x13c/0x570 [940809.812865] tcp_v6_do_rcv+0x151/0x3d0 [940809.858879] tcp_v6_rcv+0xa5c/0xb10 [940809.901770] ? seg6_output+0xdd/0x1e0 [940809.946745] ip6_input_finish+0xbb/0x460 [940809.994837] ip6_input+0x74/0x80 [940810.034612] ? ip6_rcv_finish+0xb0/0xb0 [940810.081663] ipv6_rcv+0x31c/0x4c0 ... Fixes: 6c8702c60b886 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels") Reported-by: Tom Herbert <tom@quantonium.net> Signed-off-by: David Lebrun <dlebrun@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22 | ipv6: sr: fix scheduling in RCU when creating seg6 lwtunnel state | David Lebrun
The seg6_build_state() function is called with RCU read lock held, so we cannot use GFP_KERNEL. This patch uses GFP_ATOMIC instead. [ 92.770271] ============================= [ 92.770628] WARNING: suspicious RCU usage [ 92.770921] 4.16.0-rc4+ #12 Not tainted [ 92.771277] ----------------------------- [ 92.771585] ./include/linux/rcupdate.h:302 Illegal context switch in RCU read-side critical section! [ 92.772279] [ 92.772279] other info that might help us debug this: [ 92.772279] [ 92.773067] [ 92.773067] rcu_scheduler_active = 2, debug_locks = 1 [ 92.773514] 2 locks held by ip/2413: [ 92.773765] #0: (rtnl_mutex){+.+.}, at: [<00000000e5461720>] rtnetlink_rcv_msg+0x441/0x4d0 [ 92.774377] #1: (rcu_read_lock){....}, at: [<00000000df4f161e>] lwtunnel_build_state+0x59/0x210 [ 92.775065] [ 92.775065] stack backtrace: [ 92.775371] CPU: 0 PID: 2413 Comm: ip Not tainted 4.16.0-rc4+ #12 [ 92.775791] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc27 04/01/2014 [ 92.776608] Call Trace: [ 92.776852] dump_stack+0x7d/0xbc [ 92.777130] __schedule+0x133/0xf00 [ 92.777393] ? unwind_get_return_address_ptr+0x50/0x50 [ 92.777783] ? __sched_text_start+0x8/0x8 [ 92.778073] ? rcu_is_watching+0x19/0x30 [ 92.778383] ? kernel_text_address+0x49/0x60 [ 92.778800] ? __kernel_text_address+0x9/0x30 [ 92.779241] ? unwind_get_return_address+0x29/0x40 [ 92.779727] ? pcpu_alloc+0x102/0x8f0 [ 92.780101] _cond_resched+0x23/0x50 [ 92.780459] __mutex_lock+0xbd/0xad0 [ 92.780818] ? pcpu_alloc+0x102/0x8f0 [ 92.781194] ? seg6_build_state+0x11d/0x240 [ 92.781611] ? save_stack+0x9b/0xb0 [ 92.781965] ? __ww_mutex_wakeup_for_backoff+0xf0/0xf0 [ 92.782480] ? seg6_build_state+0x11d/0x240 [ 92.782925] ? lwtunnel_build_state+0x1bd/0x210 [ 92.783393] ? ip6_route_info_create+0x687/0x1640 [ 92.783846] ? ip6_route_add+0x74/0x110 [ 92.784236] ? 
inet6_rtm_newroute+0x8a/0xd0 Fixes: 6c8702c60b886 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels") Signed-off-by: David Lebrun <dlebrun@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22 | Merge tag 'batadv-next-for-davem-20180319' of git://git.open-mesh.org/linux-merge | David S. Miller
Simon Wunderlich says:
====================
This feature/cleanup patchset includes the following patches:
- avoid redundant multicast TT entries, by Linus Luessing
- add netlink support for distributed arp table cache and multicast flags, by Linus Luessing (2 patches)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22 | Merge tag 'batadv-net-for-davem-20180319' of git://git.open-mesh.org/linux-merge | David S. Miller
Simon Wunderlich says: ==================== Here are some batman-adv bugfixes: - fix possible IPv6 packet loss when multicast extension is used, by Linus Luessing - fix SKB handling issues for TTVN and DAT, by Matthias Schiffer (two patches) - fix include for eventpoll, by Sven Eckelmann - fix skb checksum for ttvn reroutes, by Sven Eckelmann ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22 | rds: tcp: remove register_netdevice_notifier infrastructure. | Sowmini Varadhan
The netns deletion path does not need to wait for all net_devices to be unregistered before dismantling rds_tcp state for the netns (we are able to dismantle this state on module unload even when all net_devices are active so there is no dependency here). This patch removes code related to netdevice notifiers and refactors all the code needed to dismantle rds_tcp state into a ->exit callback for the pernet_operations used with register_pernet_device(). Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22 | net: Convert nf_ct_net_ops | Kirill Tkhai
These pernet_operations register and unregister sysctl. Also, there is inet_frags_exit_net() called in exit method, which has to be safe after a560002437d3 "net: Fix hlist corruptions in inet_evict_bucket()". Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22 | net: Convert lowpan_frags_ops | Kirill Tkhai
These pernet_operations register and unregister sysctl. Also, there is inet_frags_exit_net() called in exit method, which has to be safe after a560002437d3 "net: Fix hlist corruptions in inet_evict_bucket()". Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22 | net: Convert can_pernet_ops | Kirill Tkhai
These pernet_operations create and destroy /proc entries and cancel the per-net timer. Also, there are unneeded iterations over an empty list of net devices, since all net devices must already have been moved to init_net or unregistered by default_device_ops. This was already mentioned here: https://marc.info/?l=linux-can&m=150169589119335&w=2 So, it looks safe to make them async. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22 | netfilter: nf_tables: do not hold reference on netdevice from preparation phase | Pablo Neira Ayuso
The netfilter netdevice event handler holds the nfnl_lock mutex; this avoids races with a device going away while such device is being attached to hooks from the netlink control plane. Therefore, either the control plane bails out with ENOENT or the netdevice event path waits until the hook that is attached to net_device is registered. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-03-22 | netfilter: nf_tables: cache device name in flowtable object | Pablo Neira Ayuso
Devices going away have to grab the nfnl_lock from the netdev event path to avoid races with control plane updates. However, netlink dumps in netfilter do not hold the nfnl_lock mutex. Cache the device name in the objects to avoid a use-after-free situation for a device that is going away. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-03-22 | netfilter: drop template ct when conntrack is skipped. | Paolo Abeni
The ipv4 nf_ct code currently skips the nf_conntrak_in() call for fragmented packets. As a results later matches/target can end up manipulating template ct entry instead of 'real' ones. Exploiting the above, syzbot found a way to trigger the following splat: WARNING: CPU: 1 PID: 4242 at net/netfilter/xt_cluster.c:55 xt_cluster_mt+0x6c1/0x840 net/netfilter/xt_cluster.c:127 Kernel panic - not syncing: panic_on_warn set ... CPU: 1 PID: 4242 Comm: syzkaller027971 Not tainted 4.16.0-rc2+ #243 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x194/0x24d lib/dump_stack.c:53 panic+0x1e4/0x41c kernel/panic.c:183 __warn+0x1dc/0x200 kernel/panic.c:547 report_bug+0x211/0x2d0 lib/bug.c:184 fixup_bug.part.11+0x37/0x80 arch/x86/kernel/traps.c:178 fixup_bug arch/x86/kernel/traps.c:247 [inline] do_error_trap+0x2d7/0x3e0 arch/x86/kernel/traps.c:296 do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315 invalid_op+0x58/0x80 arch/x86/entry/entry_64.S:957 RIP: 0010:xt_cluster_hash net/netfilter/xt_cluster.c:55 [inline] RIP: 0010:xt_cluster_mt+0x6c1/0x840 net/netfilter/xt_cluster.c:127 RSP: 0018:ffff8801d2f6f2d0 EFLAGS: 00010293 RAX: ffff8801af700540 RBX: 0000000000000000 RCX: ffffffff84a2d1e1 RDX: 0000000000000000 RSI: ffff8801d2f6f478 RDI: ffff8801cafd336a RBP: ffff8801d2f6f2e8 R08: 0000000000000000 R09: 0000000000000001 R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801b03b3d18 R13: ffff8801cafd3300 R14: dffffc0000000000 R15: ffff8801d2f6f478 ipt_do_table+0xa91/0x19b0 net/ipv4/netfilter/ip_tables.c:296 iptable_filter_hook+0x65/0x80 net/ipv4/netfilter/iptable_filter.c:41 nf_hook_entry_hookfn include/linux/netfilter.h:120 [inline] nf_hook_slow+0xba/0x1a0 net/netfilter/core.c:483 nf_hook include/linux/netfilter.h:243 [inline] NF_HOOK include/linux/netfilter.h:286 [inline] raw_send_hdrinc.isra.17+0xf39/0x1880 net/ipv4/raw.c:432 raw_sendmsg+0x14cd/0x26b0 net/ipv4/raw.c:669 
inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:763 sock_sendmsg_nosec net/socket.c:629 [inline] sock_sendmsg+0xca/0x110 net/socket.c:639 SYSC_sendto+0x361/0x5c0 net/socket.c:1748 SyS_sendto+0x40/0x50 net/socket.c:1716 do_syscall_64+0x280/0x940 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x42/0xb7 RIP: 0033:0x441b49 RSP: 002b:00007ffff5ca8b18 EFLAGS: 00000216 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 0000000000441b49 RDX: 0000000000000030 RSI: 0000000020ff7000 RDI: 0000000000000003 RBP: 00000000006cc018 R08: 000000002066354c R09: 0000000000000010 R10: 0000000000000000 R11: 0000000000000216 R12: 0000000000403470 R13: 0000000000403500 R14: 0000000000000000 R15: 0000000000000000 Dumping ftrace buffer: (ftrace buffer empty) Kernel Offset: disabled Rebooting in 86400 seconds.. Instead of adding checks for template ct on every target/match manipulating skb->_nfct, simply drop the template ct when skipping nf_conntrack_in(). Fixes: 7b4fdf77a450ec ("netfilter: don't track fragmented packets") Reported-and-tested-by: syzbot+0346441ae0545cfcea3a@syzkaller.appspotmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-03-21 | net/sched: fix idr leak in the error path of tcf_skbmod_init() | Davide Caratti
tcf_skbmod_init() can fail after the idr has been successfully reserved. When this happens, every subsequent attempt to configure skbmod rules using the same idr value will systematically fail with -ENOSPC, unless the first attempt was done using the 'replace' keyword:
 # tc action add action skbmod swap mac index 100
 RTNETLINK answers: Cannot allocate memory
 We have an error talking to the kernel
 # tc action add action skbmod swap mac index 100
 RTNETLINK answers: No space left on device
 We have an error talking to the kernel
 # tc action add action skbmod swap mac index 100
 RTNETLINK answers: No space left on device
 We have an error talking to the kernel
 ...
Fix this in tcf_skbmod_init(), ensuring that tcf_idr_release() is called on the error path when the idr has been reserved, but not yet inserted. Also, don't test 'ovr' in the error path, to avoid a 'replace' failure implicitly becoming a 'delete' that leaks a refcount in the act_skbmod module:
 # rmmod act_skbmod; modprobe act_skbmod
 # tc action add action skbmod swap mac index 100
 # tc action add action skbmod swap mac continue index 100
 RTNETLINK answers: File exists
 We have an error talking to the kernel
 # tc action replace action skbmod swap mac continue index 100
 RTNETLINK answers: Cannot allocate memory
 We have an error talking to the kernel
 # tc action list action skbmod
 #
 # rmmod act_skbmod
 rmmod: ERROR: Module act_skbmod is in use
Fixes: 65a206c01e8e ("net/sched: Change act_api and act_xxx modules to use IDR") Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
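The leak pattern shared by this and the following net/sched fixes can be reproduced with a toy userspace model. It does not use the kernel IDR API: demo_reserve, demo_release and demo_action_init are hypothetical stand-ins, and the boolean table is only a simplification of a reserved-but-not-inserted idr slot.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define DEMO_SLOTS 256

/* Toy index table: true means the index is reserved. */
bool demo_reserved[DEMO_SLOTS];

/* Reserve an index; fails with -ENOSPC if it is already taken. */
int demo_reserve(unsigned int index)
{
    assert(index < DEMO_SLOTS);
    if (demo_reserved[index])
        return -ENOSPC;
    demo_reserved[index] = true;
    return 0;
}

/* Give the index back so that it can be reserved again. */
void demo_release(unsigned int index)
{
    assert(index < DEMO_SLOTS);
    demo_reserved[index] = false;
}

/*
 * Model of an action init function. alloc_fails simulates a failure
 * that happens after the index was reserved; release_on_error selects
 * the fixed behaviour (true) or the leaking behaviour (false).
 */
int demo_action_init(unsigned int index, bool alloc_fails,
                     bool release_on_error)
{
    int err = demo_reserve(index);

    if (err)
        return err;
    if (alloc_fails) {
        if (release_on_error)
            demo_release(index);    /* the fix: undo the reservation */
        return -ENOMEM;
    }
    return 0;
}
```

In the leaking variant, a first call that fails with -ENOMEM leaves the index reserved, so every later call for the same index returns -ENOSPC, mirroring the "Cannot allocate memory" followed by "No space left on device" sequence in the transcript. With the release in place, a retry succeeds.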
2018-03-21 | net/sched: fix idr leak in the error path of tcf_vlan_init() | Davide Caratti
tcf_vlan_init() can fail after the idr has been successfully reserved. When this happens, every subsequent attempt to configure vlan rules using the same idr value will systematically fail with -ENOSPC, unless the first attempt was done using the 'replace' keyword.
 # tc action add action vlan pop index 100
 RTNETLINK answers: Cannot allocate memory
 We have an error talking to the kernel
 # tc action add action vlan pop index 100
 RTNETLINK answers: No space left on device
 We have an error talking to the kernel
 # tc action add action vlan pop index 100
 RTNETLINK answers: No space left on device
 We have an error talking to the kernel
 ...
Fix this in tcf_vlan_init(), ensuring that tcf_idr_release() is called on the error path when the idr has been reserved, but not yet inserted. Also, don't test 'ovr' in the error path, to avoid a 'replace' failure implicitly becoming a 'delete' that leaks a refcount in the act_vlan module:
 # rmmod act_vlan; modprobe act_vlan
 # tc action add action vlan push id 5 index 100
 # tc action replace action vlan push id 7 index 100
 RTNETLINK answers: Cannot allocate memory
 We have an error talking to the kernel
 # tc action list action vlan
 #
 # rmmod act_vlan
 rmmod: ERROR: Module act_vlan is in use
Fixes: 4c5b9d9642c8 ("act_vlan: VLAN action rewrite to use RCU lock/unlock and update") Fixes: 65a206c01e8e ("net/sched: Change act_api and act_xxx modules to use IDR") Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-21 | net/sched: fix idr leak in the error path of __tcf_ipt_init() | Davide Caratti
__tcf_ipt_init() can fail after the idr has been successfully reserved. When this happens, subsequent attempts to configure xt/ipt rules using the same idr value systematically fail with -ENOSPC:
 # tc action add action xt -j LOG --log-prefix test1 index 100
 tablename: mangle hook: NF_IP_POST_ROUTING target: LOG level warning prefix "test1" index 100
 RTNETLINK answers: Cannot allocate memory
 We have an error talking to the kernel
 Command "(null)" is unknown, try "tc actions help".
 # tc action add action xt -j LOG --log-prefix test1 index 100
 tablename: mangle hook: NF_IP_POST_ROUTING target: LOG level warning prefix "test1" index 100
 RTNETLINK answers: No space left on device
 We have an error talking to the kernel
 Command "(null)" is unknown, try "tc actions help".
 # tc action add action xt -j LOG --log-prefix test1 index 100
 tablename: mangle hook: NF_IP_POST_ROUTING target: LOG level warning prefix "test1" index 100
 RTNETLINK answers: No space left on device
 We have an error talking to the kernel
 ...
Fix this in the error path of __tcf_ipt_init(), calling tcf_idr_release() in place of tcf_idr_cleanup(). Since tcf_ipt_release() can now be called when tcfi_t is NULL, we also need to protect calls to ipt_destroy_target() to avoid NULL pointer dereference. Fixes: 65a206c01e8e ("net/sched: Change act_api and act_xxx modules to use IDR") Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-21 | net/sched: fix idr leak in the error path of tcp_pedit_init() | Davide Caratti
tcf_pedit_init() can fail to allocate 'keys' after the idr has been successfully reserved. When this happens, subsequent attempts to configure a pedit rule using the same idr value systematically fail with -ENOSPC:
 # tc action add action pedit munge ip ttl set 63 index 100
 RTNETLINK answers: Cannot allocate memory
 We have an error talking to the kernel
 # tc action add action pedit munge ip ttl set 63 index 100
 RTNETLINK answers: No space left on device
 We have an error talking to the kernel
 # tc action add action pedit munge ip ttl set 63 index 100
 RTNETLINK answers: No space left on device
 We have an error talking to the kernel
 ...
Fix this in the error path of tcf_act_pedit_init(), calling tcf_idr_release() in place of tcf_idr_cleanup(). Fixes: 65a206c01e8e ("net/sched: Change act_api and act_xxx modules to use IDR") Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-21net/sched: fix idr leak in the error path of tcf_act_police_init()Davide Caratti
tcf_act_police_init() can fail after the idr has been successfully reserved (e.g., qdisc_get_rtab() may return NULL). When this happens, subsequent attempts to configure a police rule using the same idr value systematically fail with -ENOSPC: # tc action add action police rate 1000 burst 1000 drop index 100 RTNETLINK answers: Cannot allocate memory We have an error talking to the kernel # tc action add action police rate 1000 burst 1000 drop index 100 RTNETLINK answers: No space left on device We have an error talking to the kernel # tc action add action police rate 1000 burst 1000 drop index 100 RTNETLINK answers: No space left on device ... Fix this in the error path of tcf_act_police_init(), calling tcf_idr_release() in place of tcf_idr_cleanup(). Fixes: 65a206c01e8e ("net/sched: Change act_api and act_xxx modules to use IDR") Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-21net/sched: fix idr leak in the error path of tcf_simp_init()Davide Caratti
If the kernel fails to duplicate 'sdata', creation of a new action fails with -ENOMEM. However, subsequent attempts to install the same action using the same value of 'index' systematically fail with -ENOSPC, and that value of 'index' can no longer be used by act_simple until act_simple.ko is unloaded and reloaded (rmmod / insmod): # tc actions add action simple sdata hello index 100 # tc actions list action simple action order 0: Simple <hello> index 100 ref 1 bind 0 # tc actions flush action simple # tc actions add action simple sdata hello index 100 RTNETLINK answers: Cannot allocate memory We have an error talking to the kernel # tc actions flush action simple # tc actions add action simple sdata hello index 100 RTNETLINK answers: No space left on device We have an error talking to the kernel # tc actions add action simple sdata hello index 100 RTNETLINK answers: No space left on device We have an error talking to the kernel ... Fix this in the error path of tcf_simp_init(), calling tcf_idr_release() in place of tcf_idr_cleanup(). Fixes: 65a206c01e8e ("net/sched: Change act_api and act_xxx modules to use IDR") Suggested-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-21net/sched: fix idr leak on the error path of tcf_bpf_init()Davide Caratti
When the following command sequence is entered: # tc action add action bpf bytecode '4,40 0 0 12,31 0 1 2048,6 0 0 262144,6 0 0 0' index 100 RTNETLINK answers: Invalid argument We have an error talking to the kernel # tc action add action bpf bytecode '4,40 0 0 12,21 0 1 2048,6 0 0 262144,6 0 0 0' index 100 RTNETLINK answers: No space left on device We have an error talking to the kernel act_bpf correctly refuses to install the first TC rule, because 31 is not a valid instruction. However, it also refuses to install the second TC rule, even though the BPF code is correct. Furthermore, it is no longer possible to install any other rule having the same value of 'index' until the act_bpf module is unloaded and inserted again. After the idr has been reserved, call tcf_idr_release() instead of tcf_idr_cleanup(), to fix this issue. Fixes: 65a206c01e8e ("net/sched: Change act_api and act_xxx modules to use IDR") Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
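The symptom shared by the idr-leak reports (first -ENOMEM or -EINVAL, then -ENOSPC forever on the same index) can be illustrated with a toy model. This is only a sketch under stated assumptions: `toy_reserve()`, `toy_release()` and `toy_init()` are hypothetical names invented for illustration, the slot table is a plain array rather than a kernel IDR, and the code does not reproduce the real semantics of tcf_idr_cleanup() or tcf_idr_release().

```c
#include <errno.h>
#include <stdbool.h>

#define TOY_SLOTS 256

/* Hypothetical stand-in for the per-action index table (not a real IDR). */
static bool toy_reserved[TOY_SLOTS];

/* Reserve an index; a busy (or out-of-range) index yields -ENOSPC. */
int toy_reserve(unsigned int index)
{
    if (index >= TOY_SLOTS || toy_reserved[index])
        return -ENOSPC;
    toy_reserved[index] = true;
    return 0;
}

/* Give the index back so that it can be reserved again. */
void toy_release(unsigned int index)
{
    if (index < TOY_SLOTS)
        toy_reserved[index] = false;
}

/*
 * Toy init function: reserve the index, then run a setup step that may fail.
 * With release_on_error == false the error path forgets the reservation,
 * which models the leak; with true, the index is handed back.
 */
int toy_init(unsigned int index, bool setup_fails, bool release_on_error)
{
    int err = toy_reserve(index);

    if (err)
        return err;
    if (setup_fails) {
        if (release_on_error)
            toy_release(index);
        return -ENOMEM;
    }
    return 0;
}
```

In this model, a failed `toy_init(100, true, false)` leaves index 100 reserved, so every later attempt on index 100 returns -ENOSPC, while the variant that releases on error lets a later attempt succeed.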
2018-03-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf-next 2018-03-21 The following pull-request contains BPF updates for your *net-next* tree. The main changes are:
1) Add a BPF hook for sendmsg and sendfile by reusing the ULP infrastructure and sockmap. Three helpers are added along with this, bpf_msg_apply_bytes(), bpf_msg_cork_bytes(), and bpf_msg_pull_data(). The first is used to tell for how many bytes the verdict should be applied to, the second to tell that x bytes need to be queued first to retrigger the BPF program for a verdict, and the third helper is mainly for the sendfile case to pull in data for making it private for reading and/or writing, from John.
2) Improve address to symbol resolution of user stack traces in BPF stackmap. Currently, the latter stores the address for each entry in the call trace, however to map these addresses to user space files, it is necessary to maintain the mapping from these virtual addresses to symbols in the binary which is not practical for system-wide profiling. Instead, this option for the stackmap rather stores the ELF build id and offset for the call trace entries, from Song.
3) Add support that allows BPF programs attached to perf events to read the address values recorded with the perf events. They are requested through PERF_SAMPLE_ADDR via perf_event_open(). Main motivation behind it is to support building memory or lock access profiling and tracing tools with the help of BPF, from Teng.
4) Several improvements to the tools/bpf/ Makefiles. The 'make bpf' in the tools directory does not provide the standard quiet output except for bpftool and it also does not respect specifying a build output directory. 'make bpf_install' command neither respects specified destination nor prefix, all from Jiri. In addition, Jakub fixes several other minor issues in the Makefiles on top of that, e.g. fixing dependency paths, phony targets and more.
5) Various doc updates e.g. add a comment for BPF fs about reserved names to make the dentry lookup from there a bit more obvious, and a comment to the bpf_devel_QA file in order to explain the diff between native and bpf target clang usage with regards to pointer size, from Quentin and Daniel.
==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-21cfg80211/nl80211: add DFS offload flagDmitry Lebed
Add a wiphy EXT_FEATURE flag to indicate that the HW or driver does all DFS actions by itself. User-space functionality is already implemented in hostapd, using a vendor-specific (QCA) OUI to advertise DFS offload support. A generic flag is needed to inform user space about DFS offload support. For devices with the DFS_OFFLOAD flag set, user space will no longer need to issue CAC or take any action in response to "radar detected" events. The HW will do everything by itself and send events to user space to indicate that CAC was started/finished, etc. Signed-off-by: Dmitrii Lebed <dlebed@quantenna.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-03-21cfg80211/nl80211: add CAC_STARTED eventDmitry Lebed
The CAC_STARTED event is needed for the DFS offload feature and should be generated by the driver/HW if DFS_OFFLOAD is enabled. Signed-off-by: Dmitry Lebed <dlebed@quantenna.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-03-21mac80211: inform wireless layer when frame RSSI is invalidTosoni
When the low-level driver returns an invalid RSSI indication, set the signal value to 0 as an indication to the upper layer. Also, skip average level computation if signal is invalid. Signed-off-by: Jean Pierre TOSONI <jp.tosoni@acksys.fr> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
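The behaviour described above (report 0 for an invalid RSSI and leave the running average alone) can be sketched as follows. This is a minimal illustration, not the mac80211 code: `struct toy_rx_stats`, `toy_update_signal()` and the 3/4 weighting of the average are all assumptions made for the example.

```c
#include <stdbool.h>

/* Hypothetical per-station receive statistics (not a mac80211 structure). */
struct toy_rx_stats {
    int last_signal;  /* last reported signal in dBm; 0 marks "invalid" here */
    int avg_signal;   /* running average in dBm */
    bool have_avg;    /* false until the first valid sample arrives */
};

/*
 * Record one received frame. An invalid RSSI is reported upward as 0 and
 * is kept out of the average, so a bogus sample cannot skew it.
 */
void toy_update_signal(struct toy_rx_stats *st, bool signal_valid, int signal_dbm)
{
    if (!signal_valid) {
        st->last_signal = 0;
        return; /* skip the average computation entirely */
    }

    st->last_signal = signal_dbm;
    if (!st->have_avg) {
        st->avg_signal = signal_dbm;
        st->have_avg = true;
    } else {
        /* Assumed weighting: 3/4 old average, 1/4 new sample. */
        st->avg_signal = (3 * st->avg_signal + signal_dbm) / 4;
    }
}
```

With this sketch, a valid -60 dBm frame followed by an invalid one leaves the average at -60 while the last reported signal becomes 0.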
2018-03-21mac80211: add ieee80211_hw flag for QoS NDP supportBen Caradoc-Davies
Commit 7b6ddeaf27ec ("mac80211: use QoS NDP for AP probing") added an argument qos_ok to ieee80211_nullfunc_get to support QoS NDP. Despite the claim in the commit log "Change all the drivers to *not* allow QoS NDP for now, even though it looks like most of them should be OK with that", this commit enables QoS NDP in response to beacons (see change to mlme.c:ieee80211_send_nullfunc), causing ath9k_htc to lose IP connectivity. See: https://patchwork.kernel.org/patch/10241109/ https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=891060 Introduce a hardware flag to allow such buggy drivers to override the correct default behaviour of mac80211 of sending QoS NDP packets. Signed-off-by: Ben Caradoc-Davies <ben@transient.nz> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-03-20svcrdma: Clean up rdma_build_arg_xdrChuck Lever
Clean up: The value of the byte_count parameter is already passed to rdma_build_arg_xdr as part of the svc_rdma_op_ctxt structure. Further, without the parameter called "byte_count" there is no need to have the abbreviated "bc" automatic variable. "bc" can now be called something more intuitive. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-03-20svcrdma: Consult max_qp_init_rd_atom when accepting connectionsChuck Lever
The target needs to return the lesser of the client's Inbound RDMA Read Queue Depth (IRD), provided in the connection parameters, and the local device's Outbound RDMA Read Queue Depth (ORD). The latter limit is max_qp_init_rd_atom, not max_qp_rd_atom. The svcrdma_ord value caps the ORD value for iWARP transports, which do not exchange ORD/IRD values at connection time. Since no other Linux kernel RDMA-enabled storage target sees fit to provide this cap, I'm removing it here too. initiator_depth is a u8, so ensure the computed ORD value does not overflow that field. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
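The ORD selection described above amounts to taking a minimum and then clamping it to fit a u8. A minimal sketch, assuming a hypothetical helper named `toy_pick_ord()` whose two parameters stand in for the client's IRD and the device's max_qp_init_rd_atom; the actual svcrdma code may be structured differently.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not the svcrdma implementation): return the lesser
 * of the client's IRD and the local device's max_qp_init_rd_atom, capped
 * so that the result fits the u8 initiator_depth field.
 */
uint8_t toy_pick_ord(unsigned int client_ird, unsigned int dev_max_qp_init_rd_atom)
{
    unsigned int ord = client_ird < dev_max_qp_init_rd_atom ?
                       client_ird : dev_max_qp_init_rd_atom;

    if (ord > UINT8_MAX)
        ord = UINT8_MAX; /* avoid overflowing the u8 field */
    return (uint8_t)ord;
}
```

Doing the comparison in a wider type and clamping before the cast avoids a silent truncation when both inputs exceed 255.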
2018-03-20svcrdma: Use pr_err to report Receive errorsChuck Lever
Clean up: Other completion handlers use pr_err, not pr_warn. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-03-20ipv6: old_dport should be a __be16 in __ip6_datagram_connect()Stefano Brivio
Fixes: 2f987a76a977 ("net: ipv6: keep sk status consistent after datagram connect failure") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-20netfilter: ebtables: add support for matching IGMP typeMatthias Schiffer
We already have ICMPv6 type/code matches (which can be used to distinguish different types of MLD packets). Add support for IPv4 IGMP matches in the same way. Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>