summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2019-07-08coallocate socket_wq with socket itselfAl Viro
socket->wq is assign-once, set when we are initializing both struct socket it's in and struct socket_wq it points to. As the matter of fact, the only reason for separate allocation was the ability to RCU-delay freeing of socket_wq. RCU-delaying the freeing of socket itself gets rid of that need, so we can just fold struct socket_wq into the end of struct socket and simplify the life both for sock_alloc_inode() (one allocation instead of two) and for tun/tap oddballs, where we used to embed struct socket and struct socket_wq into the same structure (now - embedding just the struct socket). Note that reference to struct socket_wq in struct sock does remain a reference - that's unchanged. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf-next 2019-07-09 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Lots of libbpf improvements: i) addition of new APIs to attach BPF programs to tracing entities such as {k,u}probes or tracepoints, ii) improve specification of BTF-defined maps by eliminating the need for data initialization for some of the members, iii) addition of a high-level API for setting up and polling perf buffers for BPF event output helpers, all from Andrii. 2) Add "prog run" subcommand to bpftool in order to test-run programs through the kernel testing infrastructure of BPF, from Quentin. 3) Improve verifier for BPF sockaddr programs to support 8-byte stores for user_ip6 and msg_src_ip6 members given clang tends to generate such stores, from Stanislav. 4) Enable the new BPF JIT zero-extension optimization for further riscv64 ALU ops, from Luke. 5) Fix a bpftool json JIT dump crash on powerpc, from Jiri. 6) Fix an AF_XDP race in generic XDP's receive path, from Ilya. 7) Various smaller fixes from Ilya, Yue and Arnd. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-09xdp: fix race on generic receive pathIlya Maximets
Unlike driver mode, generic xdp receive could be triggered by different threads on different CPU cores at the same time leading to the fill and rx queue breakage. For example, this could happen while sending packets from two processes to the first interface of veth pair while the second part of it is open with AF_XDP socket. Need to take a lock for each generic receive to avoid race. Fixes: c497176cb2e4 ("xsk: add Rx receive functions and poll support") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Tested-by: William Tu <u9012063@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-08net: stmmac: enable clause 45 mdio supportKweh Hock Leong
DWMAC4 is capable to support clause 45 mdio communication. This patch enable the feature on stmmac_mdio_write() and stmmac_mdio_read() by following phy_write_mmd() and phy_read_mmd() mdiobus read write implementation format. Reviewed-by: Li, Yifan <yifan2.li@intel.com> Signed-off-by: Kweh Hock Leong <hock.leong.kweh@intel.com> Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com> Signed-off-by: Voon Weifeng <weifeng.voon@intel.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08net: core: page_pool: add user refcnt and reintroduce page_pool_destroyIvan Khoronzhuk
Jesper recently removed page_pool_destroy() (from driver invocation) and moved shutdown and free of page_pool into xdp_rxq_info_unreg(), in-order to handle in-flight packets/pages. This created an asymmetry in drivers create/destroy pairs. This patch reintroduce page_pool_destroy and add page_pool user refcnt. This serves the purpose to simplify drivers error handling as driver now drivers always calls page_pool_destroy() and don't need to track if xdp_rxq_info_reg_mem_model() was unsuccessful. This could be used for a special cases where a single RX-queue (with a single page_pool) provides packets for two net_device'es, and thus needs to register the same page_pool twice with two xdp_rxq_info structures. This patch is primarily to ease API usage for drivers. The recently merged netsec driver, actually have a bug in this area, which is solved by this API change. This patch is a modified version of Ivan Khoronzhuk's original patch. Link: https://lore.kernel.org/netdev/20190625175948.24771-2-ivan.khoronzhuk@linaro.org/ Fixes: 5c67bf0ec4d0 ("net: netsec: Use page_pool API") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter/IPVS updates for net-next: 1) Move bridge keys in nft_meta to nft_meta_bridge, from wenxu. 2) Support for bridge pvid matching, from wenxu. 3) Support for bridge vlan protocol matching, also from wenxu. 4) Add br_vlan_get_pvid_rcu(), to fetch the bridge port pvid from packet path. 5) Prefer specific family extension in nf_tables. 6) Autoload specific family extension in case it is missing. 7) Add synproxy support to nf_tables, from Fernando Fernandez Mancera. 8) Support for GRE encapsulation in IPVS, from Vadim Fedorenko. 9) ICMP handling for GRE encapsulation, from Julian Anastasov. 10) Remove unused parameter in nf_queue, from Florian Westphal. 11) Replace seq_printf() by seq_puts() in nf_log, from Markus Elfring. 12) Rename nf_SYNPROXY.h => nf_synproxy.h before this header becomes public. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08bpf: avoid unused variable warning in tcp_bpf_rtt()Arnd Bergmann
When CONFIG_BPF is disabled, we get a warning for an unused variable: In file included from drivers/target/target_core_device.c:26: include/net/tcp.h:2226:19: error: unused variable 'tp' [-Werror,-Wunused-variable] struct tcp_sock *tp = tcp_sk(sk); The variable is only used in one place, so it can be replaced with its value there to avoid the warning. Fixes: 23729ff23186 ("bpf: add BPF_CGROUP_SOCK_OPS callback that is executed on every RTT") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-08bpf: allow wide (u64) aligned stores for some fields of bpf_sock_addrStanislav Fomichev
Since commit cd17d7770578 ("bpf/tools: sync bpf.h") clang decided that it can do a single u64 store into user_ip6[2] instead of two separate u32 ones: # 17: (18) r2 = 0x100000000000000 # ; ctx->user_ip6[2] = bpf_htonl(DST_REWRITE_IP6_2); # 19: (7b) *(u64 *)(r1 +16) = r2 # invalid bpf_context access off=16 size=8 >From the compiler point of view it does look like a correct thing to do, so let's support it on the kernel side. Credit to Andrii Nakryiko for a proper implementation of bpf_ctx_wide_store_ok. Cc: Andrii Nakryiko <andriin@fb.com> Cc: Yonghong Song <yhs@fb.com> Fixes: cd17d7770578 ("bpf/tools: sync bpf.h") Reported-by: kernel test robot <rong.a.chen@intel.com> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-06Bluetooth: Add support for LE ping featureSpoorthi Ravishankar Koppad
Changes made to add HCI Write Authenticated Payload timeout command for LE Ping feature. As per the Core Specification 5.0 Volume 2 Part E Section 7.3.94, the following code changes implements HCI Write Authenticated Payload timeout command for LE Ping feature. Signed-off-by: Spoorthi Ravishankar Koppad <spoorthix.k@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2019-07-05net/mlx5: Kconfig, Better organize compilation flagsTariq Toukan
Always contain all acceleration functions declarations in 'accel' files, independent to the flags setting. For this, introduce new flags CONFIG_FPGA_{IPSEC/TLS} and use stubs where needed. This obsoletes the need for stubs in 'fpga' files. Remove them. Also use the new flags in Makefile, to decide whether to compile TLS-specific or IPSEC-specific objects, or not. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-05Merge tag 'mlx5-updates-2019-07-04-v2' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-update-2019-07-04 This series adds mlx5 support for devlink fw versions query. 1) Implement the required low level firmware commands 2) Implement the devlink knobs and callbacks for fw versions query. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-05net: remove unused parameter from skb_checksum_try_convertLi RongQing
the check parameter is never used Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-05Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2019-07-05 1) A lot of work to remove indirections from the xfrm code. From Florian Westphal. 2) Fix a WARN_ON with ipv6 that triggered because of a forgotten break statement. From Florian Westphal. 3) Remove xfrmi_init_net, it is not needed. From Li RongQing. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-05netfilter: nft_meta_bridge: Add NFT_META_BRI_IIFVPROTO supportwenxu
This patch allows you to match on bridge vlan protocol, eg. nft add rule bridge firewall zones counter meta ibrvproto 0x8100 Signed-off-by: wenxu <wenxu@ucloud.cn> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-07-05bridge: add br_vlan_get_proto()wenxu
This new function allows you to fetch the bridge port vlan protocol. Signed-off-by: wenxu <wenxu@ucloud.cn> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-07-05netfilter: nft_meta_bridge: add NFT_META_BRI_IIFPVID supportwenxu
This patch allows you to match on the bridge port pvid, eg. nft add rule bridge firewall zones counter meta ibrpvid 10 Signed-off-by: wenxu <wenxu@ucloud.cn> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-07-05bridge: add br_vlan_get_pvid_rcu()Pablo Neira Ayuso
This new function allows you to fetch bridge pvid from packet path. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2019-07-05netfilter: nft_meta: move bridge meta keys into nft_meta_bridgewenxu
Separate bridge meta key from nft_meta to meta_bridge to avoid a dependency between the bridge module and nft_meta when using the bridge API available through include/linux/if_bridge.h Signed-off-by: wenxu <wenxu@ucloud.cn> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-07-05netfilter: nf_tables: Add synproxy supportFernando Fernandez Mancera
Add synproxy support for nf_tables. This behaves like the iptables synproxy target but it is structured in a way that allows us to propose improvements in the future. Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-07-04Merge branch 'mlx5-next' of ↵Saeed Mahameed
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Misc updates from mlx5-next branch: 1) Add the required HW definitions and structures for upcoming TLS support. 2) Add support for MCQI and MCQS hardware registers for fw version query. 3) Added hardware bits and structures definitions for sub-functions 4) Small code cleanup and improvement for PF pci driver. 5) Bluefield (ECPF) updates and refactoring for better E-Switch management on ECPF embedded CPU NIC: 5.1) Consolidate querying eswitch number of VFs 5.2) Register event handler at the correct E-Switch init stage 5.3) Setup PF's inline mode and vlan pop when the ECPF is the E-Swtich manager ( the host PF is basically a VF ). 5.4) Handle Vport UC address changes in switchdev mode. 6) Cleanup the rep and netdev reference when unloading IB rep. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> i# All conflicts fixed but you are still merging.
2019-07-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf-next 2019-07-03 The following pull-request contains BPF updates for your *net-next* tree. There is a minor merge conflict in mlx5 due to 8960b38932be ("linux/dim: Rename externally used net_dim members") which has been pulled into your tree in the meantime, but resolution seems not that bad ... getting current bpf-next out now before there's coming more on mlx5. ;) I'm Cc'ing Saeed just so he's aware of the resolution below: ** First conflict in drivers/net/ethernet/mellanox/mlx5/core/en_main.c: <<<<<<< HEAD static int mlx5e_open_cq(struct mlx5e_channel *c, struct dim_cq_moder moder, struct mlx5e_cq_param *param, struct mlx5e_cq *cq) ======= int mlx5e_open_cq(struct mlx5e_channel *c, struct net_dim_cq_moder moder, struct mlx5e_cq_param *param, struct mlx5e_cq *cq) >>>>>>> e5a3e259ef239f443951d401db10db7d426c9497 Resolution is to take the second chunk and rename net_dim_cq_moder into dim_cq_moder. Also the signature for mlx5e_open_cq() in ... drivers/net/ethernet/mellanox/mlx5/core/en.h +977 ... and in mlx5e_open_xsk() ... drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c +64 ... needs the same rename from net_dim_cq_moder into dim_cq_moder. ** Second conflict in drivers/net/ethernet/mellanox/mlx5/core/en_main.c: <<<<<<< HEAD int cpu = cpumask_first(mlx5_comp_irq_get_affinity_mask(priv->mdev, ix)); struct dim_cq_moder icocq_moder = {0, 0}; struct net_device *netdev = priv->netdev; struct mlx5e_channel *c; unsigned int irq; ======= struct net_dim_cq_moder icocq_moder = {0, 0}; >>>>>>> e5a3e259ef239f443951d401db10db7d426c9497 Take the second chunk and rename net_dim_cq_moder into dim_cq_moder as well. Let me know if you run into any issues. Anyway, the main changes are: 1) Long-awaited AF_XDP support for mlx5e driver, from Maxim. 2) Addition of two new per-cgroup BPF hooks for getsockopt and setsockopt along with a new sockopt program type which allows more fine-grained pass/reject settings for containers. Also add a sock_ops callback that can be selectively enabled on a per-socket basis and is executed for every RTT to help tracking TCP statistics, both features from Stanislav. 3) Follow-up fix from loops in precision tracking which was not propagating precision marks and as a result verifier assumed that some branches were not taken and therefore wrongly removed as dead code, from Alexei. 4) Fix BPF cgroup release synchronization race which could lead to a double-free if a leaf's cgroup_bpf object is released and a new BPF program is attached to the one of ancestor cgroups in parallel, from Roman. 5) Support for bulking XDP_TX on veth devices which improves performance in some cases by around 9%, from Toshiaki. 6) Allow for lookups into BPF devmap and improve feedback when calling into bpf_redirect_map() as lookup is now performed right away in the helper itself, from Toke. 7) Add support for fq's Earliest Departure Time to the Host Bandwidth Manager (HBM) sample BPF program, from Lawrence. 8) Various cleanups and minor fixes all over the place from many others. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-04bonding: add an option to specify a delay between peer notificationsVincent Bernat
Currently, gratuitous ARP/ND packets are sent every `miimon' milliseconds. This commit allows a user to specify a custom delay through a new option, `peer_notif_delay'. Like for `updelay' and `downdelay', this delay should be a multiple of `miimon' to avoid managing an additional work queue. The configuration logic is copied from `updelay' and `downdelay'. However, the default value cannot be set using a module parameter: Netlink or sysfs should be used to configure this feature. When setting `miimon' to 100 and `peer_notif_delay' to 500, we can observe the 500 ms delay is respected: 20:30:19.354693 ARP, Request who-has 203.0.113.10 tell 203.0.113.10, length 28 20:30:19.874892 ARP, Request who-has 203.0.113.10 tell 203.0.113.10, length 28 20:30:20.394919 ARP, Request who-has 203.0.113.10 tell 203.0.113.10, length 28 20:30:20.914963 ARP, Request who-has 203.0.113.10 tell 203.0.113.10, length 28 In bond_mii_monitor(), I have tried to keep the lock logic readable. The change is due to the fact we cannot rely on a notification to lower the value of `bond->send_peer_notif' as `NETDEV_NOTIFY_PEERS' is only triggered once every N times, while we need to decrement the counter each time. iproute2 also needs to be updated to be able to specify this new attribute through `ip link'. Signed-off-by: Vincent Bernat <vincent@bernat.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-04net/mlx5: Add rts2rts_qp_counters_set_id field in hca capMark Zhang
Add rts2rts_qp_counters_set_id field in hca cap so that RTS2RTS qp modification can be used to change the counter of a QP. Signed-off-by: Mark Zhang <markz@mellanox.com> Reviewed-by: Majd Dibbiny <majd@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-07-04ipvs: allow tunneling with gre encapsulationVadim Fedorenko
windows real servers can handle gre tunnels, this patch allows gre encapsulation with the tunneling method, thereby letting ipvs be load balancer for windows-based services Signed-off-by: Vadim Fedorenko <vfedorenko@yandex-team.ru> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-07-04netfilter: nf_queue: remove unused hook entries pointerFlorian Westphal
Its not used anywhere, so remove this. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-07-04netfilter: rename nf_SYNPROXY.h to nf_synproxy.hPablo Neira Ayuso
Uppercase is a reminiscence from the iptables infrastructure, rename this header before this is included in stable kernels. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-07-03inet: factor out inet_send_prepare()Paolo Abeni
The same code is replicated verbatim in multiple places, and the next patches will introduce an additional user for it. Factor out a helper and use it where appropriate. No functional change intended. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-03net/mlx5: Properly name the generic WQE control fieldTariq Toukan
A generic WQE control field is used for different purposes in different cases. Use union to allow using the proper name in each case. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-03net/mlx5: Introduce TLS TX offload hardware bits and structuresEran Ben Elisha
Add TLS offload related IFC structs, layouts and enumerations. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-03net/mlx5: Introduce and use mlx5_eswitch_get_total_vports()Parav Pandit
Instead MLX5_TOTAL_VPORTS, use mlx5_eswitch_get_total_vports(). mlx5_eswitch_get_total_vports() in subsequent patch accounts for SF vports as well. Expanding MLX5_TOTAL_VPORTS macro would require exposing SF internals to more generic vport.h header file. Such exposure is not desired. Hence a mlx5_eswitch_get_total_vports() is introduced. Given that mlx5_eswitch_get_total_vports() API wants to work on const mlx5_core_dev*, change its helper functions also to accept const *dev. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-03net/mlx5: Expose device definitions for object eventsYishai Hadas
Expose an extra device definitions for objects events. It includes: object_type values for legacy objects and generic data header for any other object. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-07-03net/mlx5: Report EQE data upon CQ completionYishai Hadas
Report EQE data upon CQ completion to let upper layers use this data. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-07-03net/mlx5: mlx5_core_create_cq() enhancementsYishai Hadas
Enhance mlx5_core_create_cq() to get the command out buffer from the callers to let them use the output. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-07-03net/mlx5: Expose the API to register for ANY eventYishai Hadas
Expose the API to register for ANY event, mlx5_ib will be able to use this functionality for its needs. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-07-03net/mlx5: Use event mask based on device capabilitiesYishai Hadas
Use the reported device capabilities for the supported user events (i.e. affiliated and un-affiliated) to set the EQ mask. As the event mask can be up to 256 defined by 4 entries of u64 change the applicable code to work accordingly. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-07-03bpf: add icsk_retransmits to bpf_tcp_sockStanislav Fomichev
Add some inet_connection_sock fields to bpf_tcp_sock that might be useful for debugging congestion control issues. Cc: Eric Dumazet <edumazet@google.com> Cc: Priyaranjan Jha <priyarjha@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-03bpf: add dsack_dups/delivered{, _ce} to bpf_tcp_sockStanislav Fomichev
Add more fields to bpf_tcp_sock that might be useful for debugging congestion control issues. Cc: Eric Dumazet <edumazet@google.com> Cc: Priyaranjan Jha <priyarjha@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-03bpf: add BPF_CGROUP_SOCK_OPS callback that is executed on every RTTStanislav Fomichev
Performance impact should be minimal because it's under a new BPF_SOCK_OPS_RTT_CB_FLAG flag that has to be explicitly enabled. Suggested-by: Eric Dumazet <edumazet@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Priyaranjan Jha <priyarjha@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-01loopback: create blackhole net device similar to loopack.Mahesh Bandewar
Create a blackhole net device that can be used for "dead" dst entries instead of loopback device. This blackhole device differs from loopback in few aspects: (a) It's not per-ns. (b) MTU on this device is ETH_MIN_MTU (c) The xmit function is essentially kfree_skb(). and (d) since it's not registered it won't have ifindex. Lower MTU effectively make the device not pass the MTU check during the route check when a dst associated with the skb is dead. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01net: page_pool: add helper function for retrieving dma directionIlias Apalodimas
Since the dma direction is stored in page pool params, offer an API helper for driver that choose not to keep track of it locally Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01netlink: use 48 byte ctx instead of 6 signed longs for callbackJason A. Donenfeld
People are inclined to stuff random things into cb->args[n] because it looks like an array of integers. Sometimes people even put u64s in there with comments noting that a certain member takes up two slots. The horror! Really this should mirror the usage of skb->cb, which are just 48 opaque bytes suitable for casting a struct. Then people can create their usual casting macros for accessing strongly typed members of a struct. As a plus, this also gives us the same amount of space on 32bit and 64bit. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01net/mlx5: E-Switch, Consider host PF for inline mode and vlan popBodong Wang
When ECPF is the eswitch manager, host PF is treated like other VFs. Driver should do the same for inline mode and vlan pop. Add new iterators to include host PF if ECPF is the eswitch manager. Signed-off-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01net/mlx5: E-Switch, Refactor eswitch SR-IOV interfaceBodong Wang
Devlink eswitch mode is not necessarily related to SR-IOV, e.g, ECPF can be at offload mode when SR-IOV is not enabled. Rename the interface and eswitch mode names to decouple from SR-IOV, and cleanup eswitch messages accordingly. Signed-off-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01net/mlx5: Handle host PF vport mac/guid for ECPFBodong Wang
When ECPF is eswitch manager, it has the privilege to query and configure the mac and node guid of host PF. While vport number of host PF is 0, the vport command should be issued with other_vport set in this case as the cmd is issued by ECPF vport(0xfffe). Add a specific function to query own vport mac. Low level functions are used by vport manager to query/modify any vport mac and node guid. Signed-off-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01net/mlx5: Reduce dependency on enabled_vfs counter and num_vfsParav Pandit
While enabling SR-IOV, PCI core already checks that if SR-IOV is already enabled, it returns failure error code. Hence, remove such duplicate check from mlx5_core driver. While at it, make mlx5_device_disable_sriov() to perform cleanup of VFs in reverse order of mlx5_device_enable_sriov(). Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01net/mlx5: Don't handle VF func change if host PF is disabledBodong Wang
When ECPF eswitch manager is at offloads mode, it monitors functions changed event from host PF side and acts according to the number of VFs enabled/disabled. As ECPF and host PF work in two independent hosts, it's possible that host PF OS reboots but ECPF system is still kept on and continues monitoring events from host PF. When kernel from host PF side is booting, PCI iov driver does sriov_init and compute_max_vf_buses by iterating over all valid num of VFs. This triggers FLR and generates functions changed events, even though host PF HCA is not enabled at this time. However, ECPF is not aware of this information, and still handles these events as usual. ECPF system will see massive number of reps are created, but destroyed immediately once creation finished. To eliminate this noise, a bit is added to host parameter context to indicate host PF is disabled. ECPF will not handle the VF changed event if this bit is set. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01net/mlx5: Rename mlx5_pci_dev_type to mlx5_coredev_typeHuy Nguyen
Rename mlx5_pci_dev_type to mlx5_coredev_type to distinguish different mlx5 device types. mlx5_coredev_type represents mlx5_core_dev instance type. Hence keep mlx5_coredev_type in mlx5_core_dev structure. Signed-off-by: Huy Nguyen <huyn@mellanox.com> Signed-off-by: Vu Pham <vuhuong@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01{IB, net}/mlx5: E-Switch, Use index of rep for vport to IB port mappingBodong Wang
In the single IB device mode, the mapping between vport number and rep relies on a counter. However for dynamic vport allocation, it is desired to keep consistent map of eswitch vport and IB port. Hence, simplify code to remove the free running counter and instead use the available vport index during load/unload sequence from the eswitch. Signed-off-by: Bodong Wang <bodong@mellanox.com> Suggested-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01net/mlx5: Added MCQI and MCQS registers' description to ifcShay Agroskin
Given a fw component index, the MCQI register allows us to query this component's information (e.g. its version and capabilities). Given a fw component index, the MCQS register allows us to query the status of a fw component, including its type and state (e.g. PRESET/IN_USE). It can be used to find the index of a component of a specific type, by sequentially increasing the component index, and querying each time the type of the returned component. If max component index is reached, 'last_index_flag' is set by the HCA. These registers' description was added to query the running and pending fw version of the HCA. Signed-off-by: Shay Agroskin <shayag@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01net/mlx5: Add hardware definitions for sub functionsParav Pandit
Update mlx5 device interface data structures for: 1. New command definitions for allocating, deallocating SF 2. Query SF partition 3. Eswitch SF fields 4. HCA CAP SF fields 5. Extend Eswitch functions command for SF Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Vu Pham <vuhuong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>