summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2019-11-16net/smc: remove unused constantUrsula Braun
Constant SMC_CLOSE_WAIT_LISTEN_CLCSOCK_TIME is defined, but since commit 3d502067599f ("net/smc: simplify wait when closing listen socket") no longer used. Remove it. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-16net/smc: use rcu_barrier() on module unloadUrsula Braun
Add rcu_barrier() to make sure no RCU readers or callbacks are pending when the module is unloaded. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-16net/smc: guarantee removal of link groups in rebootUrsula Braun
When rebooting it should be guaranteed all link groups are cleaned up and freed. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-16net/smc: introduce bookkeeping of SMCR link groupsUrsula Braun
If the smc module is unloaded return control from exit routine only, if all link groups are freed. If an IB device is thrown away return control from device removal only, if all link groups belonging to this device are freed. Counters for the total number of SMCR link groups and for the total number of SMCR links per IB device are introduced. smc module unloading continues only if the total number of SMCR link groups is zero. IB device removal continues only it the total number of SMCR links per IB device has decreased to zero. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-16net: dsa: tag_8021q: Fix dsa_8021q_restore_pvid for an absent pvidVladimir Oltean
This sequence of operations: ip link set dev br0 type bridge vlan_filtering 1 bridge vlan del dev swp2 vid 1 ip link set dev br0 type bridge vlan_filtering 1 ip link set dev br0 type bridge vlan_filtering 0 apparently fails with the message: [ 31.305716] sja1105 spi0.1: Reset switch and programmed static config. Reason: VLAN filtering [ 31.322161] sja1105 spi0.1: Couldn't determine PVID attributes (pvid 0) [ 31.328939] sja1105 spi0.1: Failed to setup VLAN tagging for port 1: -2 [ 31.335599] ------------[ cut here ]------------ [ 31.340215] WARNING: CPU: 1 PID: 194 at net/switchdev/switchdev.c:157 switchdev_port_attr_set_now+0x9c/0xa4 [ 31.349981] br0: Commit of attribute (id=6) failed. [ 31.354890] Modules linked in: [ 31.357942] CPU: 1 PID: 194 Comm: ip Not tainted 5.4.0-rc6-01792-gf4f632e07665-dirty #2062 [ 31.366167] Hardware name: Freescale LS1021A [ 31.370437] [<c03144dc>] (unwind_backtrace) from [<c030e184>] (show_stack+0x10/0x14) [ 31.378153] [<c030e184>] (show_stack) from [<c11d1c1c>] (dump_stack+0xe0/0x10c) [ 31.385437] [<c11d1c1c>] (dump_stack) from [<c034c730>] (__warn+0xf4/0x10c) [ 31.392373] [<c034c730>] (__warn) from [<c034c7bc>] (warn_slowpath_fmt+0x74/0xb8) [ 31.399827] [<c034c7bc>] (warn_slowpath_fmt) from [<c11ca204>] (switchdev_port_attr_set_now+0x9c/0xa4) [ 31.409097] [<c11ca204>] (switchdev_port_attr_set_now) from [<c117036c>] (__br_vlan_filter_toggle+0x6c/0x118) [ 31.418971] [<c117036c>] (__br_vlan_filter_toggle) from [<c115d010>] (br_changelink+0xf8/0x518) [ 31.427637] [<c115d010>] (br_changelink) from [<c0f8e9ec>] (__rtnl_newlink+0x3f4/0x76c) [ 31.435613] [<c0f8e9ec>] (__rtnl_newlink) from [<c0f8eda8>] (rtnl_newlink+0x44/0x60) [ 31.443329] [<c0f8eda8>] (rtnl_newlink) from [<c0f89f20>] (rtnetlink_rcv_msg+0x2cc/0x51c) [ 31.451477] [<c0f89f20>] (rtnetlink_rcv_msg) from [<c1008df8>] (netlink_rcv_skb+0xb8/0x110) [ 31.459796] [<c1008df8>] (netlink_rcv_skb) from [<c1008648>] (netlink_unicast+0x17c/0x1f8) [ 31.468026] [<c1008648>] (netlink_unicast) from [<c1008980>] (netlink_sendmsg+0x2bc/0x3b4) [ 31.476261] [<c1008980>] (netlink_sendmsg) from [<c0f43858>] (___sys_sendmsg+0x230/0x250) [ 31.484408] [<c0f43858>] (___sys_sendmsg) from [<c0f44c84>] (__sys_sendmsg+0x50/0x8c) [ 31.492209] [<c0f44c84>] (__sys_sendmsg) from [<c0301000>] (ret_fast_syscall+0x0/0x28) [ 31.500090] Exception stack(0xedf47fa8 to 0xedf47ff0) [ 31.505122] 7fa0: 00000002 b6f2e060 00000003 beabd6a4 00000000 00000000 [ 31.513265] 7fc0: 00000002 b6f2e060 5d6e3213 00000128 00000000 00000001 00000006 000619c4 [ 31.521405] 7fe0: 00086078 beabd658 0005edbc b6e7ce68 The reason is the implementation of br_get_pvid: static inline u16 br_get_pvid(const struct net_bridge_vlan_group *vg) { if (!vg) return 0; smp_rmb(); return vg->pvid; } Since VID 0 is an invalid pvid from the bridge's point of view, let's add this check in dsa_8021q_restore_pvid to avoid restoring a pvid that doesn't really exist. Fixes: 5f33183b7fdf ("net: dsa: tag_8021q: Restore bridge VLANs when enabling vlan_filtering") Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-16seg6: fix skb transport_header after decap_and_validate()Andrea Mayer
in the receive path (more precisely in ip6_rcv_core()) the skb->transport_header is set to skb->network_header + sizeof(*hdr). As a consequence, after routing operations, destination input expects to find skb->transport_header correctly set to the next protocol (or extension header) that follows the network protocol. However, decap behaviors (DX*, DT*) remove the outer IPv6 and SRH extension and do not set again the skb->transport_header pointer correctly. For this reason, the patch sets the skb->transport_header to the skb->network_header + sizeof(hdr) in each DX* and DT* behavior. Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-16seg6: fix srh pointer in get_srh()Andrea Mayer
pskb_may_pull may change pointers in header. For this reason, it is mandatory to reload any pointer that points into skb header. Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15netfilter: nf_tables: add nft_unregister_flowtable_hook()Pablo Neira Ayuso
Unbind flowtable callback if hook is unregistered. This patch is implicitly fixing the error path of nf_tables_newflowtable() and nft_flowtable_event(). Fixes: 8bb69f3b2918 ("netfilter: nf_tables: add flowtable offload control plane") Reported-by: wenxu <wenxu@ucloud.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-11-15netfilter: nf_tables: check if bind callback fails and unbind if hook ↵wenxu
registration fails Undo the callback binding before unregistering the existing hooks. This should also check for error of the bind setup call. Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support") Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-11-15netfilter: nf_tables_offload: undo updates if transaction failsPablo Neira Ayuso
The nft_flow_rule_offload_commit() function might fail after several successful commands, thus, leaving the hardware filtering policy in inconsistent state. This patch adds nft_flow_rule_offload_abort() function which undoes the updates that have been already processed if one command in this transaction fails. Hence, the hardware ruleset is left as it was before this aborted transaction. The deletion path needs to create the flow_rule object too, in case that an existing rule needs to be re-added from the abort path. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-11-15netfilter: nf_tables_offload: release flow_rule on error from commit pathPablo Neira Ayuso
If hardware offload commit path fails, release all flow_rule objects. Fixes: c9626a2cbdb2 ("netfilter: nf_tables: add hardware offload support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-11-15netfilter: nf_tables_offload: remove reference to flow rule from deletion pathPablo Neira Ayuso
The cookie is sufficient to delete the rule from the hardware. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-11-15netfilter: nf_flow_table: remove unnecessary parameter in flow_offload_fill_dirwenxu
The ct object is already in the flow_offload structure, remove it. Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-11-15netfilter: nf_flow_table_offload: Fix check ndo_setup_tc when setup_blockwenxu
It should check the ndo_setup_tc in the nf_flow_table_offload_setup. Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support") Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-11-15bpf: Annotate context typesAlexei Starovoitov
Annotate BPF program context types with program-side type and kernel-side type. This type information is used by the verifier. btf_get_prog_ctx_type() is used in the later patches to verify that BTF type of ctx in BPF program matches to kernel expected ctx type. For example, the XDP program type is: BPF_PROG_TYPE(BPF_PROG_TYPE_XDP, xdp, struct xdp_md, struct xdp_buff) That means that XDP program should be written as: int xdp_prog(struct xdp_md *ctx) { ... } Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20191114185720.1641606-16-ast@kernel.org
2019-11-15netfilter: Support iif matches in POSTROUTINGPhil Sutter
Instead of generally passing NULL to NF_HOOK_COND() for input device, pass skb->dev which contains input device for routed skbs. Note that iptables (both legacy and nft) reject rules with input interface match from being added to POSTROUTING chains, but nftables allows this. Cc: Eric Garver <eric@garver.life> Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-11-15netfilter: nf_flow_table_offload: add IPv6 supportPablo Neira Ayuso
Add nf_flow_rule_route_ipv6() and use it from the IPv6 and the inet flowtable type definitions. Rename the nf_flow_rule_route() function to nf_flow_rule_route_ipv4(). Adjust maximum number of actions, which now becomes 16 to leave sufficient room for the IPv6 address mangling for NAT. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-11-15netfilter: nf_flow_table_offload: add flow_action_entry_next() and use itPablo Neira Ayuso
This function retrieves a spare action entry from the array of actions. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-11-15netfilter: nft_meta: use 64-bit time arithmeticArnd Bergmann
On 32-bit architectures, get_seconds() returns an unsigned 32-bit time value, which also matches the type used in the nft_meta code. This will not overflow in year 2038 as a time_t would, but it still suffers from the overflow problem later on in year 2106. Change this instance to use the time64_t type consistently and avoid the deprecated get_seconds(). The nft_meta_weekday() calculation potentially gets a little slower on 32-bit architectures, but now it has the same behavior as on 64-bit architectures and does not overflow. Fixes: 63d10e12b00d ("netfilter: nft_meta: support for time matching") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-11-15netfilter: xt_time: use time64_tArnd Bergmann
The current xt_time driver suffers from the y2038 overflow on 32-bit architectures, when the time of day calculations break. Also, on both 32-bit and 64-bit architectures, there is a problem with info->date_start/stop, which is part of the user ABI and overflows in in 2106. Fix the first issue by using time64_t and explicit calls to div_u64() and div_u64_rem(), and document the seconds issue. The explicit 64-bit division is unfortunately slower on 32-bit architectures, but doing it as unsigned lets us use the optimized division-through-multiplication path in most configurations. This should be fine, as the code already does not allow any negative time of day values. Using u32 seconds values consistently would probably also work and be a little more efficient, but that doesn't feel right as it would propagate the y2106 overflow to more place rather than fewer. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-11-15bpf: Fix race in btf_resolve_helper_id()Alexei Starovoitov
btf_resolve_helper_id() caching logic is a bit racy, since under root the verifier can verify several programs in parallel. Fix it with READ/WRITE_ONCE. Fix the type as well, since error is also recorded. Fixes: a7658e1a4164 ("bpf: Check types of arguments passed into helpers") Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20191114185720.1641606-15-ast@kernel.org
2019-11-15bpf: Add kernel test functions for fentry testingAlexei Starovoitov
Add few kernel functions with various number of arguments, their types and sizes for BPF trampoline testing to cover different calling conventions. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20191114185720.1641606-9-ast@kernel.org
2019-11-15net: openvswitch: don't call pad_packet if not necessaryTonghao Zhang
The nla_put_u16/nla_put_u32 makes sure that *attrlen is align. The call tree is that: nla_put_u16/nla_put_u32 -> nla_put attrlen = sizeof(u16) or sizeof(u32) -> __nla_put attrlen -> __nla_reserve attrlen -> skb_put(skb, nla_total_size(attrlen)) nla_total_size returns the total length of attribute including padding. Cc: Joe Stringer <joe@ovn.org> Cc: William Tu <u9012063@gmail.com> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15net: dsa: ocelot: add tagger for Ocelot/Felix switchesVladimir Oltean
While it is entirely possible that this tagger format is in fact more generic than just these 2 switch families, I don't have that knowledge. The Seville switch in NXP T1040 has a similar frame format, but there are enough differences (e.g. DEST field starts at bit 57 instead of 56) that calling this file tag_vitesse.c is a bit of a stretch at the moment. The frame format has been listed in a comment so that people who add support for further Vitesse switches can rework this tagger while keeping compatibility with Felix. The "ocelot" name was chosen instead of "felix" because even the Ocelot switch can act as a DSA device when it is used in NPI mode, and the Felix tagger format is almost identical. Currently it is only used for the Felix switch embedded in the NXP LS1028A chip. The ABI for this tagger should be considered "not stable" at the moment. The DSA tag is always placed before the Ethernet header and therefore, we are using the long prefix for RX tags to avoid putting the DSA master port in promiscuous mode. Once there will be an API in DSA for drivers to request DSA masters to be in promiscuous mode unconditionally, we will switch to the "no prefix" extraction frame header, which will save 16 padding bytes for each RX frame. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15net/smc: immediate termination for SMCR link groupsUrsula Braun
If the SMC module is unloaded or an IB device is thrown away, the immediate link group freeing introduced for SMCD is exploited for SMCR as well. That means SMCR-specifics are added to smc_conn_kill(). Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15net/smc: wait for tx completions before link freeingUrsula Braun
Make sure all pending work requests are completed before freeing a link. Dismiss tx pending slots already when terminating a link group to exploit termination shortcut in tx completion queue handler. And kill the completion queue tasklets after destroy of the completion queues, otherwise there is a time window for another tasklet schedule of an already killed tasklet. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15net/smc: abnormal termination without orderly flagUrsula Braun
For abnormal termination issue an LLC DELETE_LINK without the orderly flag. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15net/smc: no WR buffer wait for terminating link groupUrsula Braun
Avoid waiting for a free work request buffer, if the link group is already terminating. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15net/smc: introduce bookkeeping of SMCD link groupsUrsula Braun
If the ism module is unloaded return control from exit routine only, if all link groups are freed. If an IB device is thrown away return control from device removal only, if all link groups belonging to this device are freed. A counters for the total number of SMCD link groups per ISM device is introduced. ism module unloading continues only if the total number of SMCD link groups for all ISM devices is zero. ISM device removal continues only it the total number of SMCD link groups per ISM device has decreased to zero. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15net/smc: abnormal termination of SMCD link groupsUrsula Braun
A final cleanup due to SMCD device removal means immediate freeing of all link groups belonging to this device in interrupt context. This patch introduces a separate SMCD link group termination routine, which terminates all link groups of an SMCD device. This new routine smcd_terminate_all ()is reused if the smc module is unloaded. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15net/smc: immediate termination for SMCD link groupsUrsula Braun
SMCD link group termination is called when peer signals its shutdown of its corresponding link group. For regular shutdowns no connections exist anymore. For abnormal shutdowns connections must be killed and their DMBs must be unregistered immediately. That means the SMCR method to delay the link group freeing several seconds does not fit. This patch adds immediate termination of a link group and its SMCD connections and makes sure all SMCD link group related cleanup steps are finished. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15net/smc: fix final cleanup sequence for SMCD devicesUrsula Braun
If peer announces shutdown, use the link group terminate worker for local cleanup of link groups and connections to terminate link group in proper context. Make sure link groups are cleaned up first before destroying the event queue of the SMCD device, because link group cleanup may raise events. Send signal shutdown only if peer has not done it already. Send socket abort or close only, if peer has not already announced shutdown. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15net/tls: Fix unused function warningYueHaibing
If PROC_FS is not set, gcc warning this: net/tls/tls_proc.c:23:12: warning: 'tls_statistics_seq_show' defined but not used [-Wunused-function] Use #ifdef to guard this. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15y2038: socket: use __kernel_old_timespec instead of timespecArnd Bergmann
The 'timespec' type definition and helpers like ktime_to_timespec() or timespec64_to_timespec() should no longer be used in the kernel so we can remove them and avoid introducing y2038 issues in new code. Change the socket code that needs to pass a timespec to user space for backward compatibility to use __kernel_old_timespec instead. This type has the same layout but with a clearer defined name. Slightly reformat tcp_recv_timestamp() for consistency after the removal of timespec64_to_timespec(). Acked-by: Deepa Dinamani <deepa.kernel@gmail.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-11-15y2038: socket: remove timespec reference in timestampingArnd Bergmann
In order to remove the 'struct timespec' definition and the timespec64_to_timespec() helper function, change over the in-kernel definition of 'struct scm_timestamping' to use the __kernel_old_timespec replacement and open-code the assignment. Acked-by: Deepa Dinamani <deepa.kernel@gmail.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-11-15y2038: remove CONFIG_64BIT_TIMEArnd Bergmann
The CONFIG_64BIT_TIME option is defined on all architectures, and can be removed for simplicity now. Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-11-14vsock: fix bind() behaviour taking care of CIDStefano Garzarella
When we are looking for a socket bound to a specific address, we also have to take into account the CID. This patch is useful with multi-transports support because it allows the binding of the same port with different CID, and it prevents a connection to a wrong socket bound to the same port, but with different CID. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14vsock: prevent transport modules unloadingStefano Garzarella
This patch adds 'module' member in the 'struct vsock_transport' in order to get/put the transport module. This prevents the module unloading while sockets are assigned to it. We increase the module refcnt when a socket is assigned to a transport, and we decrease the module refcnt when the socket is destructed. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14vsock/vmci: register vmci_transport only when VMCI guest/host are activeStefano Garzarella
To allow other transports to be loaded with vmci_transport, we register the vmci_transport as G2H or H2G only when a VMCI guest or host is active. To do that, this patch adds a callback registered in the vmci driver that will be called when the host or guest becomes active. This callback will register the vmci_transport in the VSOCK core. Cc: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14vsock: add multi-transports supportStefano Garzarella
This patch adds the support of multiple transports in the VSOCK core. With the multi-transports support, we can use vsock with nested VMs (using also different hypervisors) loading both guest->host and host->guest transports at the same time. Major changes: - vsock core module can be loaded regardless of the transports - vsock_core_init() and vsock_core_exit() are renamed to vsock_core_register() and vsock_core_unregister() - vsock_core_register() has a feature parameter (H2G, G2H, DGRAM) to identify which directions the transport can handle and if it's support DGRAM (only vmci) - each stream socket is assigned to a transport when the remote CID is set (during the connect() or when we receive a connection request on a listener socket). The remote CID is used to decide which transport to use: - remote CID <= VMADDR_CID_HOST will use guest->host transport; - remote CID == local_cid (guest->host transport) will use guest->host transport for loopback (host->guest transports don't support loopback); - remote CID > VMADDR_CID_HOST will use host->guest transport; - listener sockets are not bound to any transports since no transport operations are done on it. In this way we can create a listener socket, also if the transports are not loaded or with VMADDR_CID_ANY to listen on all transports. - DGRAM sockets are handled as before, since only the vmci_transport provides this feature. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14hv_sock: set VMADDR_CID_HOST in the hvs_remote_addr_init()Stefano Garzarella
Remote peer is always the host, so we set VMADDR_CID_HOST as remote CID instead of VMADDR_CID_ANY. Reviewed-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14vsock: move vsock_insert_unbound() in the vsock_create()Stefano Garzarella
vsock_insert_unbound() was called only when 'sock' parameter of __vsock_create() was not null. This only happened when __vsock_create() was called by vsock_create(). In order to simplify the multi-transports support, this patch moves vsock_insert_unbound() at the end of vsock_create(). Reviewed-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14vsock: add vsock_create_connected() called by transportsStefano Garzarella
All transports call __vsock_create() with the same parameters, most of them depending on the parent socket. In order to simplify the VSOCK core APIs exposed to the transports, this patch adds the vsock_create_connected() callable from transports to create a new socket when a connection request is received. We also unexported the __vsock_create(). Suggested-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14vsock: handle buffer_size sockopts in the coreStefano Garzarella
virtio_transport and vmci_transport handle the buffer_size sockopts in a very similar way. In order to support multiple transports, this patch moves this handling in the core to allow the user to change the options also if the socket is not yet assigned to any transport. This patch also adds the '.notify_buffer_size' callback in the 'struct virtio_transport' in order to inform the transport, when the buffer_size is changed by the user. It is also useful to limit the 'buffer_size' requested (e.g. virtio transports). Acked-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14vsock: add 'struct vsock_sock *' param to vsock_core_get_transport()Stefano Garzarella
Since now the 'struct vsock_sock' object contains a pointer to the transport, this patch adds a parameter to the vsock_core_get_transport() to return the right transport assigned to the socket. This patch modifies also the virtio_transport_get_ops(), that uses the vsock_core_get_transport(), adding the 'struct vsock_sock *' parameter. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14vsock/virtio: add transport parameter to the virtio_transport_reset_no_sock()Stefano Garzarella
We are going to add 'struct vsock_sock *' parameter to virtio_transport_get_ops(). In some cases, like in the virtio_transport_reset_no_sock(), we don't have any socket assigned to the packet received, so we can't use the virtio_transport_get_ops(). In order to allow virtio_transport_reset_no_sock() to use the '.send_pkt' callback from the 'vhost_transport' or 'virtio_transport', we add the 'struct virtio_transport *' to it and to its caller: virtio_transport_recv_pkt(). We moved the 'vhost_transport' and 'virtio_transport' definition, to pass their address to the virtio_transport_recv_pkt(). Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14vsock: add 'transport' member in the struct vsock_sockStefano Garzarella
As a preparation to support multiple transports, this patch adds the 'transport' member at the 'struct vsock_sock'. This new field is initialized during the creation in the __vsock_create() function. This patch also renames the global 'transport' pointer to 'transport_single', since for now we're only supporting a single transport registered at run-time. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14vsock: remove include/linux/vm_sockets.h fileStefano Garzarella
This header file now only includes the "uapi/linux/vm_sockets.h". We can include directly it when needed. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14vsock: remove vm_sockets_get_local_cid()Stefano Garzarella
vm_sockets_get_local_cid() is only used in virtio_transport_common.c. We can replace it calling the virtio_transport_get_ops() and using the get_local_cid() callback registered by the transport. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14vsock/vmci: remove unused VSOCK_DEFAULT_CONNECT_TIMEOUTStefano Garzarella
The VSOCK_DEFAULT_CONNECT_TIMEOUT definition was introduced with commit d021c344051af ("VSOCK: Introduce VM Sockets"), but it is never used in the net/vmw_vsock/vmci_transport.c. VSOCK_DEFAULT_CONNECT_TIMEOUT is used and defined in net/vmw_vsock/af_vsock.c Cc: Jorgen Hansen <jhansen@vmware.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>