summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2020-01-15SUNRPC: call_connect_status should handle -EPROTOChuck Lever
The xprtrdma connect logic can return -EPROTO if the underlying device or network path does not support RDMA. This can happen after a device removal/insertion. - When SOFTCONN is set, EPROTO is a permanent error. - When SOFTCONN is not set, EPROTO is treated as a temporary error. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15SUNRPC: Capture signalled RPC tasksChuck Lever
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15sunrpc: convert to time64_t for expiryArnd Bergmann
Using signed 32-bit types for UTC time leads to the y2038 overflow, which is what happens in the sunrpc code at the moment. This changes the sunrpc code over to use time64_t where possible. The one exception is the gss_import_v{1,2}_context() function for kerberos5, which uses 32-bit timestamps in the protocol. Here, we can at least treat the numbers as 'unsigned', which extends the range from 2038 to 2106. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15net: bridge: vlan: notify on vlan add/delete/change flagsNikolay Aleksandrov
Now that we can notify, send a notification on add/del or change of flags. Notifications are also compressed when possible to reduce their number and relieve user-space of extra processing, due to that we have to manually notify after each add/del in order to avoid double notifications. We try hard to notify only about the vlans which actually changed, thus a single command can result in multiple notifications about disjoint ranges if there were vlans which didn't change inside. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-15net: bridge: vlan: add rtnetlink group and notify supportNikolay Aleksandrov
Add a new rtnetlink group for bridge vlan notifications - RTNLGRP_BRVLAN and add support for sending vlan notifications (both single and ranges). No functional changes intended, the notification support will be used by later patches. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-15net: bridge: vlan: add rtm range supportNikolay Aleksandrov
Add a new vlandb nl attribute - BRIDGE_VLANDB_ENTRY_RANGE which causes RTM_NEWVLAN/DELVAN to act on a range. Dumps now automatically compress similar vlans into ranges. This will be also used when per-vlan options are introduced and vlans' options match, they will be put into a single range which is encapsulated in one netlink attribute. We need to run similar checks as br_process_vlan_info() does because these ranges will be used for options setting and they'll be able to skip br_process_vlan_info(). Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-15net: bridge: vlan: add del rtm message supportNikolay Aleksandrov
Adding RTM_DELVLAN support similar to RTM_NEWVLAN is simple, just need to map DELVLAN to DELLINK and register the handler. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-15net: bridge: vlan: add new rtm message supportNikolay Aleksandrov
Add initial RTM_NEWVLAN support which can only create vlans, operating similar to the current br_afspec(). We will use it later to also change per-vlan options. Old-style (flag-based) vlan ranges are not allowed when using RTM messages, we will introduce vlan ranges later via a new nested attribute which would allow us to have all the information about a range encapsulated into a single nl attribute. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-15net: bridge: vlan: add rtm definitions and dump supportNikolay Aleksandrov
This patch adds vlan rtm definitions: - NEWVLAN: to be used for creating vlans, setting options and notifications - DELVLAN: to be used for deleting vlans - GETVLAN: used for dumping vlan information Dumping vlans which can span multiple messages is added now with basic information (vid and flags). We use nlmsg_parse() to validate the header length in order to be able to extend the message with filtering attributes later. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-15net: bridge: netlink: add extack error messages when processing vlansNikolay Aleksandrov
Add extack messages on vlan processing errors. We need to move the flags missing check after the "last" check since we may have "last" set but lack a range end flag in the next entry. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-15net: bridge: vlan: add helpers to check for vlan id/range validityNikolay Aleksandrov
Add helpers to check if a vlan id or range are valid. The range helper must be called when range start or end are detected. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-15Merge tag 'mac80211-for-net-2020-01-15' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== A few fixes: * -O3 enablement fallout, thanks to Arnd who ran this * fixes for a few leaks, thanks to Felix * channel 12 regulatory fix for custom regdomains * check for a crash reported by syzbot (NULL function is called on drivers that don't have it) * fix TKIP replay protection after setup with some APs (from Jouni) * restrict obtaining some mesh data to avoid WARN_ONs * fix deadlocks with auto-disconnect (socket owner) * fix radar detection events with multiple devices ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-15xfrm: support output_mark for offload ESP packetsUlrich Weber
Commit 9b42c1f179a6 ("xfrm: Extend the output_mark") added output_mark support but missed ESP offload support. xfrm_smark_get() is not called within xfrm_input() for packets coming from esp4_gro_receive() or esp6_gro_receive(). Therefore call xfrm_smark_get() directly within these functions. Fixes: 9b42c1f179a6 ("xfrm: Extend the output_mark to support input direction and masking.") Signed-off-by: Ulrich Weber <ulrich.weber@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2020-01-15mac80211: add ieee80211_is_any_nullfunc()Thomas Pedersen
commit 08a5bdde3812 ("mac80211: consider QoS Null frames for STA_NULLFUNC_ACKED") Fixed a bug where we failed to take into account a nullfunc frame can be either non-QoS or QoS. It turns out there is at least one more bug in ieee80211_sta_tx_notify(), introduced in commit 7b6ddeaf27ec ("mac80211: use QoS NDP for AP probing"), where we forgot to check for the QoS variant and so assumed the QoS nullfunc frame never went out Fix this by adding a helper ieee80211_is_any_nullfunc() which consolidates the check for non-QoS and QoS nullfunc frames. Replace existing compound conditionals and add a couple more missing checks for QoS variant. Signed-off-by: Thomas Pedersen <thomas@adapt-ip.com> Link: https://lore.kernel.org/r/20200114055940.18502-3-thomas@adapt-ip.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-01-15mac80211: fix tx status for no ack casesMarkus Theil
Before this patch, frames which where successfully transmitted without requiring acks where accounted as lost frames. Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de> Link: https://lore.kernel.org/r/20191218142736.15843-1-markus.theil@tu-ilmenau.de Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-01-15mac80211: add handling for BSS colorJohn Crispin
It is now possible to propagate BSS color settings into the subsystem. Lets make mac80211 also handle them so that we can send them further down the stack into the drivers. We drop the old bss_color field and change iwlwifi to use the new he_bss_color struct. Signed-off-by: John Crispin <john@phrozen.org> Link: https://lore.kernel.org/r/20191217141921.8114-2-john@phrozen.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-01-15nl80211: add handling for BSS colorJohn Crispin
This patch adds the attributes, policy and parsing code to allow userland to send the info about the BSS coloring settings to the kernel. Signed-off-by: John Crispin <john@phrozen.org> Link: https://lore.kernel.org/r/20191217141921.8114-1-john@phrozen.org [johannes: remove the strict policy parsing, that was a misunderstanding] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-01-15cfg80211: fix page refcount issue in A-MSDU decapFelix Fietkau
The fragments attached to a skb can be part of a compound page. In that case, page_ref_inc will increment the refcount for the wrong page. Fix this by using get_page instead, which calls page_ref_inc on the compound head and also checks for overflow. Fixes: 2b67f944f88c ("cfg80211: reuse existing page fragments in A-MSDU rx") Cc: stable@vger.kernel.org Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20200113182107.20461-1-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-01-15cfg80211: check for set_wiphy_paramsJohannes Berg
Check if set_wiphy_params is assigned and return an error if not, some drivers (e.g. virt_wifi where syzbot reported it) don't have it. Reported-by: syzbot+e8a797964a4180eb57d5@syzkaller.appspotmail.com Reported-by: syzbot+34b582cf32c1db008f8e@syzkaller.appspotmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://lore.kernel.org/r/20200113125358.ac07f276efff.Ibd85ee1b12e47b9efb00a2adc5cd3fac50da791a@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-01-15cfg80211: fix memory leak in cfg80211_cqm_rssi_updateFelix Fietkau
The per-tid statistics need to be released after the call to rdev_get_station Cc: stable@vger.kernel.org Fixes: 8689c051a201 ("cfg80211: dynamically allocate per-tid stats for station info") Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20200108170630.33680-2-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-01-15cfg80211: fix memory leak in nl80211_probe_mesh_linkFelix Fietkau
The per-tid statistics need to be released after the call to rdev_get_station Cc: stable@vger.kernel.org Fixes: 5ab92e7fe49a ("cfg80211: add support to probe unexercised mesh link") Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20200108170630.33680-1-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-01-15cfg80211: fix deadlocks in autodisconnect workMarkus Theil
Use methods which do not try to acquire the wdev lock themselves. Cc: stable@vger.kernel.org Fixes: 37b1c004685a3 ("cfg80211: Support all iftypes in autodisconnect_wk") Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de> Link: https://lore.kernel.org/r/20200108115536.2262-1-markus.theil@tu-ilmenau.de Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-01-15wireless: wext: avoid gcc -O3 warningArnd Bergmann
After the introduction of CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE_O3, the wext code produces a bogus warning: In function 'iw_handler_get_iwstats', inlined from 'ioctl_standard_call' at net/wireless/wext-core.c:1015:9, inlined from 'wireless_process_ioctl' at net/wireless/wext-core.c:935:10, inlined from 'wext_ioctl_dispatch.part.8' at net/wireless/wext-core.c:986:8, inlined from 'wext_handle_ioctl': net/wireless/wext-core.c:671:3: error: argument 1 null where non-null expected [-Werror=nonnull] memcpy(extra, stats, sizeof(struct iw_statistics)); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In file included from arch/x86/include/asm/string.h:5, net/wireless/wext-core.c: In function 'wext_handle_ioctl': arch/x86/include/asm/string_64.h:14:14: note: in a call to function 'memcpy' declared here The problem is that ioctl_standard_call() sometimes calls the handler with a NULL argument that would cause a problem for iw_handler_get_iwstats. However, iw_handler_get_iwstats never actually gets called that way. Marking that function as noinline avoids the warning and leads to slightly smaller object code as well. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20200107200741.3588770-1-arnd@arndb.de Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-01-15mac80211: Fix TKIP replay protection immediately after key setupJouni Malinen
TKIP replay protection was skipped for the very first frame received after a new key is configured. While this is potentially needed to avoid dropping a frame in some cases, this does leave a window for replay attacks with group-addressed frames at the station side. Any earlier frame sent by the AP using the same key would be accepted as a valid frame and the internal RSC would then be updated to the TSC from that frame. This would allow multiple previously transmitted group-addressed frames to be replayed until the next valid new group-addressed frame from the AP is received by the station. Fix this by limiting the no-replay-protection exception to apply only for the case where TSC=0, i.e., when this is for the very first frame protected using the new key, and the local RSC had not been set to a higher value when configuring the key (which may happen with GTK). Signed-off-by: Jouni Malinen <j@w1.fi> Link: https://lore.kernel.org/r/20200107153545.10934-1-j@w1.fi Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-01-15cfg80211: Fix radar event during another phy CACOrr Mazor
In case a radar event of CAC_FINISHED or RADAR_DETECTED happens during another phy is during CAC we might need to cancel that CAC. If we got a radar in a channel that another phy is now doing CAC on then the CAC should be canceled there. If, for example, 2 phys doing CAC on the same channels, or on comptable channels, once on of them will finish his CAC the other might need to cancel his CAC, since it is no longer relevant. To fix that the commit adds an callback and implement it in mac80211 to end CAC. This commit also adds a call to said callback if after a radar event we see the CAC is no longer relevant Signed-off-by: Orr Mazor <Orr.Mazor@tandemg.com> Reviewed-by: Sergey Matyukevich <sergey.matyukevich.os@quantenna.com> Link: https://lore.kernel.org/r/20191222145449.15792-1-Orr.Mazor@tandemg.com [slightly reformat/reword commit message] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-01-15wireless: fix enabling channel 12 for custom regulatory domainGanapathi Bhat
Commit e33e2241e272 ("Revert "cfg80211: Use 5MHz bandwidth by default when checking usable channels"") fixed a broken regulatory (leaving channel 12 open for AP where not permitted). Apply a similar fix to custom regulatory domain processing. Signed-off-by: Cathy Luo <xiaohua.luo@nxp.com> Signed-off-by: Ganapathi Bhat <ganapathi.bhat@nxp.com> Link: https://lore.kernel.org/r/1576836859-8945-1-git-send-email-ganapathi.bhat@nxp.com [reword commit message, fix coding style, add a comment] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-01-14ipv6: Add "offload" and "trap" indications to routesIdo Schimmel
In a similar fashion to previous patch, add "offload" and "trap" indication to IPv6 routes. This is done by using two unused bits in 'struct fib6_info' to hold these indications. Capable drivers are expected to set these when processing the various in-kernel route notifications. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14ipv4: Add "offload" and "trap" indications to routesIdo Schimmel
When performing L3 offload, routes and nexthops are usually programmed into two different tables in the underlying device. Therefore, the fact that a nexthop resides in hardware does not necessarily mean that all the associated routes also reside in hardware and vice-versa. While the kernel can signal to user space the presence of a nexthop in hardware (via 'RTNH_F_OFFLOAD'), it does not have a corresponding flag for routes. In addition, the fact that a route resides in hardware does not necessarily mean that the traffic is offloaded. For example, unreachable routes (i.e., 'RTN_UNREACHABLE') are programmed to trap packets to the CPU so that the kernel will be able to generate the appropriate ICMP error packet. This patch adds an "offload" and "trap" indications to IPv4 routes, so that users will have better visibility into the offload process. 'struct fib_alias' is extended with two new fields that indicate if the route resides in hardware or not and if it is offloading traffic from the kernel or trapping packets to it. Note that the new fields are added in the 6 bytes hole and therefore the struct still fits in a single cache line [1]. Capable drivers are expected to invoke fib_alias_hw_flags_set() with the route's key in order to set the flags. The indications are dumped to user space via a new flags (i.e., 'RTM_F_OFFLOAD' and 'RTM_F_TRAP') in the 'rtm_flags' field in the ancillary header. v2: * Make use of 'struct fib_rt_info' in fib_alias_hw_flags_set() [1] struct fib_alias { struct hlist_node fa_list; /* 0 16 */ struct fib_info * fa_info; /* 16 8 */ u8 fa_tos; /* 24 1 */ u8 fa_type; /* 25 1 */ u8 fa_state; /* 26 1 */ u8 fa_slen; /* 27 1 */ u32 tb_id; /* 28 4 */ s16 fa_default; /* 32 2 */ u8 offload:1; /* 34: 0 1 */ u8 trap:1; /* 34: 1 1 */ u8 unused:6; /* 34: 2 1 */ /* XXX 5 bytes hole, try to pack */ struct callback_head rcu __attribute__((__aligned__(8))); /* 40 16 */ /* size: 56, cachelines: 1, members: 12 */ /* sum members: 50, holes: 1, sum holes: 5 */ /* sum bitfield members: 8 bits (1 bytes) */ /* forced alignments: 1, forced holes: 1, sum forced holes: 5 */ /* last cacheline: 56 bytes */ } __attribute__((__aligned__(8))); Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14ipv4: Encapsulate function arguments in a structIdo Schimmel
fib_dump_info() is used to prepare RTM_{NEW,DEL}ROUTE netlink messages using the passed arguments. Currently, the function takes 11 arguments, 6 of which are attributes of the route being dumped (e.g., prefix, TOS). The next patch will need the function to also dump to user space an indication if the route is present in hardware or not. Instead of passing yet another argument, change the function to take a struct containing the different route attributes. v2: * Name last argument of fib_dump_info() * Move 'struct fib_rt_info' to include/net/ip_fib.h so that it could later be passed to fib_alias_hw_flags_set() Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14ipv4: Replace route in list before notifyingIdo Schimmel
Subsequent patches will add an offload / trap indication to routes which will signal if the route is present in hardware or not. After programming the route to the hardware, drivers will have to ask the IPv4 code to set the flags by passing the route's key. In the case of route replace, the new route is notified before it is actually inserted into the FIB alias list. This can prevent simple drivers (e.g., netdevsim) that program the route to the hardware in the same context it is notified in from being able to set the flag. Solve this by first inserting the new route to the list and rollback the operation in case the route was vetoed. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14net: qrtr: Remove receive workerBjorn Andersson
Rather than enqueuing messages and scheduling a worker to deliver them to the individual sockets we can now, thanks to the previous work, move this directly into the endpoint callback. This saves us a context switch per incoming message and removes the possibility of an opportunistic suspend to happen between the message is coming from the endpoint until it ends up in the socket's receive buffer. Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14net: qrtr: Make qrtr_port_lookup() use RCUBjorn Andersson
The important part of qrtr_port_lookup() wrt synchronization is that the function returns a reference counted struct qrtr_sock, or fail. As such we need only to ensure that an decrement of the object's refcount happens inbetween the finding of the object in the idr and qrtr_port_lookup()'s own increment of the object. By using RCU and putting a synchronization point after we remove the mapping from the idr, but before it can be released we achieve this - with the benefit of not having to hold the mutex in qrtr_port_lookup(). Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14net: qrtr: Migrate node lookup tree to spinlockBjorn Andersson
Move operations on the qrtr_nodes radix tree under a separate spinlock and make the qrtr_nodes tree GFP_ATOMIC, to allow operation from atomic context in a subsequent patch. Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14net: qrtr: Implement outgoing flow controlBjorn Andersson
In order to prevent overconsumption of resources on the remote side QRTR implements a flow control mechanism. The mechanism works by the sender keeping track of the number of outstanding unconfirmed messages that has been transmitted to a particular node/port pair. Upon count reaching a low watermark (L) the confirm_rx bit is set in the outgoing message and when the count reaching a high watermark (H) transmission will be blocked upon the reception of a resume_tx message from the remote, that resets the counter to 0. This guarantees that there will be at most 2H - L messages in flight. Values chosen for L and H are 5 and 10 respectively. Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14net: qrtr: Move resume-tx transmission to recvmsgBjorn Andersson
The confirm-rx bit is used to implement a per port flow control, in order to make sure that no messages are dropped due to resource exhaustion. Move the resume-tx transmission to recvmsg to only confirm messages as they are consumed by the application. Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14pktgen: Allow configuration of IPv6 source address rangeNiu Xilei
Pktgen can use only one IPv6 source address from output device or src6 command setting. In pressure test we need create lots of sessions more than 65535. So add src6_min and src6_max command to set the range. Signed-off-by: Niu Xilei <niu_xilei@163.com> Changes since v3: - function set_src_in6_addr use static instead of static inline - precompute min_in6_l,min_in6_h,max_in6_h,max_in6_l in setup time Changes since v2: - reword subject line Changes since v1: - only create IPv6 source address over least significant 64 bit range Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14hv_sock: Remove the accept port restrictionSunil Muthuswamy
Currently, hv_sock restricts the port the guest socket can accept connections on. hv_sock divides the socket port namespace into two parts for server side (listening socket), 0-0x7FFFFFFF & 0x80000000-0xFFFFFFFF (there are no restrictions on client port namespace). The first part (0-0x7FFFFFFF) is reserved for sockets where connections can be accepted. The second part (0x80000000-0xFFFFFFFF) is reserved for allocating ports for the peer (host) socket, once a connection is accepted. This reservation of the port namespace is specific to hv_sock and not known by the generic vsock library (ex: af_vsock). This is problematic because auto-binds/ephemeral ports are handled by the generic vsock library and it has no knowledge of this port reservation and could allocate a port that is not compatible with hv_sock (and legitimately so). The issue hasn't surfaced so far because the auto-bind code of vsock (__vsock_bind_stream) prior to the change 'VSOCK: bind to random port for VMADDR_PORT_ANY' would start walking up from LAST_RESERVED_PORT (1023) and start assigning ports. That will take a large number of iterations to hit 0x7FFFFFFF. But, after the above change to randomize port selection, the issue has started coming up more frequently. There has really been no good reason to have this port reservation logic in hv_sock from the get go. Reserving a local port for peer ports is not how things are handled generally. Peer ports should reflect the peer port. This fixes the issue by lifting the port reservation, and also returns the right peer port. Since the code converts the GUID to the peer port (by using the first 4 bytes), there is a possibility of conflicts, but that seems like a reasonable risk to take, given this is limited to vsock and that only applies to all local sockets. Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14net: mac80211: use skb_list_walk_safe helper for gso segmentsJason A. Donenfeld
This is a conversion case for the new function, keeping the flow of the existing code as intact as possible. We also switch over to using skb_mark_not_on_list instead of a null write to skb->next. Finally, this code appeared to have a memory leak in the case where header building fails before the last gso segment. In that case, the remaining segments are not freed. So this commit also adds the proper kfree_skb_list call for the remainder of the skbs. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14net: netfilter: use skb_list_walk_safe helper for gso segmentsJason A. Donenfeld
This is a straight-forward conversion case for the new function, keeping the flow of the existing code as intact as possible. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14net: ipv4: use skb_list_walk_safe helper for gso segmentsJason A. Donenfeld
This is a straight-forward conversion case for the new function, keeping the flow of the existing code as intact as possible. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14net: sched: use skb_list_walk_safe helper for gso segmentsJason A. Donenfeld
This is a straight-forward conversion case for the new function, keeping the flow of the existing code as intact as possible. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14net: openvswitch: use skb_list_walk_safe helper for gso segmentsJason A. Donenfeld
This is a straight-forward conversion case for the new function, keeping the flow of the existing code as intact as possible. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14net: xfrm: use skb_list_walk_safe helper for gso segmentsJason A. Donenfeld
This is converts xfrm segment iteration to use the new function, keeping the flow of the existing code as intact as possible. One case is very straight-forward, whereas the other case has some more subtle code that likes to peak at ->next and relink skbs. By keeping the variables the same as before, we can upgrade this code with minimal surgery required. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14net: udp: use skb_list_walk_safe helper for gso segmentsJason A. Donenfeld
This is a straight-forward conversion case for the new function, iterating over the return value from udp_rcv_segment, which actually is a wrapper around skb_gso_segment. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14bpf: Return -EBADRQC for invalid map type in __bpf_tx_xdp_mapLi RongQing
A negative value should be returned if map->map_type is invalid although that is impossible now, but if we run into such situation in future, then xdpbuff could be leaked. Daniel Borkmann suggested: -EBADRQC should be returned to stay consistent with generic XDP for the tracepoint output and not to be confused with -EOPNOTSUPP from other locations like dev_map_enqueue() when ndo_xdp_xmit is missing and such. Suggested-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/1578618277-18085-1-git-send-email-lirongqing@baidu.com
2020-01-14netns: don't disable BHs when locking "nsid_lock"Guillaume Nault
When peernet2id() had to lock "nsid_lock" before iterating through the nsid table, we had to disable BHs, because VXLAN can call peernet2id() from the xmit path: vxlan_xmit() -> vxlan_fdb_miss() -> vxlan_fdb_notify() -> __vxlan_fdb_notify() -> vxlan_fdb_info() -> peernet2id(). Now that peernet2id() uses RCU protection, "nsid_lock" isn't used in BH context anymore. Therefore, we can safely use plain spin_lock()/spin_unlock() and let BHs run when holding "nsid_lock". Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14netns: protect netns ID lookups with RCUGuillaume Nault
__peernet2id() can be protected by RCU as it only calls idr_for_each(), which is RCU-safe, and never modifies the nsid table. rtnl_net_dumpid() can also do lockless lookups. It does two nested idr_for_each() calls on nsid tables (one direct call and one indirect call because of rtnl_net_dumpid_one() calling __peernet2id()). The netnsid tables are never updated. Therefore it is safe to not take the nsid_lock and run within an RCU-critical section instead. Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14netns: Remove __peernet2id_alloc()Guillaume Nault
__peernet2id_alloc() was used for both plain lookups and for netns ID allocations (depending the value of '*alloc'). Let's separate lookups from allocations instead. That is, integrate the lookup code into __peernet2id() and make peernet2id_alloc() responsible for allocating new netns IDs when necessary. This makes it clear that __peernet2id() doesn't modify the idr and prepares the code for lockless lookups. Also, mark the 'net' argument of __peernet2id() as 'const', since we're modifying this line. Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14xprtrdma: Fix oops in Receive handler after device removalChuck Lever
Since v5.4, a device removal occasionally triggered this oops: Dec 2 17:13:53 manet kernel: BUG: unable to handle page fault for address: 0000000c00000219 Dec 2 17:13:53 manet kernel: #PF: supervisor read access in kernel mode Dec 2 17:13:53 manet kernel: #PF: error_code(0x0000) - not-present page Dec 2 17:13:53 manet kernel: PGD 0 P4D 0 Dec 2 17:13:53 manet kernel: Oops: 0000 [#1] SMP Dec 2 17:13:53 manet kernel: CPU: 2 PID: 468 Comm: kworker/2:1H Tainted: G W 5.4.0-00050-g53717e43af61 #883 Dec 2 17:13:53 manet kernel: Hardware name: Supermicro SYS-6028R-T/X10DRi, BIOS 1.1a 10/16/2015 Dec 2 17:13:53 manet kernel: Workqueue: ib-comp-wq ib_cq_poll_work [ib_core] Dec 2 17:13:53 manet kernel: RIP: 0010:rpcrdma_wc_receive+0x7c/0xf6 [rpcrdma] Dec 2 17:13:53 manet kernel: Code: 6d 8b 43 14 89 c1 89 45 78 48 89 4d 40 8b 43 2c 89 45 14 8b 43 20 89 45 18 48 8b 45 20 8b 53 14 48 8b 30 48 8b 40 10 48 8b 38 <48> 8b 87 18 02 00 00 48 85 c0 75 18 48 8b 05 1e 24 c4 e1 48 85 c0 Dec 2 17:13:53 manet kernel: RSP: 0018:ffffc900035dfe00 EFLAGS: 00010246 Dec 2 17:13:53 manet kernel: RAX: ffff888467290000 RBX: ffff88846c638400 RCX: 0000000000000048 Dec 2 17:13:53 manet kernel: RDX: 0000000000000048 RSI: 00000000f942e000 RDI: 0000000c00000001 Dec 2 17:13:53 manet kernel: RBP: ffff888467611b00 R08: ffff888464e4a3c4 R09: 0000000000000000 Dec 2 17:13:53 manet kernel: R10: ffffc900035dfc88 R11: fefefefefefefeff R12: ffff888865af4428 Dec 2 17:13:53 manet kernel: R13: ffff888466023000 R14: ffff88846c63f000 R15: 0000000000000010 Dec 2 17:13:53 manet kernel: FS: 0000000000000000(0000) GS:ffff88846fa80000(0000) knlGS:0000000000000000 Dec 2 17:13:53 manet kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Dec 2 17:13:53 manet kernel: CR2: 0000000c00000219 CR3: 0000000002009002 CR4: 00000000001606e0 Dec 2 17:13:53 manet kernel: Call Trace: Dec 2 17:13:53 manet kernel: __ib_process_cq+0x5c/0x14e [ib_core] Dec 2 17:13:53 manet kernel: ib_cq_poll_work+0x26/0x70 [ib_core] Dec 2 17:13:53 manet kernel: process_one_work+0x19d/0x2cd Dec 2 17:13:53 manet kernel: ? cancel_delayed_work_sync+0xf/0xf Dec 2 17:13:53 manet kernel: worker_thread+0x1a6/0x25a Dec 2 17:13:53 manet kernel: ? cancel_delayed_work_sync+0xf/0xf Dec 2 17:13:53 manet kernel: kthread+0xf4/0xf9 Dec 2 17:13:53 manet kernel: ? kthread_queue_delayed_work+0x74/0x74 Dec 2 17:13:53 manet kernel: ret_from_fork+0x24/0x30 The proximal cause is that this rpcrdma_rep has a rr_rdmabuf that is still pointing to the old ib_device, which has been freed. The only way that is possible is if this rpcrdma_rep was not destroyed by rpcrdma_ia_remove. Debugging showed that was indeed the case: this rpcrdma_rep was still in use by a completing RPC at the time of the device removal, and thus wasn't on the rep free list. So, it was not found by rpcrdma_reps_destroy(). The fix is to introduce a list of all rpcrdma_reps so that they all can be found when a device is removed. That list is used to perform only regbuf DMA unmapping, replacing that call to rpcrdma_reps_destroy(). Meanwhile, to prevent corruption of this list, I've moved the destruction of temp rpcrdma_rep objects to rpcrdma_post_recvs(). rpcrdma_xprt_drain() ensures that post_recvs (and thus rep_destroy) is not invoked while rpcrdma_reps_unmap is walking rb_all_reps, thus protecting the rb_all_reps list. Fixes: b0b227f071a0 ("xprtrdma: Use an llist to manage free rpcrdma_reps") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-14xprtrdma: Fix completion wait during device removalChuck Lever
I've found that on occasion, "rmmod <dev>" will hang while if an NFS is under load. Ensure that ri_remove_done is initialized only just before the transport is woken up to force a close. This avoids the completion possibly getting initialized again while the CM event handler is waiting for a wake-up. Fixes: bebd031866ca ("xprtrdma: Support unplugging an HCA from under an NFS mount") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>