summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2020-01-15Merge tag 'batadv-next-for-davem-20200114' of ↵David S. Miller
git://git.open-mesh.org/linux-merge Simon Wunderlich says: ==================== This feature/cleanup patchset includes the following patches: - bump version strings, by Simon Wunderlich - fix typo and kerneldocs, by Sven Eckelmann - use WiFi txbitrate for B.A.T.M.A.N. V as fallback, by René Treffer - silence some endian sparse warnings by adding annotations, by Sven Eckelmann - Update copyright years to 2020, by Sven Eckelmann - Disable deprecated sysfs configuration by default, by Sven Eckelmann ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-15Merge tag 'batadv-net-for-davem-20200114' of git://git.open-mesh.org/linux-mergeDavid S. Miller
Simon Wunderlich says: ==================== Here is a batman-adv bugfix: - Fix DAT candidate selection on little endian systems, by Sven Eckelmann ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-15tcp: fix marked lost packets not being retransmittedPengcheng Yang
When the packet pointed to by retransmit_skb_hint is unlinked by ACK, retransmit_skb_hint will be set to NULL in tcp_clean_rtx_queue(). If packet loss is detected at this time, retransmit_skb_hint will be set to point to the current packet loss in tcp_verify_retransmit_hint(), then the packets that were previously marked lost but not retransmitted due to the restriction of cwnd will be skipped and cannot be retransmitted. To fix this, when retransmit_skb_hint is NULL, retransmit_skb_hint can be reset only after all marked lost packets are retransmitted (retrans_out >= lost_out), otherwise we need to traverse from tcp_rtx_queue_head in tcp_xmit_retransmit_queue(). Packetdrill to demonstrate: // Disable RACK and set max_reordering to keep things simple 0 `sysctl -q net.ipv4.tcp_recovery=0` +0 `sysctl -q net.ipv4.tcp_max_reordering=3` // Establish a connection +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 +.1 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7> +0 > S. 0:0(0) ack 1 <...> +.01 < . 1:1(0) ack 1 win 257 +0 accept(3, ..., ...) = 4 // Send 8 data segments +0 write(4, ..., 8000) = 8000 +0 > P. 1:8001(8000) ack 1 // Enter recovery and 1:3001 is marked lost +.01 < . 1:1(0) ack 1 win 257 <sack 3001:4001,nop,nop> +0 < . 1:1(0) ack 1 win 257 <sack 5001:6001 3001:4001,nop,nop> +0 < . 1:1(0) ack 1 win 257 <sack 5001:7001 3001:4001,nop,nop> // Retransmit 1:1001, now retransmit_skb_hint points to 1001:2001 +0 > . 1:1001(1000) ack 1 // 1001:2001 was ACKed causing retransmit_skb_hint to be set to NULL +.01 < . 1:1(0) ack 2001 win 257 <sack 5001:8001 3001:4001,nop,nop> // Now retransmit_skb_hint points to 4001:5001 which is now marked lost // BUG: 2001:3001 was not retransmitted +0 > . 2001:3001(1000) ack 1 Signed-off-by: Pengcheng Yang <yangpc@wangsu.com> Acked-by: Neal Cardwell <ncardwell@google.com> Tested-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-15Bluetooth: Make use of __check_timeout on hci_sched_leLuiz Augusto von Dentz
This reuse __check_timeout on hci_sched_le following the same logic used hci_sched_acl. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2020-01-15Bluetooth: monitor: Add support for ISO packetsLuiz Augusto von Dentz
This enables passing ISO packets to the monitor socket. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2020-01-15Bluetooth: Implementation of MGMT_OP_SET_BLOCKED_KEYS.Alain Michaud
MGMT command is added to receive the list of blocked keys from user-space. The list is used to: 1) Block keys from being distributed by the device during the ke distribution phase of SMP. 2) Filter out any keys that were previously saved so they are no longer used. Signed-off-by: Alain Michaud <alainm@chromium.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2020-01-15xsk: Support allocations of large umemsMagnus Karlsson
When registering a umem area that is sufficiently large (>1G on an x86), kmalloc cannot be used to allocate one of the internal data structures, as the size requested gets too large. Use kvmalloc instead that falls back on vmalloc if the allocation is too large for kmalloc. Also add accounting for this structure as it is triggered by a user space action (the XDP_UMEM_REG setsockopt) and it is by far the largest structure of kernel allocated memory in xsk. Reported-by: Ryan Goodfellow <rgoodfel@isi.edu> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Link: https://lore.kernel.org/bpf/1578995365-7050-1-git-send-email-magnus.karlsson@intel.com
2020-01-15SUNRPC: Remove broken gss_mech_list_pseudoflavors()Trond Myklebust
Remove gss_mech_list_pseudoflavors() and its callers. This is part of an unused API, and could leak an RCU reference if it were ever called. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15xprtrdma: DMA map rr_rdma_buf as each rpcrdma_rep is createdChuck Lever
Clean up: This simplifies the logic in rpcrdma_post_recvs. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15xprtrdma: Destroy reps from previous connection instanceChuck Lever
To safely get rid of all rpcrdma_reps from a particular connection instance, xprtrdma has to wait until each of those reps is finished being used. A rep may be backing the rq_rcv_buf of an RPC that has just completed, for example. Since it is safe to invoke rpcrdma_rep_destroy() only in the Receive completion handler, simply mark reps remaining in the rb_all_reps list after the transport is drained. These will then be deleted as rpcrdma_post_recvs pulls them off the rep free list. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15xprtrdma: Destroy rpcrdma_rep when Receive is flushedChuck Lever
This reduces the hardware and memory footprint of an unconnected transport. At some point in the future, transport reconnect will allow resolving the destination IP address through a different device. The current change enables reps for the new connection to be allocated on whichever NUMA node the new device affines to after a reconnect. Note that this does not destroy _all_ the transport's reps... there will be a few that are still part of a running RPC completion. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15xprtrdma: Allocate and map transport header buffers at connect timeChuck Lever
Currently the underlying RDMA device is chosen at transport set-up time. But it will soon be at connect time instead. The maximum size of a transport header is based on device capabilities. Thus transport header buffers have to be allocated _after_ the underlying device has been chosen (via address and route resolution); ie, in the connect worker. Thus, move the allocation of transport header buffers to the connect worker, after the point at which the underlying RDMA device has been chosen. This also means the RDMA device is available to do a DMA mapping of these buffers at connect time, instead of in the hot I/O path. Make that optimization as well. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15xprtrdma: Refactor frwr_is_supportedChuck Lever
Refactor: Perform the "is supported" check in rpcrdma_ep_create() instead of in rpcrdma_ia_open(). frwr_open() is where most of the logic to query device attributes is already located. The current code displays a redundant error message when the device does not support FRWR. As an additional clean-up, this patch removes the extra message. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15xprtrdma: Eliminate per-transport "max pages"Chuck Lever
To support device hotplug and migrating a connection between devices of different capabilities, we have to guarantee that all in-kernel devices can support the same max NFS payload size (1 megabyte). This means that possibly one or two in-tree devices are no longer supported for NFS/RDMA because they cannot support 1MB rsize/wsize. The only one I confirmed was cxgb3, but it has already been removed from the kernel. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15xprtrdma: Refactor initialization of ep->rep_max_requestsChuck Lever
Clean up: there is no need to keep two copies of the same value. Also, in subsequent patches, rpcrdma_ep_create() will be called in the connect worker rather than at set-up time. Minor fix: Initialize the transport's sendctx to the value based on the capabilities of the underlying device, not the maximum setting. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15xprtrdma: Make sendctx queue lifetime the same as connection lifetimeChuck Lever
The size of the sendctx queue depends on the value stored in ia->ri_max_send_sges. This value is determined by querying the underlying device. Eventually, rpcrdma_ia_open() and rpcrdma_ep_create() will be called in the connect worker rather than at transport set-up time. The underlying device will not have been chosen device set-up time. The sendctx queue will thus have to be created after the underlying device has been chosen via address and route resolution; in other words, in the connect worker. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15xprtrdma: Eliminate ri_max_send_sgesChuck Lever
Clean-up. The max_send_sge value also happens to be stored in ep->rep_attr. Let's keep just a single copy. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15SUNRPC: constify copied structureJulia Lawall
The empty_iov structure is only copied into another structure, so make it const. The opportunity for this change was found using Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15SUNRPC: call_connect_status should handle -EPROTOChuck Lever
The xprtrdma connect logic can return -EPROTO if the underlying device or network path does not support RDMA. This can happen after a device removal/insertion. - When SOFTCONN is set, EPROTO is a permanent error. - When SOFTCONN is not set, EPROTO is treated as a temporary error. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15SUNRPC: Capture signalled RPC tasksChuck Lever
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15sunrpc: convert to time64_t for expiryArnd Bergmann
Using signed 32-bit types for UTC time leads to the y2038 overflow, which is what happens in the sunrpc code at the moment. This changes the sunrpc code over to use time64_t where possible. The one exception is the gss_import_v{1,2}_context() function for kerberos5, which uses 32-bit timestamps in the protocol. Here, we can at least treat the numbers as 'unsigned', which extends the range from 2038 to 2106. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15net: bridge: vlan: notify on vlan add/delete/change flagsNikolay Aleksandrov
Now that we can notify, send a notification on add/del or change of flags. Notifications are also compressed when possible to reduce their number and relieve user-space of extra processing, due to that we have to manually notify after each add/del in order to avoid double notifications. We try hard to notify only about the vlans which actually changed, thus a single command can result in multiple notifications about disjoint ranges if there were vlans which didn't change inside. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-15net: bridge: vlan: add rtnetlink group and notify supportNikolay Aleksandrov
Add a new rtnetlink group for bridge vlan notifications - RTNLGRP_BRVLAN and add support for sending vlan notifications (both single and ranges). No functional changes intended, the notification support will be used by later patches. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-15net: bridge: vlan: add rtm range supportNikolay Aleksandrov
Add a new vlandb nl attribute - BRIDGE_VLANDB_ENTRY_RANGE which causes RTM_NEWVLAN/DELVAN to act on a range. Dumps now automatically compress similar vlans into ranges. This will be also used when per-vlan options are introduced and vlans' options match, they will be put into a single range which is encapsulated in one netlink attribute. We need to run similar checks as br_process_vlan_info() does because these ranges will be used for options setting and they'll be able to skip br_process_vlan_info(). Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-15net: bridge: vlan: add del rtm message supportNikolay Aleksandrov
Adding RTM_DELVLAN support similar to RTM_NEWVLAN is simple, just need to map DELVLAN to DELLINK and register the handler. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-15net: bridge: vlan: add new rtm message supportNikolay Aleksandrov
Add initial RTM_NEWVLAN support which can only create vlans, operating similar to the current br_afspec(). We will use it later to also change per-vlan options. Old-style (flag-based) vlan ranges are not allowed when using RTM messages, we will introduce vlan ranges later via a new nested attribute which would allow us to have all the information about a range encapsulated into a single nl attribute. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-15net: bridge: vlan: add rtm definitions and dump supportNikolay Aleksandrov
This patch adds vlan rtm definitions: - NEWVLAN: to be used for creating vlans, setting options and notifications - DELVLAN: to be used for deleting vlans - GETVLAN: used for dumping vlan information Dumping vlans which can span multiple messages is added now with basic information (vid and flags). We use nlmsg_parse() to validate the header length in order to be able to extend the message with filtering attributes later. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-15net: bridge: netlink: add extack error messages when processing vlansNikolay Aleksandrov
Add extack messages on vlan processing errors. We need to move the flags missing check after the "last" check since we may have "last" set but lack a range end flag in the next entry. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-15net: bridge: vlan: add helpers to check for vlan id/range validityNikolay Aleksandrov
Add helpers to check if a vlan id or range are valid. The range helper must be called when range start or end are detected. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-15Merge tag 'mac80211-for-net-2020-01-15' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== A few fixes: * -O3 enablement fallout, thanks to Arnd who ran this * fixes for a few leaks, thanks to Felix * channel 12 regulatory fix for custom regdomains * check for a crash reported by syzbot (NULL function is called on drivers that don't have it) * fix TKIP replay protection after setup with some APs (from Jouni) * restrict obtaining some mesh data to avoid WARN_ONs * fix deadlocks with auto-disconnect (socket owner) * fix radar detection events with multiple devices ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-15xfrm: support output_mark for offload ESP packetsUlrich Weber
Commit 9b42c1f179a6 ("xfrm: Extend the output_mark") added output_mark support but missed ESP offload support. xfrm_smark_get() is not called within xfrm_input() for packets coming from esp4_gro_receive() or esp6_gro_receive(). Therefore call xfrm_smark_get() directly within these functions. Fixes: 9b42c1f179a6 ("xfrm: Extend the output_mark to support input direction and masking.") Signed-off-by: Ulrich Weber <ulrich.weber@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2020-01-15cfg80211: fix page refcount issue in A-MSDU decapFelix Fietkau
The fragments attached to a skb can be part of a compound page. In that case, page_ref_inc will increment the refcount for the wrong page. Fix this by using get_page instead, which calls page_ref_inc on the compound head and also checks for overflow. Fixes: 2b67f944f88c ("cfg80211: reuse existing page fragments in A-MSDU rx") Cc: stable@vger.kernel.org Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20200113182107.20461-1-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-01-15cfg80211: check for set_wiphy_paramsJohannes Berg
Check if set_wiphy_params is assigned and return an error if not, some drivers (e.g. virt_wifi where syzbot reported it) don't have it. Reported-by: syzbot+e8a797964a4180eb57d5@syzkaller.appspotmail.com Reported-by: syzbot+34b582cf32c1db008f8e@syzkaller.appspotmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://lore.kernel.org/r/20200113125358.ac07f276efff.Ibd85ee1b12e47b9efb00a2adc5cd3fac50da791a@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-01-15cfg80211: fix memory leak in cfg80211_cqm_rssi_updateFelix Fietkau
The per-tid statistics need to be released after the call to rdev_get_station Cc: stable@vger.kernel.org Fixes: 8689c051a201 ("cfg80211: dynamically allocate per-tid stats for station info") Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20200108170630.33680-2-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-01-15cfg80211: fix memory leak in nl80211_probe_mesh_linkFelix Fietkau
The per-tid statistics need to be released after the call to rdev_get_station Cc: stable@vger.kernel.org Fixes: 5ab92e7fe49a ("cfg80211: add support to probe unexercised mesh link") Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20200108170630.33680-1-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-01-15cfg80211: fix deadlocks in autodisconnect workMarkus Theil
Use methods which do not try to acquire the wdev lock themselves. Cc: stable@vger.kernel.org Fixes: 37b1c004685a3 ("cfg80211: Support all iftypes in autodisconnect_wk") Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de> Link: https://lore.kernel.org/r/20200108115536.2262-1-markus.theil@tu-ilmenau.de Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-01-15wireless: wext: avoid gcc -O3 warningArnd Bergmann
After the introduction of CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE_O3, the wext code produces a bogus warning: In function 'iw_handler_get_iwstats', inlined from 'ioctl_standard_call' at net/wireless/wext-core.c:1015:9, inlined from 'wireless_process_ioctl' at net/wireless/wext-core.c:935:10, inlined from 'wext_ioctl_dispatch.part.8' at net/wireless/wext-core.c:986:8, inlined from 'wext_handle_ioctl': net/wireless/wext-core.c:671:3: error: argument 1 null where non-null expected [-Werror=nonnull] memcpy(extra, stats, sizeof(struct iw_statistics)); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In file included from arch/x86/include/asm/string.h:5, net/wireless/wext-core.c: In function 'wext_handle_ioctl': arch/x86/include/asm/string_64.h:14:14: note: in a call to function 'memcpy' declared here The problem is that ioctl_standard_call() sometimes calls the handler with a NULL argument that would cause a problem for iw_handler_get_iwstats. However, iw_handler_get_iwstats never actually gets called that way. Marking that function as noinline avoids the warning and leads to slightly smaller object code as well. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20200107200741.3588770-1-arnd@arndb.de Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-01-15mac80211: Fix TKIP replay protection immediately after key setupJouni Malinen
TKIP replay protection was skipped for the very first frame received after a new key is configured. While this is potentially needed to avoid dropping a frame in some cases, this does leave a window for replay attacks with group-addressed frames at the station side. Any earlier frame sent by the AP using the same key would be accepted as a valid frame and the internal RSC would then be updated to the TSC from that frame. This would allow multiple previously transmitted group-addressed frames to be replayed until the next valid new group-addressed frame from the AP is received by the station. Fix this by limiting the no-replay-protection exception to apply only for the case where TSC=0, i.e., when this is for the very first frame protected using the new key, and the local RSC had not been set to a higher value when configuring the key (which may happen with GTK). Signed-off-by: Jouni Malinen <j@w1.fi> Link: https://lore.kernel.org/r/20200107153545.10934-1-j@w1.fi Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-01-15cfg80211: Fix radar event during another phy CACOrr Mazor
In case a radar event of CAC_FINISHED or RADAR_DETECTED happens during another phy is during CAC we might need to cancel that CAC. If we got a radar in a channel that another phy is now doing CAC on then the CAC should be canceled there. If, for example, 2 phys doing CAC on the same channels, or on comptable channels, once on of them will finish his CAC the other might need to cancel his CAC, since it is no longer relevant. To fix that the commit adds an callback and implement it in mac80211 to end CAC. This commit also adds a call to said callback if after a radar event we see the CAC is no longer relevant Signed-off-by: Orr Mazor <Orr.Mazor@tandemg.com> Reviewed-by: Sergey Matyukevich <sergey.matyukevich.os@quantenna.com> Link: https://lore.kernel.org/r/20191222145449.15792-1-Orr.Mazor@tandemg.com [slightly reformat/reword commit message] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-01-15wireless: fix enabling channel 12 for custom regulatory domainGanapathi Bhat
Commit e33e2241e272 ("Revert "cfg80211: Use 5MHz bandwidth by default when checking usable channels"") fixed a broken regulatory (leaving channel 12 open for AP where not permitted). Apply a similar fix to custom regulatory domain processing. Signed-off-by: Cathy Luo <xiaohua.luo@nxp.com> Signed-off-by: Ganapathi Bhat <ganapathi.bhat@nxp.com> Link: https://lore.kernel.org/r/1576836859-8945-1-git-send-email-ganapathi.bhat@nxp.com [reword commit message, fix coding style, add a comment] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-01-14ipv6: Add "offload" and "trap" indications to routesIdo Schimmel
In a similar fashion to previous patch, add "offload" and "trap" indication to IPv6 routes. This is done by using two unused bits in 'struct fib6_info' to hold these indications. Capable drivers are expected to set these when processing the various in-kernel route notifications. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14ipv4: Add "offload" and "trap" indications to routesIdo Schimmel
When performing L3 offload, routes and nexthops are usually programmed into two different tables in the underlying device. Therefore, the fact that a nexthop resides in hardware does not necessarily mean that all the associated routes also reside in hardware and vice-versa. While the kernel can signal to user space the presence of a nexthop in hardware (via 'RTNH_F_OFFLOAD'), it does not have a corresponding flag for routes. In addition, the fact that a route resides in hardware does not necessarily mean that the traffic is offloaded. For example, unreachable routes (i.e., 'RTN_UNREACHABLE') are programmed to trap packets to the CPU so that the kernel will be able to generate the appropriate ICMP error packet. This patch adds an "offload" and "trap" indications to IPv4 routes, so that users will have better visibility into the offload process. 'struct fib_alias' is extended with two new fields that indicate if the route resides in hardware or not and if it is offloading traffic from the kernel or trapping packets to it. Note that the new fields are added in the 6 bytes hole and therefore the struct still fits in a single cache line [1]. Capable drivers are expected to invoke fib_alias_hw_flags_set() with the route's key in order to set the flags. The indications are dumped to user space via a new flags (i.e., 'RTM_F_OFFLOAD' and 'RTM_F_TRAP') in the 'rtm_flags' field in the ancillary header. v2: * Make use of 'struct fib_rt_info' in fib_alias_hw_flags_set() [1] struct fib_alias { struct hlist_node fa_list; /* 0 16 */ struct fib_info * fa_info; /* 16 8 */ u8 fa_tos; /* 24 1 */ u8 fa_type; /* 25 1 */ u8 fa_state; /* 26 1 */ u8 fa_slen; /* 27 1 */ u32 tb_id; /* 28 4 */ s16 fa_default; /* 32 2 */ u8 offload:1; /* 34: 0 1 */ u8 trap:1; /* 34: 1 1 */ u8 unused:6; /* 34: 2 1 */ /* XXX 5 bytes hole, try to pack */ struct callback_head rcu __attribute__((__aligned__(8))); /* 40 16 */ /* size: 56, cachelines: 1, members: 12 */ /* sum members: 50, holes: 1, sum holes: 5 */ /* sum bitfield members: 8 bits (1 bytes) */ /* forced alignments: 1, forced holes: 1, sum forced holes: 5 */ /* last cacheline: 56 bytes */ } __attribute__((__aligned__(8))); Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14ipv4: Encapsulate function arguments in a structIdo Schimmel
fib_dump_info() is used to prepare RTM_{NEW,DEL}ROUTE netlink messages using the passed arguments. Currently, the function takes 11 arguments, 6 of which are attributes of the route being dumped (e.g., prefix, TOS). The next patch will need the function to also dump to user space an indication if the route is present in hardware or not. Instead of passing yet another argument, change the function to take a struct containing the different route attributes. v2: * Name last argument of fib_dump_info() * Move 'struct fib_rt_info' to include/net/ip_fib.h so that it could later be passed to fib_alias_hw_flags_set() Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14ipv4: Replace route in list before notifyingIdo Schimmel
Subsequent patches will add an offload / trap indication to routes which will signal if the route is present in hardware or not. After programming the route to the hardware, drivers will have to ask the IPv4 code to set the flags by passing the route's key. In the case of route replace, the new route is notified before it is actually inserted into the FIB alias list. This can prevent simple drivers (e.g., netdevsim) that program the route to the hardware in the same context it is notified in from being able to set the flag. Solve this by first inserting the new route to the list and rollback the operation in case the route was vetoed. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14net: qrtr: Remove receive workerBjorn Andersson
Rather than enqueuing messages and scheduling a worker to deliver them to the individual sockets we can now, thanks to the previous work, move this directly into the endpoint callback. This saves us a context switch per incoming message and removes the possibility of an opportunistic suspend to happen between the message is coming from the endpoint until it ends up in the socket's receive buffer. Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14net: qrtr: Make qrtr_port_lookup() use RCUBjorn Andersson
The important part of qrtr_port_lookup() wrt synchronization is that the function returns a reference counted struct qrtr_sock, or fail. As such we need only to ensure that an decrement of the object's refcount happens inbetween the finding of the object in the idr and qrtr_port_lookup()'s own increment of the object. By using RCU and putting a synchronization point after we remove the mapping from the idr, but before it can be released we achieve this - with the benefit of not having to hold the mutex in qrtr_port_lookup(). Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14net: qrtr: Migrate node lookup tree to spinlockBjorn Andersson
Move operations on the qrtr_nodes radix tree under a separate spinlock and make the qrtr_nodes tree GFP_ATOMIC, to allow operation from atomic context in a subsequent patch. Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14net: qrtr: Implement outgoing flow controlBjorn Andersson
In order to prevent overconsumption of resources on the remote side QRTR implements a flow control mechanism. The mechanism works by the sender keeping track of the number of outstanding unconfirmed messages that has been transmitted to a particular node/port pair. Upon count reaching a low watermark (L) the confirm_rx bit is set in the outgoing message and when the count reaching a high watermark (H) transmission will be blocked upon the reception of a resume_tx message from the remote, that resets the counter to 0. This guarantees that there will be at most 2H - L messages in flight. Values chosen for L and H are 5 and 10 respectively. Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14net: qrtr: Move resume-tx transmission to recvmsgBjorn Andersson
The confirm-rx bit is used to implement a per port flow control, in order to make sure that no messages are dropped due to resource exhaustion. Move the resume-tx transmission to recvmsg to only confirm messages as they are consumed by the application. Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14pktgen: Allow configuration of IPv6 source address rangeNiu Xilei
Pktgen can use only one IPv6 source address from output device or src6 command setting. In pressure test we need create lots of sessions more than 65535. So add src6_min and src6_max command to set the range. Signed-off-by: Niu Xilei <niu_xilei@163.com> Changes since v3: - function set_src_in6_addr use static instead of static inline - precompute min_in6_l,min_in6_h,max_in6_h,max_in6_l in setup time Changes since v2: - reword subject line Changes since v1: - only create IPv6 source address over least significant 64 bit range Signed-off-by: David S. Miller <davem@davemloft.net>