summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2020-01-15bpf: Sockmap/tls, push write_space updates through ulp updatesJohn Fastabend
When sockmap sock with TLS enabled is removed we cleanup bpf/psock state and call tcp_update_ulp() to push updates to TLS ULP on top. However, we don't push the write_space callback up and instead simply overwrite the op with the psock stored previous op. This may or may not be correct so to ensure we don't overwrite the TLS write space hook pass this field to the ULP and have it fixup the ctx. This completes a previous fix that pushed the ops through to the ULP but at the time missed doing this for write_space, presumably because write_space TLS hook was added around the same time. Fixes: 95fa145479fbc ("bpf: sockmap/tls, close can race with map free") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/bpf/20200111061206.8028-4-john.fastabend@gmail.com
2020-01-15bpf: Sockmap, ensure sock lock held during tear downJohn Fastabend
The sock_map_free() and sock_hash_free() paths used to delete sockmap and sockhash maps walk the maps and destroy psock and bpf state associated with the socks in the map. When done the socks no longer have BPF programs attached and will function normally. This can happen while the socks in the map are still "live" meaning data may be sent/received during the walk. Currently, though we don't take the sock_lock when the psock and bpf state is removed through this path. Specifically, this means we can be writing into the ops structure pointers such as sendmsg, sendpage, recvmsg, etc. while they are also being called from the networking side. This is not safe, we never used proper READ_ONCE/WRITE_ONCE semantics here if we believed it was safe. Further its not clear to me its even a good idea to try and do this on "live" sockets while networking side might also be using the socket. Instead of trying to reason about using the socks from both sides lets realize that every use case I'm aware of rarely deletes maps, in fact kubernetes/Cilium case builds map at init and never tears it down except on errors. So lets do the simple fix and grab sock lock. This patch wraps sock deletes from maps in sock lock and adds some annotations so we catch any other cases easier. Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/bpf/20200111061206.8028-3-john.fastabend@gmail.com
2020-01-15Merge tag 'batadv-next-for-davem-20200114' of ↵David S. Miller
git://git.open-mesh.org/linux-merge Simon Wunderlich says: ==================== This feature/cleanup patchset includes the following patches: - bump version strings, by Simon Wunderlich - fix typo and kerneldocs, by Sven Eckelmann - use WiFi txbitrate for B.A.T.M.A.N. V as fallback, by René Treffer - silence some endian sparse warnings by adding annotations, by Sven Eckelmann - Update copyright years to 2020, by Sven Eckelmann - Disable deprecated sysfs configuration by default, by Sven Eckelmann ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-15Merge tag 'batadv-net-for-davem-20200114' of git://git.open-mesh.org/linux-mergeDavid S. Miller
Simon Wunderlich says: ==================== Here is a batman-adv bugfix: - Fix DAT candidate selection on little endian systems, by Sven Eckelmann ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-15tcp: fix marked lost packets not being retransmittedPengcheng Yang
When the packet pointed to by retransmit_skb_hint is unlinked by ACK, retransmit_skb_hint will be set to NULL in tcp_clean_rtx_queue(). If packet loss is detected at this time, retransmit_skb_hint will be set to point to the current packet loss in tcp_verify_retransmit_hint(), then the packets that were previously marked lost but not retransmitted due to the restriction of cwnd will be skipped and cannot be retransmitted. To fix this, when retransmit_skb_hint is NULL, retransmit_skb_hint can be reset only after all marked lost packets are retransmitted (retrans_out >= lost_out), otherwise we need to traverse from tcp_rtx_queue_head in tcp_xmit_retransmit_queue(). Packetdrill to demonstrate: // Disable RACK and set max_reordering to keep things simple 0 `sysctl -q net.ipv4.tcp_recovery=0` +0 `sysctl -q net.ipv4.tcp_max_reordering=3` // Establish a connection +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 +.1 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7> +0 > S. 0:0(0) ack 1 <...> +.01 < . 1:1(0) ack 1 win 257 +0 accept(3, ..., ...) = 4 // Send 8 data segments +0 write(4, ..., 8000) = 8000 +0 > P. 1:8001(8000) ack 1 // Enter recovery and 1:3001 is marked lost +.01 < . 1:1(0) ack 1 win 257 <sack 3001:4001,nop,nop> +0 < . 1:1(0) ack 1 win 257 <sack 5001:6001 3001:4001,nop,nop> +0 < . 1:1(0) ack 1 win 257 <sack 5001:7001 3001:4001,nop,nop> // Retransmit 1:1001, now retransmit_skb_hint points to 1001:2001 +0 > . 1:1001(1000) ack 1 // 1001:2001 was ACKed causing retransmit_skb_hint to be set to NULL +.01 < . 1:1(0) ack 2001 win 257 <sack 5001:8001 3001:4001,nop,nop> // Now retransmit_skb_hint points to 4001:5001 which is now marked lost // BUG: 2001:3001 was not retransmitted +0 > . 2001:3001(1000) ack 1 Signed-off-by: Pengcheng Yang <yangpc@wangsu.com> Acked-by: Neal Cardwell <ncardwell@google.com> Tested-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-15Bluetooth: Make use of __check_timeout on hci_sched_leLuiz Augusto von Dentz
This reuse __check_timeout on hci_sched_le following the same logic used hci_sched_acl. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2020-01-15Bluetooth: monitor: Add support for ISO packetsLuiz Augusto von Dentz
This enables passing ISO packets to the monitor socket. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2020-01-15Bluetooth: Implementation of MGMT_OP_SET_BLOCKED_KEYS.Alain Michaud
MGMT command is added to receive the list of blocked keys from user-space. The list is used to: 1) Block keys from being distributed by the device during the ke distribution phase of SMP. 2) Filter out any keys that were previously saved so they are no longer used. Signed-off-by: Alain Michaud <alainm@chromium.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2020-01-15xsk: Support allocations of large umemsMagnus Karlsson
When registering a umem area that is sufficiently large (>1G on an x86), kmalloc cannot be used to allocate one of the internal data structures, as the size requested gets too large. Use kvmalloc instead that falls back on vmalloc if the allocation is too large for kmalloc. Also add accounting for this structure as it is triggered by a user space action (the XDP_UMEM_REG setsockopt) and it is by far the largest structure of kernel allocated memory in xsk. Reported-by: Ryan Goodfellow <rgoodfel@isi.edu> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Link: https://lore.kernel.org/bpf/1578995365-7050-1-git-send-email-magnus.karlsson@intel.com
2020-01-15net: bridge: vlan: notify on vlan add/delete/change flagsNikolay Aleksandrov
Now that we can notify, send a notification on add/del or change of flags. Notifications are also compressed when possible to reduce their number and relieve user-space of extra processing, due to that we have to manually notify after each add/del in order to avoid double notifications. We try hard to notify only about the vlans which actually changed, thus a single command can result in multiple notifications about disjoint ranges if there were vlans which didn't change inside. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-15net: bridge: vlan: add rtnetlink group and notify supportNikolay Aleksandrov
Add a new rtnetlink group for bridge vlan notifications - RTNLGRP_BRVLAN and add support for sending vlan notifications (both single and ranges). No functional changes intended, the notification support will be used by later patches. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-15net: bridge: vlan: add rtm range supportNikolay Aleksandrov
Add a new vlandb nl attribute - BRIDGE_VLANDB_ENTRY_RANGE which causes RTM_NEWVLAN/DELVAN to act on a range. Dumps now automatically compress similar vlans into ranges. This will be also used when per-vlan options are introduced and vlans' options match, they will be put into a single range which is encapsulated in one netlink attribute. We need to run similar checks as br_process_vlan_info() does because these ranges will be used for options setting and they'll be able to skip br_process_vlan_info(). Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-15net: bridge: vlan: add del rtm message supportNikolay Aleksandrov
Adding RTM_DELVLAN support similar to RTM_NEWVLAN is simple, just need to map DELVLAN to DELLINK and register the handler. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-15net: bridge: vlan: add new rtm message supportNikolay Aleksandrov
Add initial RTM_NEWVLAN support which can only create vlans, operating similar to the current br_afspec(). We will use it later to also change per-vlan options. Old-style (flag-based) vlan ranges are not allowed when using RTM messages, we will introduce vlan ranges later via a new nested attribute which would allow us to have all the information about a range encapsulated into a single nl attribute. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-15net: bridge: vlan: add rtm definitions and dump supportNikolay Aleksandrov
This patch adds vlan rtm definitions: - NEWVLAN: to be used for creating vlans, setting options and notifications - DELVLAN: to be used for deleting vlans - GETVLAN: used for dumping vlan information Dumping vlans which can span multiple messages is added now with basic information (vid and flags). We use nlmsg_parse() to validate the header length in order to be able to extend the message with filtering attributes later. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-15net: bridge: netlink: add extack error messages when processing vlansNikolay Aleksandrov
Add extack messages on vlan processing errors. We need to move the flags missing check after the "last" check since we may have "last" set but lack a range end flag in the next entry. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-15net: bridge: vlan: add helpers to check for vlan id/range validityNikolay Aleksandrov
Add helpers to check if a vlan id or range are valid. The range helper must be called when range start or end are detected. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-15Merge tag 'mac80211-for-net-2020-01-15' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== A few fixes: * -O3 enablement fallout, thanks to Arnd who ran this * fixes for a few leaks, thanks to Felix * channel 12 regulatory fix for custom regdomains * check for a crash reported by syzbot (NULL function is called on drivers that don't have it) * fix TKIP replay protection after setup with some APs (from Jouni) * restrict obtaining some mesh data to avoid WARN_ONs * fix deadlocks with auto-disconnect (socket owner) * fix radar detection events with multiple devices ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-15xfrm: support output_mark for offload ESP packetsUlrich Weber
Commit 9b42c1f179a6 ("xfrm: Extend the output_mark") added output_mark support but missed ESP offload support. xfrm_smark_get() is not called within xfrm_input() for packets coming from esp4_gro_receive() or esp6_gro_receive(). Therefore call xfrm_smark_get() directly within these functions. Fixes: 9b42c1f179a6 ("xfrm: Extend the output_mark to support input direction and masking.") Signed-off-by: Ulrich Weber <ulrich.weber@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2020-01-15cfg80211: fix page refcount issue in A-MSDU decapFelix Fietkau
The fragments attached to a skb can be part of a compound page. In that case, page_ref_inc will increment the refcount for the wrong page. Fix this by using get_page instead, which calls page_ref_inc on the compound head and also checks for overflow. Fixes: 2b67f944f88c ("cfg80211: reuse existing page fragments in A-MSDU rx") Cc: stable@vger.kernel.org Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20200113182107.20461-1-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-01-15cfg80211: check for set_wiphy_paramsJohannes Berg
Check if set_wiphy_params is assigned and return an error if not, some drivers (e.g. virt_wifi where syzbot reported it) don't have it. Reported-by: syzbot+e8a797964a4180eb57d5@syzkaller.appspotmail.com Reported-by: syzbot+34b582cf32c1db008f8e@syzkaller.appspotmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://lore.kernel.org/r/20200113125358.ac07f276efff.Ibd85ee1b12e47b9efb00a2adc5cd3fac50da791a@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-01-15cfg80211: fix memory leak in cfg80211_cqm_rssi_updateFelix Fietkau
The per-tid statistics need to be released after the call to rdev_get_station Cc: stable@vger.kernel.org Fixes: 8689c051a201 ("cfg80211: dynamically allocate per-tid stats for station info") Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20200108170630.33680-2-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-01-15cfg80211: fix memory leak in nl80211_probe_mesh_linkFelix Fietkau
The per-tid statistics need to be released after the call to rdev_get_station Cc: stable@vger.kernel.org Fixes: 5ab92e7fe49a ("cfg80211: add support to probe unexercised mesh link") Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20200108170630.33680-1-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-01-15cfg80211: fix deadlocks in autodisconnect workMarkus Theil
Use methods which do not try to acquire the wdev lock themselves. Cc: stable@vger.kernel.org Fixes: 37b1c004685a3 ("cfg80211: Support all iftypes in autodisconnect_wk") Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de> Link: https://lore.kernel.org/r/20200108115536.2262-1-markus.theil@tu-ilmenau.de Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-01-15wireless: wext: avoid gcc -O3 warningArnd Bergmann
After the introduction of CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE_O3, the wext code produces a bogus warning: In function 'iw_handler_get_iwstats', inlined from 'ioctl_standard_call' at net/wireless/wext-core.c:1015:9, inlined from 'wireless_process_ioctl' at net/wireless/wext-core.c:935:10, inlined from 'wext_ioctl_dispatch.part.8' at net/wireless/wext-core.c:986:8, inlined from 'wext_handle_ioctl': net/wireless/wext-core.c:671:3: error: argument 1 null where non-null expected [-Werror=nonnull] memcpy(extra, stats, sizeof(struct iw_statistics)); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In file included from arch/x86/include/asm/string.h:5, net/wireless/wext-core.c: In function 'wext_handle_ioctl': arch/x86/include/asm/string_64.h:14:14: note: in a call to function 'memcpy' declared here The problem is that ioctl_standard_call() sometimes calls the handler with a NULL argument that would cause a problem for iw_handler_get_iwstats. However, iw_handler_get_iwstats never actually gets called that way. Marking that function as noinline avoids the warning and leads to slightly smaller object code as well. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20200107200741.3588770-1-arnd@arndb.de Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-01-15mac80211: Fix TKIP replay protection immediately after key setupJouni Malinen
TKIP replay protection was skipped for the very first frame received after a new key is configured. While this is potentially needed to avoid dropping a frame in some cases, this does leave a window for replay attacks with group-addressed frames at the station side. Any earlier frame sent by the AP using the same key would be accepted as a valid frame and the internal RSC would then be updated to the TSC from that frame. This would allow multiple previously transmitted group-addressed frames to be replayed until the next valid new group-addressed frame from the AP is received by the station. Fix this by limiting the no-replay-protection exception to apply only for the case where TSC=0, i.e., when this is for the very first frame protected using the new key, and the local RSC had not been set to a higher value when configuring the key (which may happen with GTK). Signed-off-by: Jouni Malinen <j@w1.fi> Link: https://lore.kernel.org/r/20200107153545.10934-1-j@w1.fi Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-01-15cfg80211: Fix radar event during another phy CACOrr Mazor
In case a radar event of CAC_FINISHED or RADAR_DETECTED happens during another phy is during CAC we might need to cancel that CAC. If we got a radar in a channel that another phy is now doing CAC on then the CAC should be canceled there. If, for example, 2 phys doing CAC on the same channels, or on comptable channels, once on of them will finish his CAC the other might need to cancel his CAC, since it is no longer relevant. To fix that the commit adds an callback and implement it in mac80211 to end CAC. This commit also adds a call to said callback if after a radar event we see the CAC is no longer relevant Signed-off-by: Orr Mazor <Orr.Mazor@tandemg.com> Reviewed-by: Sergey Matyukevich <sergey.matyukevich.os@quantenna.com> Link: https://lore.kernel.org/r/20191222145449.15792-1-Orr.Mazor@tandemg.com [slightly reformat/reword commit message] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-01-15wireless: fix enabling channel 12 for custom regulatory domainGanapathi Bhat
Commit e33e2241e272 ("Revert "cfg80211: Use 5MHz bandwidth by default when checking usable channels"") fixed a broken regulatory (leaving channel 12 open for AP where not permitted). Apply a similar fix to custom regulatory domain processing. Signed-off-by: Cathy Luo <xiaohua.luo@nxp.com> Signed-off-by: Ganapathi Bhat <ganapathi.bhat@nxp.com> Link: https://lore.kernel.org/r/1576836859-8945-1-git-send-email-ganapathi.bhat@nxp.com [reword commit message, fix coding style, add a comment] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-01-14ipv6: Add "offload" and "trap" indications to routesIdo Schimmel
In a similar fashion to previous patch, add "offload" and "trap" indication to IPv6 routes. This is done by using two unused bits in 'struct fib6_info' to hold these indications. Capable drivers are expected to set these when processing the various in-kernel route notifications. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14ipv4: Add "offload" and "trap" indications to routesIdo Schimmel
When performing L3 offload, routes and nexthops are usually programmed into two different tables in the underlying device. Therefore, the fact that a nexthop resides in hardware does not necessarily mean that all the associated routes also reside in hardware and vice-versa. While the kernel can signal to user space the presence of a nexthop in hardware (via 'RTNH_F_OFFLOAD'), it does not have a corresponding flag for routes. In addition, the fact that a route resides in hardware does not necessarily mean that the traffic is offloaded. For example, unreachable routes (i.e., 'RTN_UNREACHABLE') are programmed to trap packets to the CPU so that the kernel will be able to generate the appropriate ICMP error packet. This patch adds an "offload" and "trap" indications to IPv4 routes, so that users will have better visibility into the offload process. 'struct fib_alias' is extended with two new fields that indicate if the route resides in hardware or not and if it is offloading traffic from the kernel or trapping packets to it. Note that the new fields are added in the 6 bytes hole and therefore the struct still fits in a single cache line [1]. Capable drivers are expected to invoke fib_alias_hw_flags_set() with the route's key in order to set the flags. The indications are dumped to user space via a new flags (i.e., 'RTM_F_OFFLOAD' and 'RTM_F_TRAP') in the 'rtm_flags' field in the ancillary header. v2: * Make use of 'struct fib_rt_info' in fib_alias_hw_flags_set() [1] struct fib_alias { struct hlist_node fa_list; /* 0 16 */ struct fib_info * fa_info; /* 16 8 */ u8 fa_tos; /* 24 1 */ u8 fa_type; /* 25 1 */ u8 fa_state; /* 26 1 */ u8 fa_slen; /* 27 1 */ u32 tb_id; /* 28 4 */ s16 fa_default; /* 32 2 */ u8 offload:1; /* 34: 0 1 */ u8 trap:1; /* 34: 1 1 */ u8 unused:6; /* 34: 2 1 */ /* XXX 5 bytes hole, try to pack */ struct callback_head rcu __attribute__((__aligned__(8))); /* 40 16 */ /* size: 56, cachelines: 1, members: 12 */ /* sum members: 50, holes: 1, sum holes: 5 */ /* sum bitfield members: 8 bits (1 bytes) */ /* forced alignments: 1, forced holes: 1, sum forced holes: 5 */ /* last cacheline: 56 bytes */ } __attribute__((__aligned__(8))); Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14ipv4: Encapsulate function arguments in a structIdo Schimmel
fib_dump_info() is used to prepare RTM_{NEW,DEL}ROUTE netlink messages using the passed arguments. Currently, the function takes 11 arguments, 6 of which are attributes of the route being dumped (e.g., prefix, TOS). The next patch will need the function to also dump to user space an indication if the route is present in hardware or not. Instead of passing yet another argument, change the function to take a struct containing the different route attributes. v2: * Name last argument of fib_dump_info() * Move 'struct fib_rt_info' to include/net/ip_fib.h so that it could later be passed to fib_alias_hw_flags_set() Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14ipv4: Replace route in list before notifyingIdo Schimmel
Subsequent patches will add an offload / trap indication to routes which will signal if the route is present in hardware or not. After programming the route to the hardware, drivers will have to ask the IPv4 code to set the flags by passing the route's key. In the case of route replace, the new route is notified before it is actually inserted into the FIB alias list. This can prevent simple drivers (e.g., netdevsim) that program the route to the hardware in the same context it is notified in from being able to set the flag. Solve this by first inserting the new route to the list and rollback the operation in case the route was vetoed. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14net: qrtr: Remove receive workerBjorn Andersson
Rather than enqueuing messages and scheduling a worker to deliver them to the individual sockets we can now, thanks to the previous work, move this directly into the endpoint callback. This saves us a context switch per incoming message and removes the possibility of an opportunistic suspend to happen between the message is coming from the endpoint until it ends up in the socket's receive buffer. Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14net: qrtr: Make qrtr_port_lookup() use RCUBjorn Andersson
The important part of qrtr_port_lookup() wrt synchronization is that the function returns a reference counted struct qrtr_sock, or fail. As such we need only to ensure that an decrement of the object's refcount happens inbetween the finding of the object in the idr and qrtr_port_lookup()'s own increment of the object. By using RCU and putting a synchronization point after we remove the mapping from the idr, but before it can be released we achieve this - with the benefit of not having to hold the mutex in qrtr_port_lookup(). Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14net: qrtr: Migrate node lookup tree to spinlockBjorn Andersson
Move operations on the qrtr_nodes radix tree under a separate spinlock and make the qrtr_nodes tree GFP_ATOMIC, to allow operation from atomic context in a subsequent patch. Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14net: qrtr: Implement outgoing flow controlBjorn Andersson
In order to prevent overconsumption of resources on the remote side QRTR implements a flow control mechanism. The mechanism works by the sender keeping track of the number of outstanding unconfirmed messages that has been transmitted to a particular node/port pair. Upon count reaching a low watermark (L) the confirm_rx bit is set in the outgoing message and when the count reaching a high watermark (H) transmission will be blocked upon the reception of a resume_tx message from the remote, that resets the counter to 0. This guarantees that there will be at most 2H - L messages in flight. Values chosen for L and H are 5 and 10 respectively. Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14net: qrtr: Move resume-tx transmission to recvmsgBjorn Andersson
The confirm-rx bit is used to implement a per port flow control, in order to make sure that no messages are dropped due to resource exhaustion. Move the resume-tx transmission to recvmsg to only confirm messages as they are consumed by the application. Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14pktgen: Allow configuration of IPv6 source address rangeNiu Xilei
Pktgen can use only one IPv6 source address from output device or src6 command setting. In pressure test we need create lots of sessions more than 65535. So add src6_min and src6_max command to set the range. Signed-off-by: Niu Xilei <niu_xilei@163.com> Changes since v3: - function set_src_in6_addr use static instead of static inline - precompute min_in6_l,min_in6_h,max_in6_h,max_in6_l in setup time Changes since v2: - reword subject line Changes since v1: - only create IPv6 source address over least significant 64 bit range Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14hv_sock: Remove the accept port restrictionSunil Muthuswamy
Currently, hv_sock restricts the port the guest socket can accept connections on. hv_sock divides the socket port namespace into two parts for server side (listening socket), 0-0x7FFFFFFF & 0x80000000-0xFFFFFFFF (there are no restrictions on client port namespace). The first part (0-0x7FFFFFFF) is reserved for sockets where connections can be accepted. The second part (0x80000000-0xFFFFFFFF) is reserved for allocating ports for the peer (host) socket, once a connection is accepted. This reservation of the port namespace is specific to hv_sock and not known by the generic vsock library (ex: af_vsock). This is problematic because auto-binds/ephemeral ports are handled by the generic vsock library and it has no knowledge of this port reservation and could allocate a port that is not compatible with hv_sock (and legitimately so). The issue hasn't surfaced so far because the auto-bind code of vsock (__vsock_bind_stream) prior to the change 'VSOCK: bind to random port for VMADDR_PORT_ANY' would start walking up from LAST_RESERVED_PORT (1023) and start assigning ports. That will take a large number of iterations to hit 0x7FFFFFFF. But, after the above change to randomize port selection, the issue has started coming up more frequently. There has really been no good reason to have this port reservation logic in hv_sock from the get go. Reserving a local port for peer ports is not how things are handled generally. Peer ports should reflect the peer port. This fixes the issue by lifting the port reservation, and also returns the right peer port. Since the code converts the GUID to the peer port (by using the first 4 bytes), there is a possibility of conflicts, but that seems like a reasonable risk to take, given this is limited to vsock and that only applies to all local sockets. Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14net: mac80211: use skb_list_walk_safe helper for gso segmentsJason A. Donenfeld
This is a conversion case for the new function, keeping the flow of the existing code as intact as possible. We also switch over to using skb_mark_not_on_list instead of a null write to skb->next. Finally, this code appeared to have a memory leak in the case where header building fails before the last gso segment. In that case, the remaining segments are not freed. So this commit also adds the proper kfree_skb_list call for the remainder of the skbs. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14net: netfilter: use skb_list_walk_safe helper for gso segmentsJason A. Donenfeld
This is a straight-forward conversion case for the new function, keeping the flow of the existing code as intact as possible. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14net: ipv4: use skb_list_walk_safe helper for gso segmentsJason A. Donenfeld
This is a straight-forward conversion case for the new function, keeping the flow of the existing code as intact as possible. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14net: sched: use skb_list_walk_safe helper for gso segmentsJason A. Donenfeld
This is a straight-forward conversion case for the new function, keeping the flow of the existing code as intact as possible. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14net: openvswitch: use skb_list_walk_safe helper for gso segmentsJason A. Donenfeld
This is a straight-forward conversion case for the new function, keeping the flow of the existing code as intact as possible. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14net: xfrm: use skb_list_walk_safe helper for gso segmentsJason A. Donenfeld
This is converts xfrm segment iteration to use the new function, keeping the flow of the existing code as intact as possible. One case is very straight-forward, whereas the other case has some more subtle code that likes to peak at ->next and relink skbs. By keeping the variables the same as before, we can upgrade this code with minimal surgery required. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14net: udp: use skb_list_walk_safe helper for gso segmentsJason A. Donenfeld
This is a straight-forward conversion case for the new function, iterating over the return value from udp_rcv_segment, which actually is a wrapper around skb_gso_segment. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14bpf: Return -EBADRQC for invalid map type in __bpf_tx_xdp_mapLi RongQing
A negative value should be returned if map->map_type is invalid although that is impossible now, but if we run into such situation in future, then xdpbuff could be leaked. Daniel Borkmann suggested: -EBADRQC should be returned to stay consistent with generic XDP for the tracepoint output and not to be confused with -EOPNOTSUPP from other locations like dev_map_enqueue() when ndo_xdp_xmit is missing and such. Suggested-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/1578618277-18085-1-git-send-email-lirongqing@baidu.com
2020-01-14netns: don't disable BHs when locking "nsid_lock"Guillaume Nault
When peernet2id() had to lock "nsid_lock" before iterating through the nsid table, we had to disable BHs, because VXLAN can call peernet2id() from the xmit path: vxlan_xmit() -> vxlan_fdb_miss() -> vxlan_fdb_notify() -> __vxlan_fdb_notify() -> vxlan_fdb_info() -> peernet2id(). Now that peernet2id() uses RCU protection, "nsid_lock" isn't used in BH context anymore. Therefore, we can safely use plain spin_lock()/spin_unlock() and let BHs run when holding "nsid_lock". Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14netns: protect netns ID lookups with RCUGuillaume Nault
__peernet2id() can be protected by RCU as it only calls idr_for_each(), which is RCU-safe, and never modifies the nsid table. rtnl_net_dumpid() can also do lockless lookups. It does two nested idr_for_each() calls on nsid tables (one direct call and one indirect call because of rtnl_net_dumpid_one() calling __peernet2id()). The netnsid tables are never updated. Therefore it is safe to not take the nsid_lock and run within an RCU-critical section instead. Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14netns: Remove __peernet2id_alloc()Guillaume Nault
__peernet2id_alloc() was used for both plain lookups and for netns ID allocations (depending the value of '*alloc'). Let's separate lookups from allocations instead. That is, integrate the lookup code into __peernet2id() and make peernet2id_alloc() responsible for allocating new netns IDs when necessary. This makes it clear that __peernet2id() doesn't modify the idr and prepares the code for lockless lookups. Also, mark the 'net' argument of __peernet2id() as 'const', since we're modifying this line. Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>