summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2018-08-01tcp: add stat of data packet reordering eventsWei Wang
Introduce a new TCP stats to record the number of reordering events seen and expose it in both tcp_info (TCP_INFO) and opt_stats (SOF_TIMESTAMPING_OPT_STATS). Application can use this stats to track the frequency of the reordering events in addition to the existing reordering stats which tracks the magnitude of the latest reordering event. Note: this new stats tracks reordering events triggered by ACKs, which could often be fewer than the actual number of packets being delivered out-of-order. Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01tcp: add dsack blocks received statsWei Wang
Introduce a new TCP stat to record the number of DSACK blocks received (RFC4989 tcpEStatsStackDSACKDups) and expose it in both tcp_info (TCP_INFO) and opt_stats (SOF_TIMESTAMPING_OPT_STATS). Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01tcp: add data bytes retransmitted statsWei Wang
Introduce a new TCP stat to record the number of bytes retransmitted (RFC4898 tcpEStatsPerfOctetsRetrans) and expose it in both tcp_info (TCP_INFO) and opt_stats (SOF_TIMESTAMPING_OPT_STATS). Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01tcp: add data bytes sent statsWei Wang
Introduce a new TCP stat to record the number of bytes sent (RFC4898 tcpEStatsPerfHCDataOctetsOut) and expose it in both tcp_info (TCP_INFO) and opt_stats (SOF_TIMESTAMPING_OPT_STATS). Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01net: ipv4: Notify about changes to ip_forward_update_priorityPetr Machata
Drivers may make offloading decision based on whether ip_forward_update_priority is enabled or not. Therefore distribute netevent notifications to give them a chance to react to a change. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01net: ipv4: Control SKB reprioritization after forwardingPetr Machata
After IPv4 packets are forwarded, the priority of the corresponding SKB is updated according to the TOS field of IPv4 header. This overrides any prioritization done earlier by e.g. an skbedit action or ingress-qos-map defined at a vlan device. Such overriding may not always be desirable. Even if the packet ends up being routed, which implies this is an L3 network node, an administrator may wish to preserve whatever prioritization was done earlier on in the pipeline. Therefore introduce a sysctl that controls this behavior. Keep the default value at 1 to maintain backward-compatible behavior. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01net: add helpers checking if socket can be bound to nonlocal addressVincent Bernat
The construction "net->ipv4.sysctl_ip_nonlocal_bind || inet->freebind || inet->transparent" is present three times and its IPv6 counterpart is also present three times. We introduce two small helpers to characterize these tests uniformly. Signed-off-by: Vincent Bernat <vincent@bernat.im> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-31net: remove bogus RCU annotations on socket.wqChristoph Hellwig
We never use RCU protection for it, just a lot of cargo-cult rcu_deference_protects calls. Note that we do keep the kfree_rcu call for it, as the references through struct sock are RCU protected and thus might require a grace period before freeing. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-31xsk: don't allow umem replace at stack levelJakub Kicinski
Currently drivers have to check if they already have a umem installed for a given queue and return an error if so. Make better use of XDP_QUERY_XSK_UMEM and move this functionality to the core. We need to keep rtnl across the calls now. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Acked-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-31net: update real_num_rx_queues even when !CONFIG_SYSFSJakub Kicinski
We used to depend on real_num_rx_queues as a upper bound for sanity checks. For AF_XDP socket validation it's useful if the check behaves the same regardless of CONFIG_SYSFS setting. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-30net/tc: introduce TC_ACT_REINSERT.Paolo Abeni
This is similar TC_ACT_REDIRECT, but with a slightly different semantic: - on ingress the mirred skbs are passed to the target device network stack without any additional check not scrubbing. - the rcu-protected stats provided via the tcf_result struct are updated on error conditions. This new tcfa_action value is not exposed to the user-space and can be used only internally by clsact. v1 -> v2: do not touch TC_ACT_REDIRECT code path, introduce a new action type instead v2 -> v3: - rename the new action value TC_ACT_REINJECT, update the helper accordingly - take care of uncloned reinjected packets in XDP generic hook v3 -> v4: - renamed again the new action value (JiriP) v4 -> v5: - fix build error with !NET_CLS_ACT (kbuild bot) Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-30tc/act: remove unneeded RCU lock in action callbackPaolo Abeni
Each lockless action currently does its own RCU locking in ->act(). This allows using plain RCU accessor, even if the context is really RCU BH. This change drops the per action RCU lock, replace the accessors with the _bh variant, cleans up a bit the surrounding code and documents the RCU status in the relevant header. No functional nor performance change is intended. The goal of this patch is clarifying that the RCU critical section used by the tc actions extends up to the classifier's caller. v1 -> v2: - preserve rcu lock in act_bpf: it's needed by eBPF helpers, as pointed out by Daniel v3 -> v4: - fixed some typos in the commit message (JiriP) Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-30net/sched: user-space can't set unknown tcfa_action valuesPaolo Abeni
Currently, when initializing an action, the user-space can specify and use arbitrary values for the tcfa_action field. If the value is unknown by the kernel, is implicitly threaded as TC_ACT_UNSPEC. This change explicitly checks for unknown values at action creation time, and explicitly convert them to TC_ACT_UNSPEC. No functional changes are introduced, but this will allow introducing tcfa_action values not exposed to user-space in a later patch. Note: we can't use the above to hide TC_ACT_REDIRECT from user-space, as the latter is already part of uAPI. v3 -> v4: - use an helper to check for action validity (JiriP) - emit an extack for invalid actions (JiriP) v4 -> v5: - keep messages on a single line, drop net_warn (Marcelo) Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-30net: remove sock_poll_busy_flagChristoph Hellwig
Fold it into the only caller to make the code simpler and easier to read. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-30net: remove sock_poll_busy_loopChristoph Hellwig
There is no point in hiding this logic in a helper. Also remove the useless events != 0 check and only busy loop once we know we actually have a poll method. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-30net: don not detour through struct sock to find the poll waitqueueChristoph Hellwig
For any open socket file descriptor sock->sk->sk_wq->wait will always point to sock->wq->wait. That means we can do the shorter dereference and removal a NULL check and don't have to not worry about any RCU protection. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-30net: simplify sock_poll_waitChristoph Hellwig
The wait_address argument is always directly derived from the filp argument, so remove it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-29Merge tag 'mlx5e-updates-2018-07-27' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5e-updates-2018-07-27 (Vxlan updates) This series from Gal and Saeed provides updates to mlx5 vxlan implementation. Gal, started with three cleanups to reflect the actual hardware vxlan state - reflect 4789 UDP port default addition to software database - check maximum number of vxlan UDP ports - cleanup an unused member in vxlan work Then Gal provides performance optimization by replacing the vxlan radix tree with a hash table. Measuring mlx5e_vxlan_lookup_port execution time: Radix Tree Hash Table --------------- ------------ ------------ Single Stream 161 ns 79 ns (51% improvement) Multi Stream 259 ns 136 ns (47% improvement) Measuring UDP stream packet rate, single fully utilized TX core: Radix Tree: 498,300 PPS Hash Table: 555,468 PPS (11% improvement) Next, from Saeed, vxlan refactoring to allow sharing the vxlan table between different mlx5 netdevice instances like PF and VF representors, this is done by making mlx5 vxlan interface more generic and decoupling it from PF netdevice structures and logic, then moving it into mlx5 core as a low level interface so it can be used by VF representors, which is illustrated in the last patch of the serious. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-29net: report invalid mtu value via netlink extackStephen Hemminger
If an invalid MTU value is set through rtnetlink return extra error information instead of putting message in kernel log. For other cases where there is no visible API, keep the error report in the log. Example: # ip li set dev enp12s0 mtu 10000 Error: mtu greater than device maximum. # ifconfig enp12s0 mtu 10000 SIOCSIFMTU: Invalid argument # dmesg | tail -1 [ 2047.795467] enp12s0: mtu greater than device maximum Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-29net: report min and max mtu network device settingsStephen Hemminger
Report the minimum and maximum MTU allowed on a device via netlink so that it can be displayed by tools like ip link. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-29net: dcb: add DSCP to comment about priority selector typesJakub Kicinski
Commit ee2059819450 ("net/dcb: Add dscp to priority selector type") added a define for the new DSCP selector type created by IEEE 802.1Qcd, but missed the comment enumerating all selector types. Update the comment. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-29route: add support for directed broadcast forwardingXin Long
This patch implements the feature described in rfc1812#section-5.3.5.2 and rfc2644. It allows the router to forward directed broadcast when sysctl bc_forwarding is enabled. Note that this feature could be done by iptables -j TEE, but it would cause some problems: - target TEE's gateway param has to be set with a specific address, and it's not flexible especially when the route wants forward all directed broadcasts. - this duplicates the directed broadcasts so this may cause side effects to applications. Besides, to keep consistent with other os router like BSD, it's also necessary to implement it in the route rx path. Note that route cache needs to be flushed when bc_forwarding is changed. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-29Merge tag 'linux-can-next-for-4.19-20180727' of ↵David S. Miller
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== pull-request: can-next 2018-01-16 this is a pull request for net-next/master consisting of 38 patches. Dan Murphy's patch fixes the path to a file in the comment of the CAN Error Message Frame Mask structure. A patch by Colin Ian King fixes a typo in the cc770 driver. The next patch is by me an sorts the Kconfigand Makefile entries of the CAN-USB driver subdir alphabetically. The patch by Jakob Unterwurzacher adds support for the UCAN USB-CAN adapter. YueHaibing's patch replaces a open coded skb_put()+memset() by skb_put_zero() in the CAN-dev infrastructure. Zhu Yi provides a patch to enable multi-queue CAN devices. Three patches by Luc Van Oostenryck fix the return value of several driver's xmit function, I contribute a patch for the a fourth driver. Fabio Estevam's patch switches the flexcan driver to SPDX identifier. Two patches by Jia-Ju Bai replace the mdelay() by a usleep_range() in the sja1000 drivers. The next 6 patches are by Anssi Hannula and refactor the xilinx CAN driver and add support for the xilinx CAN FD core. A patch by Gustavo A. R. Silva adds fallthrough annotation to the peak_usb driver. 5 patches by Stephane Grosjean for the peak CANFD driver do some cleanups and provide more improvements for further firmware releases. The remaining 13 patches are by Jimmy Assarsson and the first clean up the kvaser_usb driver, so that the later patches add support for the Kvaser USB hydra family. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-27net/mlx5e: Vxlan, move vxlan logic to core driverSaeed Mahameed
Move vxlan logic and objects to mlx5 core dirver. Since it going to be used from different mlx5 interfaces. e.g. mlx5e PF NIC netdev and mlx5e E-Switch representors. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
2018-07-27net/mlx5e: Vxlan, check maximum number of UDP portsGal Pressman
The NIC has a limited number of offloaded VXLAN UDP ports (usually 4). Instead of letting the firmware fail when trying to add more ports than it can handle, let the driver check it on its own. Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-07-27l2tp: drop ->mru from struct l2tp_sessionGuillaume Nault
This field is not used. Treat PPPIOC*MRU the same way as PPPIOC*FLAGS: "get" requests return 0, while "set" requests vadidate the user supplied pointer but discard its value. Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-27l2tp: ignore L2TP_ATTR_VLAN_ID netlink attributeGuillaume Nault
The value of this attribute is never used. Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-27l2tp: ignore L2TP_ATTR_DATA_SEQ netlink attributeGuillaume Nault
The value of this attribute is never used. Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-27net: dcb: Add priority-to-DSCP map gettersPetr Machata
On ingress, a network device such as a switch assigns to packets priority based on various criteria. Common options include interpreting PCP and DSCP fields according to user configuration. When a packet egresses the switch, a reverse process may rewrite PCP and/or DSCP values according to packet priority. The following three functions support a) obtaining a DSCP-to-priority map or vice versa, and b) finding default-priority entries in APP database. The DCB subsystem supports for APP entries a very generous M:N mapping between priorities and protocol identifiers. Understandably, several (say) DSCP values can map to the same priority. But this asymmetry holds the other way around as well--one priority can map to several DSCP values. For this reason, the following functions operate in terms of bitmaps, with ones in positions that match some APP entry. - dcb_ieee_getapp_dscp_prio_mask_map() to compute for a given netdevice a map of DSCP-to-priority-mask, which gives for each DSCP value a bitmap of priorities related to that DSCP value by APP, along the lines of dcb_ieee_getapp_mask(). - dcb_ieee_getapp_prio_dscp_mask_map() similarly to compute for a given netdevice a map from priorities to a bitmap of DSCPs. - dcb_ieee_getapp_default_prio_mask() which finds all default-priority rules for a given port in APP database, and returns a mask of priorities allowed by these default-priority rules. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-27net: sched: don't dump chains only held by actionsJiri Pirko
In case a chain is empty and not explicitly created by a user, such chain should not exist. The only exception is if there is an action "goto chain" pointing to it. In that case, don't show the chain in the dump. Track the chain references held by actions and use them to find out if a chain should or should not be shown in chain dump. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-27Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2018-07-27 1) Extend the output_mark to also support the input direction and masking the mark values before applying to the skb. 2) Add a new lookup key for the upcomming xfrm interfaces. 3) Extend the xfrm lookups to match xfrm interface IDs. 4) Add virtual xfrm interfaces. The purpose of these interfaces is to overcome the design limitations that the existing VTI devices have. The main limitations that we see with the current VTI are the following: VTI interfaces are L3 tunnels with configurable endpoints. For xfrm, the tunnel endpoint are already determined by the SA. So the VTI tunnel endpoints must be either the same as on the SA or wildcards. In case VTI tunnel endpoints are same as on the SA, we get a one to one correlation between the SA and the tunnel. So each SA needs its own tunnel interface. On the other hand, we can have only one VTI tunnel with wildcard src/dst tunnel endpoints in the system because the lookup is based on the tunnel endpoints. The existing tunnel lookup won't work with multiple tunnels with wildcard tunnel endpoints. Some usecases require more than on VTI tunnel of this type, for example if somebody has multiple namespaces and every namespace requires such a VTI. VTI needs separate interfaces for IPv4 and IPv6 tunnels. So when routing to a VTI, we have to know to which address family this traffic class is going to be encapsulated. This is a lmitation because it makes routing more complex and it is not always possible to know what happens behind the VTI, e.g. when the VTI is move to some namespace. VTI works just with tunnel mode SAs. We need generic interfaces that ensures transfomation, regardless of the xfrm mode and the encapsulated address family. VTI is configured with a combination GRE keys and xfrm marks. With this we have to deal with some extra cases in the generic tunnel lookup because the GRE keys on the VTI are actually not GRE keys, the GRE keys were just reused for something else. All extensions to the VTI interfaces would require to add even more complexity to the generic tunnel lookup. So to overcome this, we developed xfrm interfaces with the following design goal: It should be possible to tunnel IPv4 and IPv6 through the same interface. No limitation on xfrm mode (tunnel, transport and beet). Should be a generic virtual interface that ensures IPsec transformation, no need to know what happens behind the interface. Interfaces should be configured with a new key that must match a new policy/SA lookup key. The lookup logic should stay in the xfrm codebase, no need to change or extend generic routing and tunnel lookups. Should be possible to use IPsec hardware offloads of the underlying interface. 5) Remove xfrm pcpu policy cache. This was added after the flowcache removal, but it turned out to make things even worse. From Florian Westphal. 6) Allow to update the set mark on SA updates. From Nathan Harold. 7) Convert some timestamps to time64_t. From Arnd Bergmann. 8) Don't check the offload_handle in xfrm code, it is an opaque data cookie for the driver. From Shannon Nelson. 9) Remove xfrmi interface ID from flowi. After this pach no generic code is touched anymore to do xfrm interface lookups. From Benedict Wong. 10) Allow to update the xfrm interface ID on SA updates. From Nathan Harold. 11) Don't pass zero to ERR_PTR() in xfrm_resolve_and_create_bundle. From YueHaibing. 12) Return more detailed errors on xfrm interface creation. From Benedict Wong. 13) Use PTR_ERR_OR_ZERO instead of IS_ERR + PTR_ERR. From the kbuild test robot. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-27can: dev: enable multi-queue for SocketCAN devicesZhu Yi
The existing SocketCAN implementation provides alloc_candev() to allocate a CAN device using a single Tx and Rx queue. This can lead to priority inversion in case the single Tx queue is already full with low priority messages and a high priority message needs to be sent while the bus is fully loaded with medium priority messages. This problem can be solved by using the existing multi-queue support of the network subsytem. The commit makes it possible to use multi-queue in the CAN subsystem in the same way it is used in the Ethernet subsystem by adding an alloc_candev_mqs() call and accompanying macros. With this support a CAN device can use multi-queue qdisc (e.g. mqprio) to avoid the aforementioned priority inversion. The exisiting functionality of alloc_candev() is the same as before. CAN devices need to have prioritized multiple hardware queues or are able to abort waiting for arbitration to make sensible use of multi-queues. Signed-off-by: Zhu Yi <yi.zhu5@cn.bosch.com> Signed-off-by: Mark Jonas <mark.jonas@de.bosch.com> Reviewed-by: Heiko Schocher <hs@denx.de> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2018-07-27can: uapi: can.h: Fix can error class mask dir pathDan Murphy
The CAN error masks header file is in the include/uapi directory. Fix the path in the header to the correct location. Signed-off-by: Dan Murphy <dmurphy@ti.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2018-07-25net/smc: provide fallback reason codeKarsten Graul
Remember the fallback reason code and the peer diagnosis code for smc sockets, and provide them in smc_diag.c to the netlink interface. And add more detailed reason codes. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-25net: phy: add helper phy_polling_modeHeiner Kallweit
Add a helper for checking whether polling is used to detect PHY status changes. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-24Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2018-07-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Handle stations tied to AP_VLANs properly during mac80211 hw reconfig. From Manikanta Pubbisetty. 2) Fix jump stack depth validation in nf_tables, from Taehee Yoo. 3) Fix quota handling in aRFS flow expiration of mlx5 driver, from Eran Ben Elisha. 4) Exit path handling fix in powerpc64 BPF JIT, from Daniel Borkmann. 5) Use ptr_ring_consume_bh() in page pool code, from Tariq Toukan. 6) Fix cached netdev name leak in nf_tables, from Florian Westphal. 7) Fix memory leaks on chain rename, also from Florian Westphal. 8) Several fixes to DCTCP congestion control ACK handling, from Yuchunk Cheng. 9) Missing rcu_read_unlock() in CAIF protocol code, from Yue Haibing. 10) Fix link local address handling with VRF, from David Ahern. 11) Don't clobber 'err' on a successful call to __skb_linearize() in skb_segment(). From Eric Dumazet. 12) Fix vxlan fdb notification races, from Roopa Prabhu. 13) Hash UDP fragments consistently, from Paolo Abeni. 14) If TCP receives lots of out of order tiny packets, we do really silly stuff. Make the out-of-order queue ending more robust to this kind of behavior, from Eric Dumazet. 15) Don't leak netlink dump state in nf_tables, from Florian Westphal. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (76 commits) net: axienet: Fix double deregister of mdio qmi_wwan: fix interface number for DW5821e production firmware ip: in cmsg IP(V6)_ORIGDSTADDR call pskb_may_pull bnx2x: Fix invalid memory access in rss hash config path. net/mlx4_core: Save the qpn from the input modifier in RST2INIT wrapper r8169: restore previous behavior to accept BIOS WoL settings cfg80211: never ignore user regulatory hint sock: fix sg page frag coalescing in sk_alloc_sg netfilter: nf_tables: move dumper state allocation into ->start tcp: add tcp_ooo_try_coalesce() helper tcp: call tcp_drop() from tcp_data_queue_ofo() tcp: detect malicious patterns in tcp_collapse_ofo_queue() tcp: avoid collapses in tcp_prune_queue() if possible tcp: free batches of packets in tcp_prune_ofo_queue() ip: hash fragments consistently ipv6: use fib6_info_hold_safe() when necessary can: xilinx_can: fix power management handling can: xilinx_can: fix incorrect clear of non-processed interrupts can: xilinx_can: fix RX overflow interrupt not being enabled can: xilinx_can: keep only 1-2 frames in TX FIFO to fix TX accounting ...
2018-07-24net/sched: add skbprio schedulerNishanth Devarajan
Skbprio (SKB Priority Queue) is a queueing discipline that prioritizes packets according to their skb->priority field. Under congestion, already-enqueued lower priority packets will be dropped to make space available for higher priority packets. Skbprio was conceived as a solution for denial-of-service defenses that need to route packets with different priorities as a means to overcome DoS attacks. v5 *Do not reference qdisc_dev(sch)->tx_queue_len for setting limit. Instead set default sch->limit to 64. v4 *Drop Documentation/networking/sch_skbprio.txt doc file to move it to tc man page for Skbprio, in iproute2. v3 *Drop max_limit parameter in struct skbprio_sched_data and instead use sch->limit. *Reference qdisc_dev(sch)->tx_queue_len only once, during initialisation for qdisc (previously being referenced every time qdisc changes). *Move qdisc's detailed description from in-code to Documentation/networking. *When qdisc is saturated, enqueue incoming packet first before dequeueing lowest priority packet in queue - improves usage of call stack registers. *Introduce and use overlimit stat to keep track of number of dropped packets. v2 *Use skb->priority field rather than DS field. Rename queueing discipline as SKB Priority Queue (previously Gatekeeper Priority Queue). *Queueing discipline is made classful to expose Skbprio's internal priority queues. Signed-off-by: Nishanth Devarajan <ndev2021@gmail.com> Reviewed-by: Sachin Paryani <sachin.paryani@gmail.com> Reviewed-by: Cody Doucette <doucette@bu.edu> Reviewed-by: Michel Machado <michel@digirati.com.br> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-24net: phy: add GBit master / slave error detectionHeiner Kallweit
Certain PHY's have issues when operating in GBit slave mode and can be forced to master mode. Examples are RTL8211C, also the Micrel PHY driver has a DT setting to force master mode. If two such chips are link partners the autonegotiation will fail. Standard defines a self-clearing on read, latched-high bit to indicate this error. Check this bit to inform the user. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-24netlink: do not store start function in netlink_cbFlorian Westphal
->start() is called once when dump is being initialized, there is no need to store it in netlink_cb. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Make sure we don't go over the maximum jump stack boundary, from Taehee Yoo. 2) Missing rcu_barrier() in hash and rbtree sets, also from Taehee. 3) Missing check to nul-node in rbtree timeout routine, from Taehee. 4) Use dev->name from flowtable to fix a memleak, from Florian. 5) Oneliner to free flowtable object on removal, from Florian. 6) Memleak in chain rename transaction, again from Florian. 7) Don't allow two chains to use the same name in the same transaction, from Florian. 8) handle DCCP SYNC/SYNCACK as invalid, this triggers an uninitialized timer in conntrack reported by syzbot, from Florian. 9) Fix leak in case netlink_dump_start() fails, from Florian. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-24Merge tag 'mac80211-for-davem-2018-07-24' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== Only a few fixes: * always keep regulatory user hint * add missing break statement in station flags parsing * fix non-linear SKBs in port-control-over-nl80211 * reconfigure VLAN stations during HW restart ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23Merge tag 'wireless-drivers-next-for-davem-2018-07-23' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next Kalle Valo says: ==================== wireless-drivers-next patches for 4.19 The first set of patches for 4.19. Only smaller features and bug fixes, not really anything major. Also included are changes to include/linux/bitfield.h, we agreed with Johannes that it makes sense to apply them via wireless-drivers-next. Major changes: ath10k * support channel 173 * fix spectral scan for QCA9984 and QCA9888 chipsets ath6kl * add support for Dell Wireless 1537 ti wlcore * add support for runtime PM * enable runtime PM autosuspend support qtnfmac * support changing MAC address * enable source MAC address randomization support libertas * fix suspend and resume for SDIO cards mt76 * add software DFS radar pattern detector for mt76x2 based devices ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23rds: Extend RDS API for IPv6 supportKa-Cheong Poon
There are many data structures (RDS socket options) used by RDS apps which use a 32 bit integer to store IP address. To support IPv6, struct in6_addr needs to be used. To ensure backward compatibility, a new data structure is introduced for each of those data structures which use a 32 bit integer to represent an IP address. And new socket options are introduced to use those new structures. This means that existing apps should work without a problem with the new RDS module. For apps which want to use IPv6, those new data structures and socket options can be used. IPv4 mapped address is used to represent IPv4 address in the new data structures. v4: Revert changes to SO_RDS_TRANSPORT Signed-off-by: Ka-Cheong Poon <ka-cheong.poon@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23net: sched: cls_flower: propagate chain teplate creation and destruction to ↵Jiri Pirko
drivers Introduce a couple of flower offload commands in order to propagate template creation/destruction events down to device drivers. Drivers may use this information to prepare HW in an optimal way for future filter insertions. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23net: sched: introduce chain templatesJiri Pirko
Allow user to set a template for newly created chains. Template lock down the chain for particular classifier type/options combinations. The classifier needs to support templates, otherwise kernel would reply with error. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23net: sched: introduce chain object to uapiJiri Pirko
Allow user to create, destroy, get and dump chain objects. Do that by extending rtnl commands by the chain-specific ones. User will now be able to explicitly create or destroy chains (so far this was done only automatically according the filter/act needs and refcounting). Also, the user will receive notification about any chain creation or destuction. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23net: sched: Avoid implicit chain 0 creationJiri Pirko
Currently, chain 0 is implicitly created during block creation. However that does not align with chain object exposure, creation and destruction api introduced later on. So make the chain 0 behave the same way as any other chain and only create it when it is needed. Since chain 0 is somehow special as the qdiscs need to hold pointer to the first chain tp, this requires to move the chain head change callback infra to the block structure. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23net/mlx5: FW tracer, events handlingFeras Daoud
The tracer has one event, event 0x26, with two subtypes: - Subtype 0: Ownership change - Subtype 1: Traces available An ownership change occurs in the following cases: 1- Owner releases his ownership, in this case, an event will be sent to inform others to reattempt acquire ownership. 2- Ownership was taken by a higher priority tool, in this case the owner should understand that it lost ownership, and go through tear down flow. The second subtype indicates that there are traces in the trace buffer, in this case, the driver polls the tracer buffer for new traces, parse them and prepares the messages for printing. The HW starts tracing from the first address in the tracer buffer. Driver receives an event notifying that new trace block exists. HW posts a timestamp event at the last 8B of every 256B block. Comparing the timestamp to the last handled timestamp would indicate that this is a new trace block. Once the new timestamp is detected, the entire block is considered valid. Block validation and parsing, should be done after copying the current block to a different location, in order to avoid block overwritten during processing. Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-07-23net/mlx5: FW tracer, implement tracer logicFeras Daoud
Implement FW tracer logic and registers access, initialization and cleanup flows. Initializing the tracer will be part of load one flow, as multiple PFs will try to acquire ownership but only one will succeed and will be the tracer owner. Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>