summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-11-18mlxsw: spectrum: Expose discard counters via ethtoolShalom Toledo
Expose packets discard counters via ethtool to help with debugging. Signed-off-by: Shalom Toledo <shalomt@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-18tun: use netdev_alloc_frag() in tun_napi_alloc_frags()Eric Dumazet
In order to cook skbs in the same way than Ethernet drivers, it is probably better to not use GFP_KERNEL, but rather use the GFP_ATOMIC and PFMEMALLOC mechanisms provided by netdev_alloc_frag(). This would allow to use tun driver even in memory stress situations, especially if swap is used over this tun channel. Fixes: 90e33d459407 ("tun: enable napi_gro_frags() for TUN/TAP driver") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Petar Penkov <peterpenkov96@gmail.com> Cc: Mahesh Bandewar <maheshb@google.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-18Merge branch 'IP101GR-devicetree-based-configuration-of-SEL_INTR32'David S. Miller
Martin Blumenstingl says: ==================== IP101GR: devicetree based configuration of SEL_INTR32 The IP101GR is a 32-pin QFN package variant of the IP101G/IP101GA Ethernet PHY. Due to it's limited amount of pins the RXER (receive error) and INTR32 (interrupt) functions share pin 21. The goal of this series is: - some small cleanups in patches 3, 4 and 5 - allowing the kernel to detect IRQ floods on boards where the IP101GR is configured in RXER mode but the RXER line is configured on the host SoC as interrupt line (patch 6) - configuration of the SEL_INTR32 register so we can use the interrupt function on boards where the RXER/INTR32 pin (pin 21) is routed to one of the host SoC's interrupt inputs (patches 1, 2, 7) A use-case where this is needed is the Endless Mini (EC-100). I have tested my changes on that board. This also confirms that Heiner Kallweit's recent icplus.c PHY driver changes are working (at least on my setup). This series is based on net-next commit 7c460cf9cd1a ("net: aquantia: fix spelling mistake "specfield" -> "specified"") Changes since v1 at [0]: - collected Andrew's Reviewed-by's (thank you!) - updated description of patch #2 to explain why two properties were added instead of adding an "this is a IP101GR" property - validate that there's no conflicting configuration in patch #7 - rebased on top of latest net-next [0] https://patchwork.ozlabs.org/cover/999371/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-18net: phy: icplus: allow configuring the interrupt function on IP101GRMartin Blumenstingl
The IP101GR is a 32-pin QFN package variant of the IP101G/IP101GA Ethernet PHY. Due to it's limited amount of pins the RXER (receive error) and INTR32 (interrupt) functions share pin 21. By default the PHY is configured to output the "receive error" status on pin 21. Depending on the board layout and requirements we may want to re-configure the PHY to output the interrupt signal there. The mode of pin 21 can be configured in the "Digital I/O Specific Control Register" (register 29), bit 2: - 0 = RXER function - 1 = INTR(32) function Depending on the devicetree configuration we will now: - change the mode to either ther RXER or INTR32 function - keep the SEL_INTR32 value set by the bootloader (default) if no configuration is provided (to ensure that we're not breaking existing boards) - error out if conflicting configuration is given (RXER and INTR32 mode are enabled at the same time) Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-18net: phy: icplus: implement .did_interrupt for IP101A/GMartin Blumenstingl
The IP101A_G_IRQ_CONF_STATUS register has bits to detect which interrupts have fired. Implement the .did_interrupt callback to let the PHY core know whether the interrupt was for this specific PHY. This is useful for debugging interrupt problems with 32-pin IP101GR PHYs where the interrupt line is shared with the RX_ERR (receive error status) signal. The default values are: - RX_ERR is enabled by default (LOW means that there is no receive error) - the PHY's interrupt line is configured "active low" by default Without any additional changes there is a flood of interrupts if the RX_ERR/INTR32 signal is configured in RX_ERR mode (which is the default). Having a did_interrupt ensures that the PHY core returns IRQ_NONE instead of endlessly triggering the PHY state machine. Additionally the kernel will report this after a while: irq 28: nobody cared (try booting with the "irqpoll" option) Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-18net: phy: icplus: rename IP101A_G_NO_IRQ to IP101A_G_IRQ_ALL_MASKMartin Blumenstingl
The datasheet uses the name "All Mask" for this bit. Change the name of our #define to be consistent with the datasheet. While here also replace the tab between the #define and IP101A_G_IRQ_ALL_MASK with a space. No functional changes. Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-18net: phy: icplus: use the BIT macro where possibleMartin Blumenstingl
This makes the code consistent by using the BIT() macro instead of manual bit-shifting for some of the fields. No functional changes. Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-18net: phy: icplus: keep all ip101a_g functions togetherMartin Blumenstingl
This simply moves ip101a_g_config_init right above ip101a_g_config_intr so all functions for the ICPlus IP101A/G PHYs are grouped together. No functional changes. Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-18dt-bindings: net: phy: add bindings for the IC Plus Corp. IP101A/G PHYsMartin Blumenstingl
The IP101A and IP101G series both have various models. Depending on the board implementation we need a special property for the IP101GR (32-pin LQFP package) PHY: pin 21 ("RXER/INTR_32") outputs the "receive error" signal by default (LOW means "normal operation", HIGH means that there's either a decoding error of the received signal or that the PHY is receiving LPI). This pin can also be switched to INTR32 mode, where the interrupt signal is routed to this pin. The other PHYs don't need this special handling because they have more pins available so the interrupt function gets a dedicated pin. This adds two properties to either select the "receive error" or "interrupt" function of pin 21. Not specifying any function means that the default set by the bootloader is used. This is required because the IP101GR cannot be differentiated between other IP101 PHYs as the PHY identification registers on all of these is 0x02430c54. The IP101G (sold as die only, without package) may suffer from the same issue depending on how it's integrated into a multi chip package by another manufacturer. If only the RXER/INTR_32 pin is routed then the users of the die-only variant may also have to explicitly configure the mode of hte RXER/INTR_32 pin. This is the reason why no "is-ip101gr" property was added. I have no evidence though which would confirm this theory - so the binding itself is independent of that. Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-18dt-bindings: vendor-prefix: add prefix for IC Plus Corp.Martin Blumenstingl
IC Plus Corp. has various Ethernet related products such as Ethernet transceivers, Ethernet controllers, Ethernet switches, etc. Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-18tg3: optionally use eth_platform_get_mac_address() to get mac addressthesven73@gmail.com
This function will try to determine the mac address via the devicetree, or via an architecture-specific method (e.g. a PROM on SPARC). The SPARC-specific code in this driver (#ifdef SPARC) did exactly this, and is therefore removed. Note that you can now specify the tg3 mac address via the devicetree, on any platform, not just SPARC: Devicetree example: (see Documentation/devicetree/bindings/pci/pci.txt) &pcie { host@0 { #address-cells = <3>; #size-cells = <2>; reg = <0 0 0 0 0>; bcm5778: bcm5778@0 { reg = <0 0 0 0 0>; mac-address = [CA 11 AB 1E 10 01]; }; }; }; Signed-off-by: Sven Van Asbroeck <svendev@arcx.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-18net: Add part of TCP counts explanations in snmp_counters.rstyupeng
Add explanations of some generic TCP counters, fast open related counters and TCP abort related counters and several examples. Signed-off-by: yupeng <yupeng0921@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-17Merge branch 'bcmgenet-fix-aborted-suspend'David S. Miller
Doug Berger says: ==================== net: bcmgenet: fix aborted suspend It is not enough to return an error code from the driver suspend routine. The driver must also restore the device functionality. This commit corrects the issue introduced by commit 0db55093b566 ("net: bcmgenet: return correct value 'ret' from bcmgenet_power_down") by calling the driver resume function if the suspend function returns an error. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-17net: bcmgenet: abort suspend on errorDoug Berger
If an error occurs during suspension of the driver the driver should restore the hardware configuration and return an error to force the system to resume. Fixes: 0db55093b566 ("net: bcmgenet: return correct value 'ret' from bcmgenet_power_down") Signed-off-by: Doug Berger <opendmb@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-17net: bcmgenet: code movementDoug Berger
This commit switches the order of bcmgenet_suspend and bcmgenet_resume in the file to prevent the need for a forward declaration in the next commit and to make the review of that commit easier. Signed-off-by: Doug Berger <opendmb@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-17geneve: Initialize addr6 with memsetNathan Chancellor
Clang warns: drivers/net/geneve.c:428:29: error: suggest braces around initialization of subobject [-Werror,-Wmissing-braces] struct in6_addr addr6 = { 0 }; ^ {} Rather than trying to appease the various compilers that support the kernel, use memset, which is unambiguous. Fixes: a07966447f39 ("geneve: ICMP error lookup handler") Suggested-by: David S. Miller <davem@davemloft.net> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-17Merge branch 'net-hns3-Add-vf-mtu-support'David S. Miller
Salil Mehta says: ==================== net: hns3: Add vf mtu support This patchset adds vf mtu support to HNS3 driver. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-17net: hns3: up/down netdev in hclge module when setting mtuYunsheng Lin
Currently netdev is down in enet module, and it is before mtu range checking in hclge module, which may be cause netdev being down unnecessarily. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-17net: hns3: Add mtu setting support for vfYunsheng Lin
The patch adds mtu setting support for vf, currently vf and pf share the same hardware mtu setting. Mtu set by vf must be less than or equal to pf' mtu, and mtu set by pf must be greater than or equal to vf' mtu. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-17net: hns3: Add vport alive state checking supportYunsheng Lin
Currently there is no way for pf to know if a vf device is alive or not, so PF does not know which vf to notify when reset happens, or which vf's mtu is invalid when vf and pf share the same hardware mtu setting. This patch adds vport alive state checking support, in order to support the above scenario. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-17net: hns3: Refactor mac mtu setting related functionsYunsheng Lin
This patch refactors mac mtu setting related functions, normalizes the use of mps and mtu. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-17net: hns3: Support two vlan header when setting mtuYunsheng Lin
This patch adds supports for two vlan header when setting mtu. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-17net: fsl: Use device_type helpers to access the node typeRob Herring
Remove directly accessing device_node.type pointer and use the accessors instead. This will eventually allow removing the type pointer. Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-17atm: Convert to using %pOFn instead of device_node.nameRob Herring
In preparation to remove the node name pointer from struct device_node, convert printf users to use the %pOFn format specifier. Cc: Chas Williams <3chas3@gmail.com> Cc: linux-atm-general@lists.sourceforge.net Cc: netdev@vger.kernel.org Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-17net: align gnet_stats_basic_cpu structEric Dumazet
This structure is small (12 or 16 bytes depending on 64bit or 32bit kernels), but we do not want it spanning two cache lines. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-17net: align pcpu_sw_netstats and pcpu_lstats structsEric Dumazet
Do not risk spanning these small structures on two cache lines, it is absolutely not worth it. For 32bit arches, the hint might not be enough, but we do not really care anymore. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-17net: aquantia: fix spelling mistake "specfield" -> "specified"Colin Ian King
There is a spelling mistake in a netdev_err message. Fix this. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-17net: aquantia: cleanup err handing in hw_atl_utils_fw_rpc_waitYueHaibing
'err' always be 0 in the two places. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-17Merge branch 'ncsi-Allow-enabling-multiple-packages-and-channels'David S. Miller
Samuel Mendoza-Jonas says: ==================== net/ncsi: Allow enabling multiple packages & channels This series extends the NCSI driver to configure multiple packages and/or channels simultaneously. Since the RFC series this includes a few extra changes to fix areas in the driver that either made this harder or were roadblocks due to deviations from the NCSI specification. Patches 1 & 2 fix two issues where the driver made assumptions about the capabilities of the NCSI topology. Patches 3 & 4 change some internal semantics slightly to make multi-mode easier. Patch 5 introduces a cleaner way of reconfiguring the NCSI configuration and keeping track of channel states. Patch 6 implements the main multi-package/multi-channel configuration, configured via the Netlink interface. Readers who have an interesting NCSI setup - especially multi-package with HWA - please test! I think I've covered all permutations but I don't have infinite hardware to test on. Changes in v2: - Updated use of the channel lock in ncsi_reset_dev(), making the channel invisible and leaving the monitor check to ncsi_stop_channel_monitor(). - Fixed ncsi_channel_is_tx() to consider the state of channels in other packages. Changes in v3: - Fixed bisectability bug in patch 1 - Consider channels on all packages in a few places when multi-package is enabled. - Avoid doubling up reset operations, and check the current driver state before reset to let any running operations complete. - Reorganise the LSC handler slightly to avoid enabling Tx twice. Changes in v4: - Fix failover in the single-channel case - Better handle ncsi_reset_dev() entry during a current config/suspend operation. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-17net/ncsi: Configure multi-package, multi-channel modes with failoverSamuel Mendoza-Jonas
This patch extends the ncsi-netlink interface with two new commands and three new attributes to configure multiple packages and/or channels at once, and configure specific failover modes. NCSI_CMD_SET_PACKAGE mask and NCSI_CMD_SET_CHANNEL_MASK set a whitelist of packages or channels allowed to be configured with the NCSI_ATTR_PACKAGE_MASK and NCSI_ATTR_CHANNEL_MASK attributes respectively. If one of these whitelists is set only packages or channels matching the whitelist are considered for the channel queue in ncsi_choose_active_channel(). These commands may also use the NCSI_ATTR_MULTI_FLAG to signal that multiple packages or channels may be configured simultaneously. NCSI hardware arbitration (HWA) must be available in order to enable multi-package mode. Multi-channel mode is always available. If the NCSI_ATTR_CHANNEL_ID attribute is present in the NCSI_CMD_SET_CHANNEL_MASK command the it sets the preferred channel as with the NCSI_CMD_SET_INTERFACE command. The combination of preferred channel and channel whitelist defines a primary channel and the allowed failover channels. If the NCSI_ATTR_MULTI_FLAG attribute is also present then the preferred channel is configured for Tx/Rx and the other channels are enabled only for Rx. Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-17net/ncsi: Reset channel state in ncsi_start_dev()Samuel Mendoza-Jonas
When the NCSI driver is stopped with ncsi_stop_dev() the channel monitors are stopped and the state set to "inactive". However the channels are still configured and active from the perspective of the network controller. We should suspend each active channel but in the context of ncsi_stop_dev() the transmit queue has been or is about to be stopped so we won't have time to do so. Instead when ncsi_start_dev() is called if the NCSI topology has already been probed then call ncsi_reset_dev() to suspend any channels that were previously active. This resets the network controller to a known state, provides an up to date view of channel link state, and makes sure that mode flags such as NCSI_MODE_TX_ENABLE are properly reset. In addition to ncsi_start_dev() use ncsi_reset_dev() in ncsi-netlink.c to update the channel configuration more cleanly. Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-17net/ncsi: Don't mark configured channels inactiveSamuel Mendoza-Jonas
The concepts of a channel being 'active' and it having link are slightly muddled in the NCSI driver. Tweak this slightly so that NCSI_CHANNEL_ACTIVE represents a channel that has been configured and enabled, and NCSI_CHANNEL_INACTIVE represents a de-configured channel. This distinction is important because a channel can be 'active' but have its link down; in this case the channel may still need to be configured so that it may receive AEN link-state-change packets. Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-17net/ncsi: Don't deselect package in suspend if activeSamuel Mendoza-Jonas
When a package is deselected all channels of that package cease communication. If there are other channels active on the package of the suspended channel this will disable them as well, so only send a deselect-package command if no other channels are active. Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-17net/ncsi: Probe single packages to avoid conflictSamuel Mendoza-Jonas
Currently the NCSI driver sends a select-package command to all possible packages simultaneously to discover what packages are available. However at this stage in the probe process the driver does not know if hardware arbitration is available: if it isn't then this process could cause collisions on the RMII bus when packages try to respond. Update the probe loop to probe each package one by one, and once complete check if HWA is universally supported. Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-17net/ncsi: Don't enable all channels when HWA availableSamuel Mendoza-Jonas
NCSI hardware arbitration allows multiple packages to be enabled at once and share the same wiring. If the NCSI driver recognises that HWA is available it unconditionally enables all packages and channels; but that is a configuration decision rather than something required by HWA. Additionally the current implementation will not failover on link events which can cause connectivity to be lost unless the interface is manually bounced. Retain basic HWA support but remove the separate configuration path to enable all channels, leaving this to be handled by a later implementation. Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-17cxgb4: Remove SGE_HOST_PAGE_SIZE dependency on page sizeArjun Vynipadath
The SGE Host Page Size has nothing to do with the actual Host Page Size. It's the SGE's BAR2 Doorbell/GTS Page Size for interpreting the SGE Ingress/Egress Queue per Page values. Firmware reads all of these things and makes all the subsequent changes necessary. The Host Driver uses the SGE Host Page Size in order to properly calculate BAR2 Offsets. Signed-off-by: Casey Leedom <leedom@chelsio.com> Signed-off-by: Arjun Vynipadath <arjun@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-17tcp: add SRTT to SCM_TIMESTAMPING_OPT_STATSYousuk Seung
Add TCP_NLA_SRTT to SCM_TIMESTAMPING_OPT_STATS that reports the smoothed round trip time in microseconds (tcp_sock.srtt_us >> 3). Signed-off-by: Yousuk Seung <ysseung@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-17uapi/ethtool: fix spelling errorsStephen Hemminger
Trivial spelling errors found by codespell. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-17tun: Adjust on-stack tun_page initialization.David S. Miller
Instead of constantly playing with the struct initializer syntax trying to make gcc and CLang both happy, just clear it out using memset(). >> drivers/net/tun.c:2503:42: warning: Using plain integer as NULL pointer Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-17tuntap: free XDP dropped packets in a batchJason Wang
Thanks to the batched XDP buffs through msg_control. Instead of calling put_page() for each page which involves a atomic operation, let's batch them by record the last page that needs to be freed and its refcnt count and free them in a batch. Testpmd(virtio-user + vhost_net) + XDP_DROP shows 3.8% improvement. Before: 4.71Mpps After : 4.89Mpps Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-17vhost_net: mitigate page reference counting during page frag refillJason Wang
We do a get_page() which involves a atomic operation. This patch tries to mitigate a per packet atomic operation by maintaining a reference bias which is initially USHRT_MAX. Each time a page is got, instead of calling get_page() we decrease the bias and when we find it's time to use a new page we will decrease the bias at one time through __page_cache_drain_cache(). Testpmd(virtio_user + vhost_net) + XDP_DROP on TAP shows about 1.6% improvement. Before: 4.63Mpps After: 4.71Mpps Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-16Merge branch 'net-sched-gred-introduce-per-virtual-queue-attributes'David S. Miller
Jakub Kicinski says: ==================== net: sched: gred: introduce per-virtual queue attributes This series updates the GRED Qdisc. The Qdisc matches nfp offload very well, but before we can offload it there are a number of improvements to make. First few patches add extack messages to the Qdisc and pass extack to netlink validation. Next a new netlink attribute group is added, to allow GRED to be extended more easily. Currently GRED passes C structures as attributes, and even an array of C structs for virtual queue configuration. User space has hard coded the expected length of that array, so adding new fields is not possible. New two-level attribute group is added: [TCA_GRED_VQ_LIST] [TCA_GRED_VQ_ENTRY] [TCA_GRED_VQ_DP] [TCA_GRED_VQ_FLAGS] [TCA_GRED_VQ_STAT_*] [TCA_GRED_VQ_ENTRY] [TCA_GRED_VQ_DP] [TCA_GRED_VQ_FLAGS] [TCA_GRED_VQ_STAT_*] [TCA_GRED_VQ_ENTRY] ... Statistics are dump only. Patch 4 switches the byte counts to be 64 bit, and patch 5 introduces the new stats attributes for dump. Patch 6 switches RED flags to be per-virtual queue, and patch 7 allows them to be dumped and set at virtual queue granularity. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-16net: sched: gred: allow manipulating per-DP RED flagsJakub Kicinski
Allow users to set and dump RED flags (ECN enabled and harddrop) on per-virtual queue basis. Validation of attributes is split from changes to make sure we won't have to undo previous operations when we find out configuration is invalid. The objective is to allow changing per-Qdisc parameters without overwriting the per-vq configured flags. Old user space will not pass the TCA_GRED_VQ_FLAGS attribute and per-Qdisc flags will always get propagated to the virtual queues. New user space which wants to make use of per-vq flags should set per-Qdisc flags to 0 and then configure per-vq flags as it sees fit. Once per-vq flags are set per-Qdisc flags can't be changed to non-zero. Vice versa - if the per-Qdisc flags are non-zero the TCA_GRED_VQ_FLAGS attribute has to either be omitted or set to the same value as per-Qdisc flags. Update per-Qdisc parameters: per-Qdisc | per-VQ | result 0 | 0 | all vq flags updated 0 | non-0 | error (vq flags in use) non-0 | 0 | -- impossible -- non-0 | non-0 | all vq flags updated Update per-VQ state (flags parameter not specified): no change to flags Update per-VQ state (flags parameter set): per-Qdisc | per-VQ | result 0 | any | per-vq flags updated non-0 | 0 | -- impossible -- non-0 | non-0 | error (per-Qdisc flags in use) Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-16net: sched: gred: store red flags per virtual queueJakub Kicinski
Right now ECN marking and HARD drop (the common RED flags) can only be configured for the entire Qdisc. In preparation for per-vq flags store the values in the virtual queue structure. Setting per-vq flags will only be allowed when no flags are set for the entire Qdisc. For the new flags we will also make sure undefined bits are 0. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-16net: sched: gred: provide a better structured dump and expose statsJakub Kicinski
Currently all GRED's virtual queue data is dumped in a single array in a single attribute. This makes it pretty much impossible to add new fields. In order to expose more detailed stats add a new set of attributes. We can now expose the 64 bit value of bytesin and all the mark stats which were not part of the original design. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-16net: sched: gred: store bytesin as a 64 bit valueJakub Kicinski
32 bit counters for bytes are not really going to last long in modern world. Make sch_gred count bytes on a 64 bit counter. It will still get truncated during dump but follow up patch will add set of new stat dump attributes. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-16net: sched: gred: use extack to provide more details on configuration errorsJakub Kicinski
Add extack messages to -EINVAL errors, to help users identify their mistakes. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-16net: sched: gred: pass extack to nla_parse_nested()Jakub Kicinski
In case netlink wants to provide parsing error pass extack to nla_parse_nested(). Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-16net: sched: gred: separate error and non-error path in gred_change()Jakub Kicinski
We will soon want to add more code to the non-error path, separate it from the error handling flow. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-16selftests: add explicit test for multiple concurrent GRO socketsPaolo Abeni
This covers for proper accounting of encap needed static keys Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>