summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-10-01net/mlx5: Cache the system image guidAlaa Hleihel
The system image guid is a read-only field which is used by the TC offloads code to determine if two mlx5 devices belong to the same ASIC while adding flows. Read this once and save it on the core device rather than querying each time an offloaded flow is added. Signed-off-by: Alaa Hleihel <alaa@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-01net/mlx5e: Allow reporting of checksum unnecessaryOr Gerlitz
Currently we practically never report checksum unnecessary, because for all IP packets we take the checksum complete path. Enable non-default runs with reprorting checksum unnecessary, using an ethtool private flag. This can be useful for performance evals and other explorations. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-01net/mlx5e: Enable reporting checksum unnecessary also for L3 packetsOr Gerlitz
We can report checksum unnecessary also when the L3 checksum flag on the cqe is set and there's no L4 header. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-01net/mlx5e: Add ethtool control of ring params to VF representorsGavi Teitz
Added ethtool control to the representors for setting and querying the ring params. Signed-off-by: Gavi Teitz <gavi@mellanox.com>
2018-10-01net/mlx5e: Enable multi-queue and RSS for VF representorsGavi Teitz
Increased the amount of channels the representors can open to be the amount of CPUs. The default amount opened remains one. Used the standard NIC netdev functions to: * Set RSS params when building the representors' params. * Setup an indirect TIR and RQT for the representors upon initialization. * Create a TTC flow table for the representors' indirect TIR (when creating the TTC table, mlx5e_set_ttc_basic_params() is not called, in order to avoid setting the inner_ttc param, which is not needed). Added ethtool control to the representors for setting and querying the amount of open channels. Additionally, included logic in the representors' ethtool set channels handler which controls a representor's vport rx rule, so that if there is one open channel the rx rule steers traffic to the representor's direct TIR, whereas if there is more than one channel, the rx rule steers traffic to the new TTC flow table. Signed-off-by: Gavi Teitz <gavi@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-01net/mlx5e: Expose ethtool rss key size / indirection table functionsOr Gerlitz
Towards enabling RSS for the vport representors, expose the functions for querying the rss hash key size and indirection table size via ethtool. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-01net/mlx5e: Expose function for building RSS paramsGavi Teitz
Towards enabling RSS for the vport representors, extract the procedure for building a device's RSS params, and expose the function. Signed-off-by: Gavi Teitz <gavi@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-01net/mlx5e: Provide explicit directive if to create inner indirect tirsOr Gerlitz
Change the driver functions that deal with creating indirect tirs to get a flag telling if inner ttc is desired. A pre-step for enabling rss on the vport representors, where inner ttc is not needed. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-01net/mlx5: E-Switch, Provide flow dest when creating vport rx ruleGavi Teitz
Currently the destination for the representor e-switch rx rule is a TIR number. Towards changing that to potentially be a flow table, as part of enabling RSS for representors, modify the signature of the related e-switch API to get a flow destination. Signed-off-by: Gavi Teitz <gavi@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-01net/mlx5e: Extract creation of rep's default flow ruleGavi Teitz
Cleaning up the flow of the representors' rx initialization, towards enabling RSS for the representors. Signed-off-by: Gavi Teitz <gavi@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-01net/mlx5e: Enable stateless offloads for VF representor netdevsGavi Teitz
Enabled checksum and TSO offloads for the representors, in order to increase their performance, which is required to increase the performance of flows that cannot be offloaded. Checksum offloads contribute to a general acceleration of all traffic (to around 150%), whereas the TSO offload contributes to a prominent acceleration of the representor's TX for traffic flows with larger than MTU sized packets (to around 200%). This is the usual case for TCP streams, as the PF, which serves as the uplink representor, and the VF representors employ GRO before forwarding the packets to the representor. GRO was enabled implicitly for the representors beforehand, and is explicitly enabled here to ensure that the representors preserve the performance boost it provides (of around 200%) when working in tandem with the TSO offload by the forwardee, which is the standard case as both the PF and the VF representors employ HW TSO. The impact of these changes can be seen in the following measurements taken on a setup of a VM over a VF, connected to OVS via the VF representor, to an external host: Before current changes: TCP Throughput [Gb/s] External host to VM ~ 10.5 VM to external host ~ 23.5 With just checksum offloads enabled: TCP Throughput [Gb/s] External host to VM ~ 14.9 VM to external host ~ 28.5 With the TSO offload also enabled: TCP Throughput [Gb/s] External host to VM ~ 30.5 Signed-off-by: Gavi Teitz <gavi@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-01net/mlx5e: Change VF representors' RQ typeGavi Teitz
The representors' RQ size was not large enough for them to achieve high enough performance, and therefore needed to be enlarged, while suffering a minimum hit to its memory usage. To achieve this the representors RQ size was increased, and its type was changed to be a striding RQ if it is supported. Towards that goal the following changes were made: * Extracted the sequence for setting the standard netdev's RQ parmas into a function * Replaced the sequence for setting the representor's RQ params with the standard sequence The impact of this change can be seen in the following measurements taken on a setup of a VM over a VF, connected to OVS via the VF representor, to an external host: Before current change: TCP Throughput [Gb/s] VM to external host ~ 7.2 With the current change (measured with a striding RQ): TCP Throughput [Gb/s] VM to external host ~ 23.5 Each representor now consumes 2 [MB] of memory for its packet buffers. Signed-off-by: Gavi Teitz <gavi@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-01net/mlx5e: Ethtool steering, Support masks for l3/l4 filtersOr Gerlitz
Allow using partial masks for L3 addresses and L4 ports across the place. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-01net/mlx5e: Set vlan masks for all offloaded TC rulesJianbo Liu
In flow steering, if asked to, the hardware matches on the first ethertype which is not vlan. It's possible to set a rule as follows, which is meant to match on untagged packet, but will match on a vlan packet: tc filter add dev eth0 parent ffff: protocol ip flower ... To avoid this for packets with single tag, we set vlan masks to tell hardware to check the tags for every matched packet. Fixes: 095b6cfd69ce ('net/mlx5e: Add TC vlan match parsing') Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-01net/mlx5: E-Switch, Fix out of bound access when setting vport rateEran Ben Elisha
The code that deals with eswitch vport bw guarantee was going beyond the eswitch vport array limit, fix that. This was pointed out by the kernel address sanitizer (KASAN). The error from KASAN log: [2018-09-15 15:04:45] BUG: KASAN: slab-out-of-bounds in mlx5_eswitch_set_vport_rate+0x8c1/0xae0 [mlx5_core] Fixes: c9497c98901c ("net/mlx5: Add support for setting VF min rate") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-01net/mlx5e: Avoid unbounded peer devices when unpairing TC hairpin rulesAlaa Hleihel
If the peer device was already unbound, then do not attempt to modify it's resources, otherwise we will crash on dereferencing non-existing device. Fixes: 5c65c564c962 ("net/mlx5e: Support offloading TC NIC hairpin flows") Signed-off-by: Alaa Hleihel <alaa@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-01drm/i915: Avoid compiler warning for maybe unused gu_misc_iirChris Wilson
/kisskb/src/drivers/gpu/drm/i915/i915_irq.c: warning: 'gu_misc_iir' may be used uninitialized in this function [-Wuninitialized]: => 3120:10 Silence the compiler warning by ensuring that the local variable is initialised and removing the guard that is confusing the older gcc. Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Fixes: df0d28c185ad ("drm/i915/icl: GSE interrupt moves from DE_MISC to GU_MISC") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180926104718.17462-1-chris@chris-wilson.co.uk (cherry picked from commit 7a90938332d80faf973fbcffdf6e674e7b8f0914) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2018-10-01drm/i915: Do not redefine the has_csr parameter.Anusha Srivatsa
Let us reuse the already defined has_csr check and not redefine it. The main difference is that in effect this will flip .has_csr to 1 (via GEN9_FEATURES which GEN11_FEATURES pulls in). Suggested-by: Imre Deak <imre.deak@intel.com> Cc: Imre Deak <imre.deak@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Anusha Srivatsa <anusha.srivatsa@intel.com> Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=107382 Reviewed-by: Imre Deak <imre.deak@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/1534527210-16841-1-git-send-email-anusha.srivatsa@intel.com (cherry picked from commit da4468a1aa75457e6134127b19761b7ba62ce945) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2018-10-01MAINTAINERS: MIPS/LOONGSON2 ARCHITECTURE - Use the normal wildcard styleJoe Perches
Neither git nor get_maintainer understands the curly brace style. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Paul Burton <paul.burton@mips.com> Patchwork: https://patchwork.linux-mips.org/patch/20821/ Cc: Huacai Chen <chenhc@lemote.com> Cc: linux-mips <linux-mips@linux-mips.org> Cc: LKML <linux-kernel@vger.kernel.org>
2018-10-01Merge tag 'iwlwifi-next-for-kalle-2018-09-28' of ↵Kalle Valo
git://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-next Second set of iwlwifi patches for 4.20 * TKIP implementation in new devices; * Fix for the shared antenna setting in 22000 series; * Report that we set the RU offset in HE code; * Fix some register addresses in 22000 series; * Fix one FW feature TLV that had a conflict with another value; * A couple of fixes for SoftAP mode; * Work continues for new 22560 hardware; * Some fixes in the datapath; * Some debugging and other general fixes; * Some cleanups, small improvements and other general fixes;
2018-10-01b43: fix spelling mistake "hw_registred" -> "hw_registered"Colin Ian King
Trivial fix to spelling mistake struct field name, rename it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2018-10-01qtnfmac_pcie: check for correct CHIP ID at pcie probeIgor Mitsyanko
Make sure that wifi device is of supported variant by checking it's CHIP ID before completing a probe sequence. Signed-off-by: Igor Mitsyanko <igor.mitsyanko.os@quantenna.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2018-10-01qtnfmac: wait for FW load work to finish at PCIe removeIgor Mitsyanko
Waiting for "completion" to be set in FW load thread can not be used in case PCIe remove is called before FW load work was scheduled. Just wait for work completion instead to avoid problems. Signed-off-by: Igor Mitsyanko <igor.mitsyanko.os@quantenna.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2018-10-01qtnfmac_pcie: extract platform-independent PCIe codeIgor Mitsyanko
Extract platform-independent PCIe driver code into a separate file, and use it from platform-specific modules. Signed-off-by: Igor Mitsyanko <igor.mitsyanko.os@quantenna.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2018-10-01qtnfmac: add missing header includes to bus.hIgor Mitsyanko
A few include directives were missing in bus.h resulting in dependency of include order in other modules. Add missing includes. Signed-off-by: Igor Mitsyanko <igor.mitsyanko.os@quantenna.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2018-10-01qtnfmac_pcie: rename platform-specific functionsIgor Mitsyanko
Rename several functions to indicate that they are platform specific. Signed-off-by: Igor Mitsyanko <igor.mitsyanko.os@quantenna.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2018-10-01qtnfmac_pcie: separate platform-independent PCIe structureIgor Mitsyanko
Move platform-independent PCIe data structure to a separate header file so it can be reused by different devices. Signed-off-by: Igor Mitsyanko <igor.mitsyanko.os@quantenna.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2018-10-01qtnfmac_pcie: pearl: rename spinlock tx0_lock to tx_lockIgor Mitsyanko
tx_lock name will later be reused when common pcie code is extracted to separate files. Signed-off-by: Igor Mitsyanko <igor.mitsyanko.os@quantenna.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2018-10-01qtnfmac_pcie: indicate pearl-specific structures by their namesIgor Mitsyanko
In preparation to extract common PCIe driver state, indicate PEARL-specific structures by their name and move them to pearl-specific source file. Signed-off-by: Igor Mitsyanko <igor.mitsyanko.os@quantenna.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2018-10-01qtnfmac_pcie: rename private Pearl PCIe state structureIgor Mitsyanko
In preparation to extract common pcie driver state into a separate structure, rename Pearl-specific state to qtnf_pcie_pearl_state and move it directly to pearl-specific PCIe source file. Signed-off-by: Igor Mitsyanko <igor.mitsyanko.os@quantenna.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2018-10-01qtnfmac_pcie: move Pearl pcie sources to pcie-specific directoryIgor Mitsyanko
In preparation to extract common qtnfmac PCIe driver sources into a separate file, move existing Pearl-specific pcie driver sources to pcie/ directory. Signed-off-by: Igor Mitsyanko <igor.mitsyanko.os@quantenna.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2018-10-01qtnfmac_pcie: do not store FW name in driver state structureIgor Mitsyanko
Firmware name is only needed at probe stage, no point in keeping it in driver state structure. Signed-off-by: Igor Mitsyanko <igor.mitsyanko.os@quantenna.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2018-10-01rt2800: flush and txstatus rework for rt2800mmioStanislaw Gruszka
Implement custom rt2800mmio flush routine and change txstatus routine to read TX_STA_FIFO also in the tasklet. Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2018-10-01rt2x00: use different txstatus timeouts when flushingStanislaw Gruszka
Use different tx status timeouts for normal operation and when flushing. This increase timeout to 2s for normal operation as when there are bad radio conditions and frames are reposted many times device can not provide the status for quite long. With new timeout we can still get valid status on such bad conditions. Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2018-10-01rt2x00: do not check for txstatus timeout every time on taskletStanislaw Gruszka
Do not check for tx status timeout everytime we perform txstatus tasklet. Perform check once per half a second. Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2018-10-01rt2800mmio: use txdone/txstatus routines from libStanislaw Gruszka
Use usb txdone/txstatus routines (now in rt2800libc) for mmio devices. Note this also change how we handle INT_SOURCE_CSR_TX_FIFO_STATUS interrupt. Now it is disabled since IRQ routine till end of the txstatus tasklet (the same behaviour like others interrupts). Reason to do not disable this interrupt was not to miss any tx status from 16 entries FIFO register. Now, since we check for tx status timeout, we can allow to miss some tx statuses. However this will be improved in further patch where I also implement read status FIFO register in the tasklet. Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2018-10-01rt2800: move usb specific txdone/txstatus routines to rt2800libStanislaw Gruszka
In order to reuse usb txdone/txstatus routines for mmio, move them to common rt2800lib.c file. Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2018-10-01rtlwifi: btcoex: Use proper enumerated types for Wi-Fi only interfaceNathan Chancellor
Clang warns when one enumerated type is implicitly converted to another. drivers/net/wireless/realtek/rtlwifi/btcoexist/halbtcoutsrc.c:1327:34: warning: implicit conversion from enumeration type 'enum btc_chip_interface' to different enumeration type 'enum wifionly_chip_interface' [-Wenum-conversion] wifionly_cfg->chip_interface = BTC_INTF_PCI; ~ ^~~~~~~~~~~~ drivers/net/wireless/realtek/rtlwifi/btcoexist/halbtcoutsrc.c:1330:34: warning: implicit conversion from enumeration type 'enum btc_chip_interface' to different enumeration type 'enum wifionly_chip_interface' [-Wenum-conversion] wifionly_cfg->chip_interface = BTC_INTF_USB; ~ ^~~~~~~~~~~~ drivers/net/wireless/realtek/rtlwifi/btcoexist/halbtcoutsrc.c:1333:34: warning: implicit conversion from enumeration type 'enum btc_chip_interface' to different enumeration type 'enum wifionly_chip_interface' [-Wenum-conversion] wifionly_cfg->chip_interface = BTC_INTF_UNKNOWN; ~ ^~~~~~~~~~~~~~~~ 3 warnings generated. Use the values from the correct enumerated type, wifionly_chip_interface. BTC_INTF_UNKNOWN = WIFIONLY_INTF_UNKNOWN = 0 BTC_INTF_PCI = WIFIONLY_INTF_PCI = 1 BTC_INTF_USB = WIFIONLY_INTF_USB = 2 Link: https://github.com/ClangBuiltLinux/linux/issues/135 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Acked-by: Ping-Ke Shih <pkshih@realtek.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2018-10-01ath5k: Remove unused BUG_ONNathan Chancellor
Clang warns that the address of a pointer will always evaluated as true in a boolean context: drivers/net/wireless/ath/ath5k/debug.c:1031:14: warning: address of array 'ah->sbands' will always evaluate to 'true' [-Wpointer-bool-conversion] BUG_ON(!ah->sbands); ~~~~~^~~~~~ ./include/asm-generic/bug.h:61:45: note: expanded from macro 'BUG_ON' #define BUG_ON(condition) do { if (unlikely(condition)) BUG(); } while (0) ^~~~~~~~~ ./include/linux/compiler.h:77:42: note: expanded from macro 'unlikely' # define unlikely(x) __builtin_expect(!!(x), 0) ^ 1 warning generated. Given that this condition is always false because of the logical not, just remove it. Link: https://github.com/ClangBuiltLinux/linux/issues/130 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2018-10-01rsi: Remove unnecessary boolean conditionNathan Chancellor
Clang warns that the address of a pointer will always evaluated as true in a boolean context. drivers/net/wireless/rsi/rsi_91x_mac80211.c:927:50: warning: address of array 'key->key' will always evaluate to 'true' [-Wpointer-bool-conversion] if (vif->type == NL80211_IFTYPE_STATION && key->key && ~~ ~~~~~^~~ 1 warning generated. Link: https://github.com/ClangBuiltLinux/linux/issues/136 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2018-10-01ipw2x00: Remove unnecessary parenthesesNathan Chancellor
Clang warns when multiple pairs of parentheses are used for a single conditional statement. drivers/net/wireless/intel/ipw2x00/ipw2200.c:5655:28: warning: equality comparison with extraneous parentheses [-Wparentheses-equality] if ((priv->ieee->iw_mode == IW_MODE_ADHOC)) { ~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~ drivers/net/wireless/intel/ipw2x00/ipw2200.c:5655:28: note: remove extraneous parentheses around the comparison to silence this warning if ((priv->ieee->iw_mode == IW_MODE_ADHOC)) { ~ ^ ~ drivers/net/wireless/intel/ipw2x00/ipw2200.c:5655:28: note: use '=' to turn this equality comparison into an assignment if ((priv->ieee->iw_mode == IW_MODE_ADHOC)) { ^~ = 1 warning generated. Link: https://github.com/ClangBuiltLinux/linux/issues/134 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2018-10-01Merge branch 'bpf-per-cpu-cgroup-storage'Daniel Borkmann
Roman Gushchin says: ==================== This patchset implements per-cpu cgroup local storage and provides an example how per-cpu and shared cgroup local storage can be used for efficient accounting of network traffic. v4->v3: 1) incorporated Alexei's feedback v3->v2: 1) incorporated Song's feedback 2) rebased on top of current bpf-next v2->v1: 1) added a selftest implementing network counters 2) added a missing free() in cgroup local storage selftest ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-01selftests/bpf: cgroup local storage-based network countersRoman Gushchin
This commit adds a bpf kselftest, which demonstrates how percpu and shared cgroup local storage can be used for efficient lookup-free network accounting. Cgroup local storage provides generic memory area with a very efficient lookup free access. To avoid expensive atomic operations for each packet, per-cpu cgroup local storage is used. Each packet is initially charged to a per-cpu counter, and only if the counter reaches certain value (32 in this case), the charge is moved into the global atomic counter. This allows to amortize atomic operations, keeping reasonable accuracy. The test also implements a naive network traffic throttling, mostly to demonstrate the possibility of bpf cgroup--based network bandwidth control. Expected output: ./test_netcnt test_netcnt:PASS Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-01samples/bpf: extend test_cgrp2_attach2 test to use per-cpu cgroup storageRoman Gushchin
This commit extends the test_cgrp2_attach2 test to cover per-cpu cgroup storage. Bpf program will use shared and per-cpu cgroup storages simultaneously, so a better coverage of corresponding core code will be achieved. Expected output: $ ./test_cgrp2_attach2 Attached DROP prog. This ping in cgroup /foo should fail... ping: sendmsg: Operation not permitted Attached DROP prog. This ping in cgroup /foo/bar should fail... ping: sendmsg: Operation not permitted Attached PASS prog. This ping in cgroup /foo/bar should pass... Detached PASS from /foo/bar while DROP is attached to /foo. This ping in cgroup /foo/bar should fail... ping: sendmsg: Operation not permitted Attached PASS from /foo/bar and detached DROP from /foo. This ping in cgroup /foo/bar should pass... ### override:PASS ### multi:PASS Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-01selftests/bpf: extend the storage test to test per-cpu cgroup storageRoman Gushchin
This test extends the cgroup storage test to use per-cpu flavor of the cgroup storage as well. The test initializes a per-cpu cgroup storage to some non-zero initial value (1000), and then simple bumps a per-cpu counter each time the shared counter is atomically incremented. Then it reads all per-cpu areas from the userspace side, and checks that the sum of values adds to the expected sum. Expected output: $ ./test_cgroup_storage test_cgroup_storage:PASS Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-01selftests/bpf: add verifier per-cpu cgroup storage testsRoman Gushchin
This commits adds verifier tests covering per-cpu cgroup storage functionality. There are 6 new tests, which are exactly the same as for shared cgroup storage, but do use per-cpu cgroup storage map. Expected output: $ ./test_verifier #0/u add+sub+mul OK #0/p add+sub+mul OK ... #286/p invalid cgroup storage access 6 OK #287/p valid per-cpu cgroup storage access OK #288/p invalid per-cpu cgroup storage access 1 OK #289/p invalid per-cpu cgroup storage access 2 OK #290/p invalid per-cpu cgroup storage access 3 OK #291/p invalid per-cpu cgroup storage access 4 OK #292/p invalid per-cpu cgroup storage access 5 OK #293/p invalid per-cpu cgroup storage access 6 OK #294/p multiple registers share map_lookup_elem result OK ... #662/p mov64 src == dst OK #663/p mov64 src != dst OK Summary: 914 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-01bpftool: add support for PERCPU_CGROUP_STORAGE mapsRoman Gushchin
This commit adds support for BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE map type. Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Song Liu <songliubraving@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-01bpf: sync include/uapi/linux/bpf.h to tools/include/uapi/linux/bpf.hRoman Gushchin
The sync is required due to the appearance of a new map type: BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE, which implements per-cpu cgroup local storage. Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-01bpf: don't allow create maps of per-cpu cgroup local storagesRoman Gushchin
Explicitly forbid creating map of per-cpu cgroup local storages. This behavior matches the behavior of shared cgroup storages. Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-01bpf: introduce per-cpu cgroup local storageRoman Gushchin
This commit introduced per-cpu cgroup local storage. Per-cpu cgroup local storage is very similar to simple cgroup storage (let's call it shared), except all the data is per-cpu. The main goal of per-cpu variant is to implement super fast counters (e.g. packet counters), which don't require neither lookups, neither atomic operations. >From userspace's point of view, accessing a per-cpu cgroup storage is similar to other per-cpu map types (e.g. per-cpu hashmaps and arrays). Writing to a per-cpu cgroup storage is not atomic, but is performed by copying longs, so some minimal atomicity is here, exactly as with other per-cpu maps. Signed-off-by: Roman Gushchin <guro@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>