summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2019-08-01tipc: compat: allow tipc commands without argumentsTaras Kondratiuk
Commit 2753ca5d9009 ("tipc: fix uninit-value in tipc_nl_compat_doit") broke older tipc tools that use compat interface (e.g. tipc-config from tipcutils package): % tipc-config -p operation not supported The commit started to reject TIPC netlink compat messages that do not have attributes. It is too restrictive because some of such messages are valid (they don't need any arguments): % grep 'tx none' include/uapi/linux/tipc_config.h #define TIPC_CMD_NOOP 0x0000 /* tx none, rx none */ #define TIPC_CMD_GET_MEDIA_NAMES 0x0002 /* tx none, rx media_name(s) */ #define TIPC_CMD_GET_BEARER_NAMES 0x0003 /* tx none, rx bearer_name(s) */ #define TIPC_CMD_SHOW_PORTS 0x0006 /* tx none, rx ultra_string */ #define TIPC_CMD_GET_REMOTE_MNG 0x4003 /* tx none, rx unsigned */ #define TIPC_CMD_GET_MAX_PORTS 0x4004 /* tx none, rx unsigned */ #define TIPC_CMD_GET_NETID 0x400B /* tx none, rx unsigned */ #define TIPC_CMD_NOT_NET_ADMIN 0xC001 /* tx none, rx none */ This patch relaxes the original fix and rejects messages without arguments only if such arguments are expected by a command (reg_type is non zero). Fixes: 2753ca5d9009 ("tipc: fix uninit-value in tipc_nl_compat_doit") Cc: stable@vger.kernel.org Signed-off-by: Taras Kondratiuk <takondra@cisco.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-01fm10k: reduce scope of the ring variableJacob Keller
Reduce the scope of the ring local variable in the fm10k_assign_l2_accel function. This was detected by cppcheck and resolves the following warning produced by that tool: [fm10k_netdev.c:1447]: (style) The scope of the variable 'ring' can be reduced. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-08-01fm10k: reduce the scope of the result local variableJacob Keller
Reduce the scope of the result local variable in the fm10k_iov_msg_lport_state_pf function. This was detected by cppcheck and resolves the following warning produced by that tool: [fm10k_pf.c:1435]: (style) The scope of the variable 'result' can be reduced. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-08-01fm10k: reduce the scope of the local msg variableJacob Keller
The msg variable in the fm10k_mbx_validate_msg_size and fm10k_sm_mbx_transmit functions is only used within the do {} loop scope. Reduce its scope only to where it is used. This was detected by cppcheck, and resolves the following warnings produced by that tool: [fm10k_mbx.c:299]: (style) The scope of the variable 'msg' can be reduced. [fm10k_mbx.c:2004]: (style) The scope of the variable 'msg' can be reduced. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-08-01fm10k: reduce the scope of the local i variableJacob Keller
Reduce the scope of the local loop variable in the fm10k_check_hang_subtask function. This was detected by cppcheck and resolves the following warning produced by that tool: [driver/fm10k_pci.c:852]: (style) The scope of the variable 'i' can be reduced. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-08-01fm10k: reduce the scope of the err variableJacob Keller
Reduce the scope of the local variable err in the fm10k_detach_subtask function. This was detected by cppcheck and resolves the following warning produced by that tool: [fm10k_pci.c:403]: (style) The scope of the variable 'err' can be reduced. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-08-01ACPI: PM: Fix regression in acpi_device_set_power()Rafael J. Wysocki
Commit f850a48a0799 ("ACPI: PM: Allow transitions to D0 to occur in special cases") overlooked the fact that acpi_power_transition() may change the power.state value for the target device and if that happens, it may confuse acpi_device_set_power() and cause it to omit the _PS0 evaluation which on some systems is necessary to change power states of devices from low-power to D0. Fix that by saving the current value of power.state for the target device before passing it to acpi_power_transition() and using the saved value in a subsequent check. Fixes: f850a48a0799 ("ACPI: PM: Allow transitions to D0 to occur in special cases") Reported-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Reported-by: Mario Limonciello <mario.limonciello@dell.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Tested-by: Mario Limonciello <mario.limonciello@dell.com>
2019-08-01fm10k: reduce the scope of the tx_buffer variableJacob Keller
The tx_buffer local variable in the function fm10k_clean_tx_ring is not used except inside a smaller block scope. Reduce the scope to its point of use. This was detected by cppcheck and resolves the following style warning produced by that tool: [fm10k_netdev.c:179]: (style) The scope of the variable 'tx_buffer' can be reduced. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-08-01fm10k: reduce the scope of the q_idx local variableJacob Keller
Reduce the scope of the q_idx local variable in the fm10k_cache_ring_qos function. This was detected by cppcheck and resolves the following style warning produced by that tool: [fm10k_main.c:2016]: (style) The scope of the variable 'q_idx' can be reduced. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-08-01fm10k: reduce the scope of local err variableJacob Keller
Reduce the scope of the local err variable in the fm10k_iov_alloc_data function. This was detected by cppcheck and resolves the following style warning produced by that tool: [fm10k_iov.c:426]: (style) The scope of the variable 'err' can be reduced. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-08-01fm10k: reduce the scope of qv local variableJacob Keller
Reduce the scope of the qv vector pointer local variable in the fm10k_set_coalesce function. This was detected by cppcheck and resolves the following style warning produced by that tool: [fm10k_ethtool.c:658]: (style) The scope of the variable 'qv' can be reduced. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-08-01fm10k: reduce scope of *p local variableJacob Keller
Reduce the scope of the char *p local variable to only the block where it is used. This was detected by cppcheck and resolves the following style warning produced by that tool: [fm10k_ethtool.c:229]: (style) The scope of the variable 'p' can be reduced. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-08-01fm10k: reduce scope of the err variableJacob Keller
Reduce the scope of the err local variable in the fm10k_dcbnl_ieee_setets function. This was detected using cppcheck, and resolves the following style warning: [fm10k_dcbnl.c:37]: (style) The scope of the variable 'err' can be reduced. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-08-01Merge branch 'setsockopt-extra-mem'Alexei Starovoitov
Stanislav Fomichev says: ==================== Current setsockopt hook is limited to the size of the buffer that user had supplied. Since we always allocate memory and copy the value into kernel space, allocate just a little bit more in case BPF program needs to override input data with a larger value. The canonical example is TCP_CONGESTION socket option where input buffer is a string and if user calls it with a short string, BPF program has no way of extending it. The tests are extended with TCP_CONGESTION use case. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-08-01selftests/bpf: extend sockopt_sk selftest with TCP_CONGESTION use caseStanislav Fomichev
Ignore SOL_TCP:TCP_CONGESTION in getsockopt and always override SOL_TCP:TCP_CONGESTION with "cubic" in setsockopt hook. Call setsockopt(SOL_TCP, TCP_CONGESTION) with short optval ("nv") to make sure BPF program has enough buffer space to replace it with "cubic". Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-08-01bpf: always allocate at least 16 bytes for setsockopt hookStanislav Fomichev
Since we always allocate memory, allocate just a little bit more for the BPF program in case it need to override user input with bigger value. The canonical example is TCP_CONGESTION where input string might be too small to override (nv -> bbr or cubic). 16 bytes are chosen to match the size of TCP_CA_NAME_MAX and can be extended in the future if needed. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-08-01Merge branch 'net-dsa-mv88e6xxx-avoid-some-redundant-VTU-operations'David S. Miller
Vivien Didelot says: ==================== net: dsa: mv88e6xxx: avoid some redundant VTU operations The mv88e6xxx driver currently uses a mv88e6xxx_vtu_get wrapper to get a single entry and uses a boolean to eventually initialize a fresh one. However the fresh entry is only needed in one place and mv88e6xxx_vtu_getnext is simple enough to call it directly. Doing so makes the code easier to read, especially for the return code expected by switchdev to honor software VLANs. In addition to not loading the VTU again when an entry is already correctly programmed, this also allows to avoid programming the broadcast entries again when updating a port's membership, from e.g. tagged to untagged. This patch series removes the mv88e6xxx_vtu_get wrapper in favor of direct calls to mv88e6xxx_vtu_getnext, and also renames the _mv88e6xxx_port_vlan_add and _mv88e6xxx_port_vlan_del helpers using an old underscore prefix convention. In case the port's membership is already correctly programmed in hardware, the following debug message may be printed: [ 745.989884] mv88e6085 2188000.ethernet-1:00: p4: already a member of VLAN 42 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-01net: dsa: mv88e6xxx: call vtu_getnext directly in vlan_addVivien Didelot
Wrapping mv88e6xxx_vtu_getnext makes the code less easy to read and _mv88e6xxx_port_vlan_add is the only function requiring the preparation of a new VLAN entry. To simplify things up, remove the mv88e6xxx_vtu_get wrapper and explicit the VLAN lookup in _mv88e6xxx_port_vlan_add. This rework also avoids programming the broadcast entries again when changing a port's membership, e.g. from tagged to untagged. At the same time, rename the helper using an old underscore convention. Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-01net: dsa: mv88e6xxx: call vtu_getnext directly in vlan_delVivien Didelot
Wrapping mv88e6xxx_vtu_getnext makes the code less easy to read. Explicit the call to mv88e6xxx_vtu_getnext in _mv88e6xxx_port_vlan_del and the return value expected by switchdev in case of software VLANs. At the same time, rename the helper using an old underscore convention. Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-01net: dsa: mv88e6xxx: call vtu_getnext directly in db load/purgeVivien Didelot
mv88e6xxx_vtu_getnext is simple enough to call it directly in the mv88e6xxx_port_db_load_purge function and explicit the return code expected by switchdev for software VLANs when an hardware VLAN does not exist. Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-01net: dsa: mv88e6xxx: explicit entry passed to vtu_getnextVivien Didelot
mv88e6xxx_vtu_getnext interprets two members from the input mv88e6xxx_vtu_entry structure: the (excluded) vid member to start the iteration from, and the valid argument specifying whether the VID must be written or not (only required once at the start of a loop). Explicit the assignation of these two fields right before calling mv88e6xxx_vtu_getnext, as it is done in the mv88e6xxx_vtu_get wrapper. Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-01net: dsa: mv88e6xxx: lock mutex in vlan_prepareVivien Didelot
Lock the mutex in the mv88e6xxx_port_vlan_prepare function called by the DSA stack, instead of doing it in the internal mv88e6xxx_port_check_hw_vlan helper. Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-01i2c: s3c2410: Mark expected switch fall-throughGustavo A. R. Silva
Mark switch cases where we are expecting to fall through. This patch fixes the following warning: drivers/i2c/busses/i2c-s3c2410.c: In function 'i2c_s3c_irq_nextbyte': drivers/i2c/busses/i2c-s3c2410.c:431:6: warning: this statement may fall through [-Wimplicit-fallthrough=] if (i2c->state == STATE_READ) ^ drivers/i2c/busses/i2c-s3c2410.c:439:2: note: here case STATE_WRITE: ^~~~ Notice that, in this particular case, the code comment is modified in accordance with what GCC is expecting to find. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2019-08-01i2c: at91: fix clk_offset for sama5d2Michał Mirosław
In SAMA5D2 datasheet, TWIHS_CWGR register rescription mentions clock offset of 3 cycles (compared to 4 in eg. SAMA5D3). Cc: stable@vger.kernel.org # 5.2.x [needs applying to i2c-at91.c instead for earlier kernels] Fixes: 0ef6f3213dac ("i2c: at91: add support for new alternative command mode") Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Acked-by: Ludovic Desroches <ludovic.desroches@microchip.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2019-08-01i2c: at91: disable TXRDY interrupt after sending dataMichał Mirosław
Driver was not disabling TXRDY interrupt after last TX byte. This caused interrupt storm until transfer timeouts for slow or broken device on the bus. The patch fixes the interrupt storm on my SAMA5D2-based board. Cc: stable@vger.kernel.org # 5.2.x [v5.2 introduced file split; the patch should apply to i2c-at91.c before the split] Fixes: fac368a04048 ("i2c: at91: add new driver") Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Acked-by: Ludovic Desroches <ludovic.desroches@microchip.com> Tested-by: Raag Jadav <raagjadav@gmail.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2019-08-01block: Fix __blkdev_direct_IO() for bio fragmentsDamien Le Moal
The recent fix to properly handle IOCB_NOWAIT for async O_DIRECT IO (patch 6a43074e2f46) introduced two problems with BIO fragment handling for direct IOs: 1) The dio size processed is calculated by incrementing the ret variable by the size of the bio fragment issued for the dio. However, this size is obtained directly from bio->bi_iter.bi_size AFTER the bio submission which may result in referencing the bi_size value after the bio completed, resulting in an incorrect value use. 2) The ret variable is not incremented by the size of the last bio fragment issued for the bio, leading to an invalid IO size being returned to the user. Fix both problem by using dio->size (which is incremented before the bio submission) to update the value of ret after bio submissions, including for the last bio fragment issued. Fixes: 6a43074e2f46 ("block: properly handle IOCB_NOWAIT for async O_DIRECT IO") Reported-by: Masato Suzuki <masato.suzuki@wdc.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-01net/mlx5e: Allow dropping specific tunnel packetsTonghao Zhang
In some case, we don't want to allow specific tunnel packets to host that can avoid to take up high CPU (e.g network attacks). But other tunnel packets which not matched in hardware will be sent to host too. $ tc filter add dev vxlan_sys_4789 \ protocol ip chain 0 parent ffff: prio 1 handle 1 \ flower dst_ip 1.1.1.100 ip_proto tcp dst_port 80 \ enc_dst_ip 2.2.2.100 enc_key_id 100 enc_dst_port 4789 \ action tunnel_key unset pipe action drop Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-01net/mlx5e: TX reporter cleanupAya Levin
Remove redundant include files. Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-01net/mlx5e: Set tx reporter only on successful creationAya Levin
When failing to create tx reporter, don't set the reporter's pointer. Creating a reporter is not mandatory for driver load, avoid garbage/error pointer. Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-01net/mlx5e: Fix mlx5e_tx_reporter_create return valueAya Levin
Return error when failing to create a reporter in devlink. Since NET_DEVLINK mandatory to MLX5_CORE in Kconfig, returned pointer can't be NULL and can only hold an error in bad path. Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-01net/mlx5e: Rx, checksum handling refactoringSaeed Mahameed
Move vlan checksum fixup flow into mlx5e_skb_padding_csum(), which is supposed to fixup SKB checksum if needed. And rename mlx5e_skb_padding_csum() to mlx5e_skb_csum_fixup(). Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-01net/mlx5e: Tx, Soften inline mode VLAN dependenciesTariq Toukan
If capable, use zero inline mode in TX WQE for non-VLAN packets. For VLAN ones, keep the enforcement of at least L2 inline mode, unless the WQE VLAN insertion offload cap is on. Performance: Tested single core packet rate of 64Bytes. NIC: ConnectX-5 CPU: Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz pktgen: Before: 12.46 Mpps After: 14.65 Mpps (+17.5%) XDP_TX: The MPWQE flow is not affected, as it already has this optimization. So we test with priv-flag xdp_tx_mpwqe: off. Before: 9.90 Mpps After: 10.20 Mpps (+3%) Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Tested-by: Noam Stolero <noams@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-01net/mlx5e: XDP, Slight enhancement for WQE fetch functionTariq Toukan
Instead of passing an output param, let function return the WQE pointer. In addition, pass &pi so it gets its value in the function, and save the redundant assignment that comes after it. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-01net/mlx5e: XDP, Close TX MPWQE session when no room for inline packet leftShay Agroskin
In MPWQE mode, when transmitting packets with XDP, a packet that is smaller than a certain size (set to 256 bytes) would be sent inline within its WQE TX descriptor (mem-copied), in case the hardware tx queue is congested beyond a pre-defined water-mark. If a MPWQE cannot contain an additional inline packet, we close this MPWQE session, and send the packet inlined within the next MPWQE. To save some MPWQE session close+open operations, we don't open MPWQE sessions that are contiguously smaller than certain size (set to the HW MPWQE maximum size). If there isn't enough contiguous room in the send queue, we fill it with NOPs and wrap the send queue index around. This way, qualified packets are always sent inline. Perf tests: Tested packet rate for UDP 64Byte multi-stream over two dual port ConnectX-5 100Gbps NICs. CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz XDP_TX: With 24 channels: | ------ | bounced packets | inlined packets | inline ratio | | before | 113.6Mpps | 96.3Mpps | 84% | | after | 115Mpps | 99.5Mpps | 86% | With one channel: | ------ | bounced packets | inlined packets | inline ratio | | before | 6.7Mpps | 0pps | 0% | | after | 6.8Mpps | 0pps | 0% | As we can see, there is improvement in both inline ratio and overall packet rate for 24 channels. Also, we see no degradation for the one-channel case. Signed-off-by: Shay Agroskin <shayag@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-01net/mlx5e: Tx, Strict the room needed for SQ edge NOPsTariq Toukan
We use NOPs to populate the WQ fragment edge if the WQE does not fit in frag, to avoid WQEs crossing a page boundary (or wrap-around the WQ). The upper bound on the needed number of NOPs is one WQEBB less than the largest possible WQE, for otherwise the WQE would certainly fit. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-01net/mlx5: Add flow counter poolGavi Teitz
Add a pool of flow counters, based on flow counter bulks, removing the need to allocate a new counter via a costly FW command during the flow creation process. The time it takes to acquire/release a flow counter is cut from ~50 [us] to ~50 [ns]. The pool is part of the mlx5 driver instance, and provides flow counters for aging flows. mlx5_fc_create() was modified to provide counters for aging flows from the pool by default, and mlx5_destroy_fc() was modified to release counters back to the pool for later reuse. If bulk allocation is not supported or fails, and for non-aging flows, the fallback behavior is to allocate and free individual counters. The pool is comprised of three lists of flow counter bulks, one of fully used bulks, one of partially used bulks, and one of unused bulks. Counters are provided from the partially used bulks first, to help limit bulk fragmentation. The pool maintains a threshold, and strives to maintain the amount of available counters below it. The pool is increased in size when a counter acquisition request is made and there are no available counters, and it is decreased in size when the last counter in a bulk is released and there are more available counters than the threshold. All pool size changes are done in the context of the acquiring/releasing process. The value of the threshold is directly correlated to the amount of used counters the pool is providing, while constrained by a hard maximum, and is recalculated every time a bulk is allocated/freed. This ensures that the pool only consumes large amounts of memory for available counters if the pool is being used heavily. When fully populated and at the hard maximum, the buffer of available counters consumes ~40 [MB]. Signed-off-by: Gavi Teitz <gavi@mellanox.com> Reviewed-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-01net/mlx5: Add flow counter bulk infrastructureGavi Teitz
Add infrastructure to track bulks of flow counters, providing the means to allocate and deallocate bulks, and to acquire and release individual counters from the bulks. Signed-off-by: Gavi Teitz <gavi@mellanox.com> Reviewed-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-01net/mlx5: E-Switch, add ingress rate supportEli Cohen
Use the scheduling elements to implement ingress rate limiter on an eswitch ports ingress traffic. Since the ingress of eswitch port is the egress of VF port, we control eswitch ingress by controlling VF egress. Configuration is done using the ports' representor net devices. Please note that burst size configuration is not supported by devices ConnectX-5 and earlier generations. Configuration examples: tc: tc filter add dev enp59s0f0_0 root protocol ip matchall action police rate 1mbit burst 20k ovs: ovs-vsctl set interface eth0 ingress_policing_rate=1000 Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-01Merge branch 'mlx5-next' of ↵Saeed Mahameed
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Misc updates from mlx5-next branch. 1) Eli improves the handling of the support for QoS element type 2) Gavi refactors and prepares mlx5 flow counters for bulk allocation support 3) Parav, refactors and improves E-Switch load/unload flows 4) Saeed, two misc cleanups Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-01Merge tag 'irqchip-fixes-5.3' of ↵Thomas Gleixner
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/urgent Pull irqchip fixes from Marc Zyngier: A small bunch of fixes from the irqchip department: - Fix a couple of UAF on error paths (RZA1, GICv3 ITS) - Fix iMX GPCv2 trigger setting - Add missing of_node_put on error path in MBIGEN - Add another bunch of /* fall-through */ to silence warnings
2019-08-01net/mlx5: E-switch, Tide up eswitch config sequenceParav Pandit
Currently for PF and ECPF vports, representors are created before their eswitch hardware ports are initialized in below flow. mlx5_eswitch_enable() esw_offloads_init() esw_offloads_load_all_reps() [..] esw_enable_vport() However for VFs, vports are initialized before creating their respective netdev represnetors in event handling context. Similarly while disabling eswitch, first hardware vports are disabled, followed by destroying their representors. Here while underlying vports gets destroyed but its respective user facing netdevice can still exist on which user can continue to perform more offload operations. Instead, its more accurate to do enable_eswitch switchdev mode: 1. perform FDB tables initialization 2. initialize hw vport 3. create and publish representor for this vport disable_eswitch switchdev mode: 1. destroy user facing representor for the vport 2. disable hw vport 3. perform FDB tables cleanup Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-01net/mlx5: E-Switch, Remove redundant mc_promisc NULL checkParav Pandit
mc_promisc pointer points to an instance of struct esw_mc_addr allocated as part of the esw structure. Hence it cannot be NULL. Removed such redundant check and assign where it is actually used. While at it, add comment around legacy mode fields and move mc_promisc close to other legacy mode structures to improve code redability. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-01net/mlx5: E-Switch, remove redundant error handlingSaeed Mahameed
We don't need to handle error flow of esw_create_legacy_table() in the same branch, it is already being handled directly after the if statement, for both legacy and switchdev modes in one place. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-01net/mlx5: E-switch, Introduce helper function to enable/disable vportsParav Pandit
vports needs to be enabled in switchdev and legacy mode. In switchdev mode, vports should be enabled after initializing the FDB tables and before creating their represntors so that representor works on an initialized vport object. Prepare a helper function which can be called when enabling either of the eswitch modes. Similarly, have disable vports helper function. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-01net/mlx5: E-switch, Initialize TSAR Qos hardware block before its user vportsParav Pandit
First enable TSAR Qos hardware block in device before enabling its user vports. This refactor is needed so that vports can be enabled before their representor netdevice can be created. While at it, esw_create_tsar() returns error code which was used only to print error. However esw_create_tsar() already prints warning if it hits an error. Hence, remove the redundant warning. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-01net/mlx5: E-switch, Combine metadata enable/disable functionalityParav Pandit
Except bit toggling code, rest of the code is same to enable/disable metadata passing functionality. Hence, combine them to single function and control using enable flag. Also instead of checking metadata supported at multiple places, fold into the helper function. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-01net/mlx5: E-Switch, Verify support QoS element typeEli Cohen
Check if firmware supports the requested element type before attempting to create the element type. In addition, explicitly specify the request element type and tsar type. Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-01net/mlx5: Make load_one() and unload_one() symmetricParav Pandit
Currently mlx5_load_one() perform device registration using mlx5_register_device(). But mlx5_unload_one() doesn't unregister. Make them symmetric by doing device unregistration in mlx5_unload_one(). Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-01net/mlx5: Fix offset of tisc bits reserved fieldSaeed Mahameed
First reserved field is off by one instead of reserved_at_1 it should be reserved_at_2, fix that. Fixes: a12ff35e0fb7 ("net/mlx5: Introduce TLS TX offload hardware bits and structures") Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-01net/mlx5: Add flow counter bulk allocation hardware bits and commandGavi Teitz
Add a handle to invoke the new FW capability of allocating a bulk of flow counters. Signed-off-by: Gavi Teitz <gavi@mellanox.com> Reviewed-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>