Age | Commit message (Collapse) | Author |
|
When filling the RSS table, we have to make sure that the rx queue is
attached to an online CPU.
This patch is not a full support for cpu_hotplug, but rather a way to
make sure that we don't break network on system booted with the maxcpus
parameter.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds an extra indirection when setting the indirection table
into the RSS hardware table to improve the packets distribution across
CPUs. For example, if 2 queues are used on a multi-core system this new
indirection will choose two queues on two different CPUs instead of the
two first queues which are on the same first CPU.
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds the RSS indirection table support, allowing to use the
ethtool -x and -X options to dump and set this table.
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
[Maxime: Small warning fixes, use one table per port]
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
PPv2 Controller has 8 RSS Tables, of 32 entries each. A lookup in the
RXQ2RSS_TABLE is performed for each incoming packet, and the RSS Table
to be used is chosen according to the default rx queue that would be
used for the packet.
This default rx queue is set in the Lookup_id Table (also called
Decoding Table), and is equal to the port->first_rxq.
Since the Classifier itself isn't active at any time for the moment,
this doesn't have a direct effect, the default rx queue at the moment is
the one where all packets end-up into.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There is no RSS_TABLE register in PPv2 Controller. The register 0x1510
which was specified is actually named "RSS_HASH_SEL", but isn't used by
this driver at all.
Based on how this register was used, it should have been the
RXQ2RSS_TABLE register, which allows to select the RSS table that will
be used for the incoming packet.
The RSS_TABLE_POINTER is actually a field of this RXQ2RSS_TABLE
register.
Since RSS tables are actually not used by the driver for now, this
commit does not fix a runtime bug.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Cosmetic patch fixing a typo in one of the RSS comments.
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The number of receive queue per port is :
- MVPP2_DEFAULT_RXQ if in single queue mode
- MVPP2_DEFAULT_RXQ * num_possible_cpus if in multi queue mode
with MVPP2_DEFAULT_RXQ = 4.
However, we don't use the extra rx queues at the moment, we really only
need one per port per CPU, until some more advanced classification rules
are implemented.
Suggested-by: Stefan Chulski <stefanc@marvell.com>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There's a dedicated #define that indicates the number of rx queues per
port per cpu, this commit removes a harcoded use of that value
This doesn't fix any runtime bugs since the harcoded value matches the
expected value.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Since RSS only applies when we have per-cpu rx queues, it should only
be enabled when the driver is configured to make use of multi-queue
mode.
Signed-off-by: Yan Markman <ymarkman@marvell.com>
[Maxime: Commit message]
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The multi queue mode is needed to have RSS available, and offers some
nice advantages, being able to have one rx queue vector per CPU.
This mode has been usable through the use of a module parameter, this
commit makes it the default value.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The PPv2 driver defines 2 "queue_modes" :
- QDIST_SINGLE_MODE, where each port share one rx queue vector
between all CPUs
- QDIST_MULTI_MODE, where each port has one rx queue vector per CPU.
Multi queue mode isn't available on PPv2.1, make sure we fallback to
single mode when running on this revision.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The size of the the RSS indirection tables should be defined in mvpp2.h,
so that we can use it in all files of the PPv2 driver.
This commit moves the define in mvpp2.h, and adds the missing #include
in mvpp2_cls.h.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Include guards should be put before #includes. This doesn't fix any bug,
but prevent future compilation issues when adding new files in the mvpp2
driver
The Header Parser init function needs the platform_device definition,
and with the fixed include guards we need to add the missing include.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
getnstimeofday64 is deprecated in favor of the ktime_get() family of
functions. The direct replacement would be ktime_get_real_ts64(),
but I'm picking the basic ktime_get() instead:
- using a ktime_t simplifies the code compared to timespec64
- using monotonic time instead of real time avoids issues caused
by a concurrent settimeofday() or during a leap second adjustment.
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The two do the same thing, but we want to have a consistent
naming in the kernel.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We should take and release the filter_sem consistently during the
reset process, in the same manner as the mac_lock and reset_lock.
For lockdep consistency we also take the filter_sem for write around
other calls to efx->type->init().
Fixes: c2bebe37c6b6 ("sfc: give ef10 its own rwsem in the filter table instead of filter_lock")
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In some situations we may end up calling down_read while already
holding the semaphore for write, thus hanging. This has been seen
when setting the MAC address for the interface. The hung task log
in this situation includes this stack:
down_read
efx_ef10_filter_insert
efx_ef10_filter_insert_addr_list
efx_ef10_filter_vlan_sync_rx_mode
efx_ef10_filter_add_vlan
efx_ef10_filter_table_probe
efx_ef10_set_mac_address
efx_set_mac_address
dev_set_mac_address
In addition, lockdep rightly points out that nested calling of
down_read is incorrect.
Fixes: c2bebe37c6b6 ("sfc: give ef10 its own rwsem in the filter table instead of filter_lock")
Tested-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
SYSTEMPORT Lite reversed the logic compared to SYSTEMPORT, the
GIB_FCS_STRIP bit is set when the Ethernet FCS is stripped, and that bit
is not set by default. Fix the logic such that we properly check whether
that bit is set or not and we don't forward an extra 4 bytes to the
network stack.
Fixes: 44a4524c54af ("net: systemport: Add support for SYSTEMPORT Lite")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The ipsec->tx_tbl[] has IXGBE_IPSEC_MAX_SA_COUNT elements so the > needs
to be changed to >= so we don't read one element beyond the end of the
array.
Fixes: 592594704761 ("ixgbe: process the Tx ipsec offload")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
This change makes it so that we are much more explicit about the ordering
of updates to the receive address register (RAR) table. Prior to this patch
I believe we may have been updating the table while entries were still
active, or possibly allowing for reordering of things since we weren't
explicitly flushing writes to either the lower or upper portion of the
register prior to accessing the other half.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
The current position of .rss_flags field in struct rss_info causes
that fields .rsstable and .rssqueue (both 128 bytes long) crosses
cache-line boundaries. Moving it at the end properly align all fields.
Before patch:
struct rss_info {
u64 rss_flags; /* 0 8 */
u8 rsstable[128]; /* 8 128 */
/* --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- */
u8 rss_queue[128]; /* 136 128 */
/* --- cacheline 4 boundary (256 bytes) was 8 bytes ago --- */
u8 rss_hkey[40]; /* 264 40 */
};
After patch:
struct rss_info {
u8 rsstable[128]; /* 0 128 */
/* --- cacheline 2 boundary (128 bytes) --- */
u8 rss_queue[128]; /* 128 128 */
/* --- cacheline 4 boundary (256 bytes) --- */
u8 rss_hkey[40]; /* 256 40 */
u64 rss_flags; /* 296 8 */
};
Signed-off-by: Ivan Vecera <cera@cera.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
- Unionize two u8 fields where only one of them is used depending on NIC
chipset.
- Move recovery_supported field after that union
These changes eliminate 7-bytes hole in the struct and makes it smaller
by 8 bytes.
Signed-off-by: Ivan Vecera <cera@cera.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Ivan Vecera <cera@cera.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Before patch:
struct be_tx_obj {
u32 db_offset; /* 0 4 */
/* XXX 4 bytes hole, try to pack */
struct be_queue_info q; /* 8 56 */
/* --- cacheline 1 boundary (64 bytes) --- */
struct be_queue_info cq; /* 64 56 */
struct be_tx_compl_info txcp; /* 120 4 */
/* XXX 4 bytes hole, try to pack */
/* --- cacheline 2 boundary (128 bytes) --- */
struct sk_buff * sent_skb_list[2048]; /* 128 16384 */
...
}:
After patch:
struct be_tx_obj {
u32 db_offset; /* 0 4 */
struct be_tx_compl_info txcp; /* 4 4 */
struct be_queue_info q; /* 8 56 */
/* --- cacheline 1 boundary (64 bytes) --- */
struct be_queue_info cq; /* 64 56 */
struct sk_buff * sent_skb_list[2048]; /* 120 16384 */
...
};
Signed-off-by: Ivan Vecera <cera@cera.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Re-order fields in struct be_eq_obj to ensure that .napi field begins
at start of cache-line. Also the .adapter field is moved to the first
cache-line next to .q field and 3 fields (idx,msi_idx,spurious_intr)
and the 4-bytes hole to 3rd cache-line.
Signed-off-by: Ivan Vecera <cera@cera.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The event queue description (be_eq_obj.desc) field is used only to format
string for IRQ name and it is not really needed to hold this value.
Remove it and use local variable to format string for IRQ name.
Signed-off-by: Ivan Vecera <cera@cera.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The commit fb6113e688e0 ("be2net: get rid of custom busy poll code")
replaced custom busy-poll code by the generic one but left several
macros and fields in struct be_eq_obj that are currently unused.
Remove this stuff.
Fixes: fb6113e688e0 ("be2net: get rid of custom busy poll code")
Signed-off-by: Ivan Vecera <cera@cera.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The commit 2632bafd74ae ("be2net: fix adaptive interrupt coalescing")
introduced a separate struct be_aic_obj to hold AIC information but
unfortunately left the old stuff in be_eq_obj. So remove it.
Fixes: 2632bafd74ae ("be2net: fix adaptive interrupt coalescing")
Signed-off-by: Ivan Vecera <cera@cera.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Trivial fix to spelling mistake in qed_probe message.
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The late ts queue can contain a bunch of skbs while hi rate testing,
no need to check all of them if timestamp is already matched.
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When offloading mirror-to-gretap, mlxsw needs to preroute the path that
the encapsulated packet will take. That path may include a LAG device
above a front panel port. So far, mlxsw resolved the path to the first
up front panel slave of the LAG interface, but that only reflects
administrative state of the port. It neglects to consider whether the
port actually has a carrier, and what the LACP state is.
So instead of checking upness of the device, check carrier state and
txability.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
L2 Fwd Offload & 10GbE Intel Driver Updates 2018-07-09
This patch series is meant to allow support for the L2 forward offload, aka
MACVLAN offload without the need for using ndo_select_queue.
The existing solution currently requires that we use ndo_select_queue in
the transmit path if we want to associate specific Tx queues with a given
MACVLAN interface. In order to get away from this we need to repurpose the
tc_to_txq array and XPS pointer for the MACVLAN interface and use those as
a means of accessing the queues on the lower device. As a result we cannot
offload a device that is configured as multiqueue, however it doesn't
really make sense to configure a macvlan interfaced as being multiqueue
anyway since it doesn't really have a qdisc of its own in the first place.
The big changes in this set are:
Allow lower device to update tc_to_txq and XPS map of offloaded MACVLAN
Disable XPS for single queue devices
Replace accel_priv with sb_dev in ndo_select_queue
Add sb_dev parameter to fallback function for ndo_select_queue
Consolidated ndo_select_queue functions that appeared to be duplicates
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Expose stats obtained from firmware via debugfs. These stats can't
be part of ethtool -S because the slow firmware mailbox can cause
packet drops under heavy load.
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When running ethtool -S, some stats are requested from firmware.
Since getting these stats via firmware mailbox is slow, some packets
get dropped under heavy load while running ethtool -S.
So, remove these stats from ethtool -S.
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The Marvell PPv2 driver uses interrupts and tasklet but does not
explicitly include linux/interrupt.h, relying on implicit includes. This
one particularly is included by chance after a long unlogical chain of
inclusions. Fix this so we do not get future build breaks.
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Size of csk_tbl is about 58K, which means 3rd order page allocation.
kvzalloc provides a fallback if no high order memory is available.
Signed-off-by: Jan Dakinevich <jan.dakinevich@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
congestion argument passed to t4_sge_alloc_rxq() is used
to differentiate between nic/ofld queues.
Signed-off-by: Arjun Vynipadath <arjun@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In general, accessing userspace memory beyond the length of the supplied
buffer in VFS read/write handlers can lead to both kernel memory corruption
(via kernel_read()/kernel_write(), which can e.g. be triggered via
sys_splice()) and privilege escalation inside userspace.
In this case, the affected files are in debugfs (and should therefore only
be accessible to root) and check that *pos is zero (which prevents the
sys_splice() trick). Therefore, this is not a security fix, but rather a
small cleanup.
For the read handlers, fix it by using simple_read_from_buffer() instead of
custom logic.
For the write handler, add a check.
changed in v2:
- also fix dbg_write()
Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Jann Horn <jannh@google.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Fix bug in the error code path when bnxt_request_irq() returns failure.
bnxt_disable_napi() should not be called in this error path because
NAPI has not been enabled yet.
Fixes: c0c050c58d84 ("bnxt_en: New Broadcom ethernet driver.")
Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Calling bnxt_set_max_func_irqs() to modify the max IRQ count requested or
freed by the RDMA driver is flawed. The max IRQ count is checked when
re-initializing the IRQ vectors and this can happen multiple times
during ifup or ethtool -L. If the max IRQ is reduced and the RDMA
driver is operational, we may not initailize IRQs correctly. This
problem shows up on VFs with very small number of MSIX.
There is no other logic that relies on the IRQ count excluding the ones
used by RDMA. So we fix it by just removing the call to subtract or
add the IRQs used by RDMA.
Fixes: a588e4580a7e ("bnxt_en: Add interface to support RDMA driver.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, the driver assumes IFF_BROADCAST is always set and always sets
the broadcast filter. Modify the code to set or clear the broadcast
filter according to the IFF_BROADCAST flag.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The current code returns -ENOMEM and does not bother to set the output
parameters to 0 when no rings are available. Some callers, such as
bnxt_get_channels() will display garbage ring numbers when that happens.
Fix it by always setting the output parameters.
Fixes: 6e6c5a57fbe1 ("bnxt_en: Modify bnxt_get_max_rings() to support shared or non shared rings.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If there aren't enough RX rings available, the driver will attempt to
use a single RX ring without the aggregation ring. If that also
fails, the BNXT_FLAG_AGG_RINGS flag is cleared but the other ring
parameters are not set consistently to reflect that. If more RX
rings become available at the next open, the RX rings will be in
an inconsistent state and may crash when freeing the RX rings.
Fix it by restoring the BNXT_FLAG_AGG_RINGS if not enough RX rings are
available to run without aggregation rings.
Fixes: bdbd1eb59c56 ("bnxt_en: Handle no aggregation ring gracefully.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It is possible that OVS may set don’t care for DEI/CFI bit in
vlan_tci mask. Hence, checking for vlan_tci exact match will endup
in a vlan flow rejection.
This patch fixes the problem by checking for vlan_pcp and vid
separately, instead of checking for the entire vlan_tci.
Fixes: e85a9be93cf1 (bnxt_en: do not allow wildcard matches for L2 flows)
Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
These resources are needed for Spectrum-2 KVD linear management
implementation.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Prepare for Spectrum-2 FW version checking and
make mlxsw_sp_fw_rev_validate() per-ASIC as well as required FW revision
and FW filename.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
For Spectrum-2, we need to insert priority to C-TCAM because HW
needs that info in order to correctly process scenarios where rules
are in both C-TCAM and A-TCAM.
So extend the mlxsw_sp_acl_ctcam_entry_add() args to accept indication
if priority needs to be filled up and implement the priority
computation and fill-up.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This is going to be needed for Spectrum-2 C-TCAM implementation.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Since Spectrum-2 encodes blocks into different HW layout, push this
code into Spectrum-specific op.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Since the flex keys for Spectrum-2 differ not only in blocks definitions
but also in encoding layout, prepare for the implementation and pass
Spectrum/Spectrum-2 specific ops down to mlxsw_afk_create.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|