Age | Commit message (Collapse) | Author |
|
Sean Wang says:
====================
mediatek: Fix crash caused by reporting inconsistent skb->len to BQL
Changes since v1:
- fix inconsistent enumeration which easily causes the potential bug
The series fixes kernel BUG caused by inconsistent SKB length reported
into BQL. The reason for inconsistent length comes from hardware BUG which
results in different port number carried on the TXD within the lifecycle of
SKB. So patch 2) is proposed for use a software way to track which port
the SKB involving instead of hardware way. And patch 1) is given for another
issue I found which causes TXD and SKB inconsistency that is not expected
in the initial logic, so it is also being corrected it in the series.
The log for the kernel BUG caused by the issue is posted as below.
[ 120.825955] kernel BUG at ... lib/dynamic_queue_limits.c:26!
[ 120.837684] Internal error: Oops - BUG: 0 [#1] SMP ARM
[ 120.842778] Modules linked in:
[ 120.845811] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.11.0-rc1-191576-gdbcef47 #35
[ 120.853488] Hardware name: Mediatek Cortex-A7 (Device Tree)
[ 120.859012] task: c1007480 task.stack: c1000000
[ 120.863510] PC is at dql_completed+0x108/0x17c
[ 120.867915] LR is at 0x46
[ 120.870512] pc : [<c03c19c8>] lr : [<00000046>] psr: 80000113
[ 120.870512] sp : c1001d58 ip : c1001d80 fp : c1001d7c
[ 120.881895] r10: 0000003e r9 : df6b3400 r8 : 0ed86506
[ 120.887075] r7 : 00000001 r6 : 00000001 r5 : 0ed8654c r4 : df0135d8
[ 120.893546] r3 : 00000001 r2 : df016800 r1 : 0000fece r0 : df6b3480
[ 120.900018] Flags: Nzcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
[ 120.907093] Control: 10c5387d Table: 9e27806a DAC: 00000051
[ 120.912789] Process swapper/0 (pid: 0, stack limit = 0xc1000218)
[ 120.918744] Stack: (0xc1001d58 to 0xc1002000)
....
121.085331] 1fc0: 00000000 c0a52a28 00000000 c10855d4 c1003c58 c0a52a24 c100885c 8000406a
[ 121.093444] 1fe0: 410fc073 00000000 00000000 c1001ff8 8000807c c0a009cc 00000000 00000000
[ 121.101575] [<c03c19c8>] (dql_completed) from [<c04cb010>] (mtk_napi_tx+0x1d0/0x37c)
[ 121.109263] [<c04cb010>] (mtk_napi_tx) from [<c05e28cc>] (net_rx_action+0x24c/0x3b8)
[ 121.116951] [<c05e28cc>] (net_rx_action) from [<c010152c>] (__do_softirq+0xe4/0x35c)
[ 121.124638] [<c010152c>] (__do_softirq) from [<c012a624>] (irq_exit+0xe8/0x150)
[ 121.131895] [<c012a624>] (irq_exit) from [<c017750c>] (__handle_domain_irq+0x70/0xc4)
[ 121.139666] [<c017750c>] (__handle_domain_irq) from [<c0101404>] (gic_handle_irq+0x58/0x9c)
[ 121.147953] [<c0101404>] (gic_handle_irq) from [<c010e18c>] (__irq_svc+0x6c/0x90)
[ 121.155373] Exception stack(0xc1001ef8 to 0xc1001f40)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fix port inconsistency on TXD due to hardware BUG that would cause
different port number is carried on the same TXD between tx_map()
and tx_unmap() with the iperf test. It would cause confusing BQL
logic which leads to kernel panic when dual GMAC runs concurrently.
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fix inconsistency between the TXD descriptor and the used buffer that
would cause unexpected logic at mtk_tx_unmap() during skb housekeeping.
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Now the command:
ethtool --phy-statistics eth0
will cause system crash with meassage "Unable to handle kernel NULL pointer
dereference at virtual address 00000010" from:
(kszphy_get_stats) from [<c069f1d8>] (ethtool_get_phy_stats+0xd8/0x210)
(ethtool_get_phy_stats) from [<c06a0738>] (dev_ethtool+0x5b8/0x228c)
(dev_ethtool) from [<c06b5484>] (dev_ioctl+0x3fc/0x964)
(dev_ioctl) from [<c0679f7c>] (sock_ioctl+0x170/0x2c0)
(sock_ioctl) from [<c02419d4>] (do_vfs_ioctl+0xa8/0x95c)
(do_vfs_ioctl) from [<c02422c4>] (SyS_ioctl+0x3c/0x64)
(SyS_ioctl) from [<c0107d60>] (ret_fast_syscall+0x0/0x44)
The reason: phy_driver structure for KSZ9031 phy has no .probe() callback
defined. As result, struct phy_device *phydev->priv pointer will not be
initializes (null).
This issue will affect also following phys:
KSZ8795, KSZ886X, KSZ8873MLL, KSZ9031, KSZ9021, KSZ8061, KS8737
Fix it by:
- adding .probe() = kszphy_probe() callback to KSZ9031, KSZ9021
phys. The kszphy_probe() can be re-used as it doesn't do any phy specific
settings.
- removing statistic callbacks from other phys (KSZ8795, KSZ886X,
KSZ8873MLL, KSZ8061, KS8737) as they doesn't have corresponding
statistic counters.
Fixes: 2b2427d06426 ("phy: micrel: Add ethtool statistics counters")
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
struct kcm_clone only contains fd, and kcm_clone() only
writes this struct, so there is no need to copy it from user.
Cc: Tom Herbert <tom@herbertland.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Only need 1 l3mdev FIB rule. Fix setting NLM_F_EXCL in the nlmsghdr.
Fixes: 1aa6c4f6b8cd8 ("net: vrf: Add l3mdev rules on first device create")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The section is not specific only to "TC classifiers", but applies to the
whole TC subsystem. Also, add couple of forgotten headers.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Simplify the loop in phy_supported_speeds().
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
phylib has some undesirable behaviour when forcing a link mode through
ethtool. phylib uses this code:
idx = phy_find_valid(phy_find_setting(phydev->speed, phydev->duplex),
features);
to find an index in the settings table. phy_find_setting() starts at
index 0, and scans upwards looking for an exact speed and duplex match.
When it doesn't find it, it returns MAX_NUM_SETTINGS - 1, which is
10baseT-Half duplex.
phy_find_valid() then scans from the point (and effectively only checks
one entry) before bailing out, returning MAX_NUM_SETTINGS - 1.
phy_sanitize_settings() then sets ->speed to SPEED_10 and ->duplex to
DUPLEX_HALF whether or not 10baseT-Half is supported or not. This goes
against all the comments against these functions, and 10baseT-Half may
not even be supported by the hardware.
Rework these functions, introducing a new method of scanning the table.
There are two modes of lookup that phylib wants: exact, and inexact.
- in exact mode, we return either an exact match or failure
- in inexact mode, we return an exact match if it exists, a match at
the highest speed that is not greater than the requested speed
(ignoring duplex), or failing that, the lowest supported speed, or
failure.
The biggest difference is that we always check whether the entry is
supported before further consideration, so all unsupported entries are
not considered as candidates.
This results in arguably saner behaviour, better matches the comments,
and is probably what users would expect.
This becomes important as ethernet speeds increase, PHYs exist which do
not support the 10Mbit speeds, and half-duplex is likely to become
obsolete - it's already not even an option on 10Gbit and faster links.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Since 3.12 it has been possible to configure the default queuing
discipline via sysctl. This patch adds ability to configure the
default queue discipline in kernel configuration. This is useful for
environments where configuring the value from userspace is difficult
to manage.
The default is still the same as before (pfifo_fast) and it is
possible to change after kernel init with sysctl. This is similar
to how TCP congestion control works.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Manish Chopra says:
====================
qed/qede: aRFS support
This series adds support for Accelerated Flow Steering
in qede driver for TCP/UDP over IPv4/IPv6 protocols.
Please consider applying this series to "net-next"
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds support for aRFS for TCP and UDP
protocols with IPv4/IPv6.
Signed-off-by: Manish Chopra <manish.chopra@cavium.com>
Signed-off-by: Yuval Mintz <yuval.mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds necessary APIs to interface with
qede aRFS support in successive patch.
It also reserves separate PTT entry for aRFS,
[as being in fastpath flow] for hardware access instead of
trying to acquire it at run time from the ptt pool.
Signed-off-by: Manish Chopra <manish.chopra@cavium.com>
Signed-off-by: Yuval Mintz <yuval.mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This chip is used by a lot of embedded devices and also by the Raspberry
Pi 1, 2 & 3 which were created to promote the study of computer
sciences. Students wanting to learn kernel / network device driver
programming through those devices can only rely on the Linux kernel
driver source to make their own.
This commit adds a lot of comments to the registers definition to expand
the register names.
Cc: Steve Glendinning <steve.glendinning@shawell.net>
Cc: Microchip Linux Driver Support <UNGLinuxDriver@microchip.com>
CC: David Miller <davem@davemloft.net>
Signed-off-by: Martin Wetterwald <martin@wetterwald.eu>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Steve Glendinning <steve.glendinning@shawell.net>
Acked-by: Woojung Huh <Woojung.Huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add the PCI_SUBSYS_DEVID_81XX_RGX and use the same to set
the max bgx per node count.
This fixes the issue intoduced by following commit
78aacb6f6 net: thunderx: Fix invalid mac addresses for node1 interfaces
With this commit the max_bgx_per_node for 81xx is set as 2 instead of 3
because of which num_vfs is always calculated as zero.
Signed-off-by: George Cherian <george.cherian@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The MTU overhead calculation in L2TP device set-up
merged via commit b784e7ebfce8cfb16c6f95e14e8532d0768ab7ff
needs to be adjusted to lock the tunnel socket while
referencing the sub-data structures to derive the
socket's IP overhead.
Reported-by: Guillaume Nault <g.nault@alphalink.fr>
Tested-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: R. Parameswaran <rparames@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Syzkaller reported a use-after-free in ip_recv_error at line
info->ipi_ifindex = skb->dev->ifindex;
This function is called on dequeue from the error queue, at which
point the device pointer may no longer be valid.
Save ifindex on enqueue in __skb_complete_tx_timestamp, when the
pointer is valid or NULL. Store it in temporary storage skb->cb.
It is safe to reference skb->dev here, as called from device drivers
or dev_queue_xmit. The exception is when called from tcp_ack_tstamp;
in that case it is NULL and ifindex is set to 0 (invalid).
Do not return a pktinfo cmsg if ifindex is 0. This maintains the
current behavior of not returning a cmsg if skb->dev was NULL.
On dequeue, the ipv4 path will cast from sock_exterr_skb to
in_pktinfo. Both have ifindex as their first element, so no explicit
conversion is needed. This is by design, introduced in commit
0b922b7a829c ("net: original ingress device index in PKTINFO"). For
ipv6 ip6_datagram_support_cmsg converts to in6_pktinfo.
Fixes: 829ae9d61165 ("net-timestamp: allow reading recv cmsg on errqueue with origin tstamp")
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Unlike normal compat syscall variants, it is needed only for
biarch architectures that have different alignement requirements for
u64 in 32bit and 64bit ABI *and* have __put_user() that won't handle
a store of 64bit value at 32bit-aligned address. We used to have one
such (ia64), but its biarch support has been gone since 2010 (after
being broken in 2008, which went unnoticed since nobody had been using
it).
It had escaped removal at the same time only because back in 2004
a patch that switched several syscalls on amd64 from private wrappers to
generic compat ones had switched to use of compat_sys_getdents64(), which
hadn't needed (or used) a compat wrapper on amd64.
Let's bury it - it's at least 7 years overdue.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Similar to commit 87e9f0315952
("ipv4: fix a potential deadlock in mcast getsockopt() path"),
there is a deadlock scenario for IP_ROUTER_ALERT too:
CPU0 CPU1
---- ----
lock(rtnl_mutex);
lock(sk_lock-AF_INET);
lock(rtnl_mutex);
lock(sk_lock-AF_INET);
Fix this by always locking RTNL first on all setsockopt() paths.
Note, after this patch ip_ra_lock is no longer needed either.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
ndisc_notify is the ipv6 equivalent to arp_notify. When arp_notify is
set to 1, gratuitous arp requests are sent when the device is brought up.
The same is expected when ndisc_notify is set to 1 (per ndisc_notify in
Documentation/networking/ip-sysctl.txt). The NA is not sent on NETDEV_UP
event; add it.
Fixes: 5cb04436eef6 ("ipv6: add knob to send unsolicited ND on link-layer address change")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Make the schedutil governor take the initial (default) value of the
rate_limit_us sysfs attribute from the (new) transition_delay_us
policy parameter (to be set by the scaling driver).
That will allow scaling drivers to make schedutil use smaller default
values of rate_limit_us and reduce the default average time interval
between consecutive frequency changes.
Make intel_pstate set transition_delay_us to 500.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
|
|
For ease of management it would be nice for users to specify that the
device node for a nbd device is destroyed once it is disconnected and
there are no more users. Add a client flag and enable this operation to
happen.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
In order to support deleting the device on disconnect we need to
refcount the actual nbd_device struct. So add the refcounting framework
and change how we free the normal devices at rmmod time so we can catch
reference leaks.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Allow users to query the status of existing nbd devices. Right now this
only returns whether or not the device is connected, but could be
extended in the future to include more information.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Sometimes we like to upgrade our server without making all of our
clients freak out and reconnect. This patch provides a way to specify a
dead connection timeout to allow us to pause all requests and wait for
new connections to be opened. With this in place I can take down the
nbd server for less than the dead connection timeout time and bring it
back up and everything resumes gracefully.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
When running a disconnect torture test I noticed that sometimes we would
crash with a negative ref count on our queue. This was because we were
ending the same request twice. Turns out we were racing with
NBD_CLEAR_SOCK clearing the requests as well as the teardown of the
device clearing the requests. So instead make the ioctl only shutdown
the sockets and make it so that we only ever run nbd_clear_que from the
device teardown.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Provide a mechanism to notify userspace that there's been a link problem
on a NBD device. This will allow userspace to re-establish a connection
and provide the new socket to the device without disrupting the device.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
We want to be able to reconnect dead connections to existing block
devices, so add a reconfigure netlink command. We will also allow users
to change their timeout on the fly, but everything else will require a
disconnect and reconnect. You won't be able to add more connections
either, simply replace dead connections with new more lively
connections.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
The existing ioctl interface for configuring NBD devices is a bit
cumbersome and hard to extend. The other problem is we leave a
userspace app sitting in it's syscall until the device disconnects,
which is less than ideal.
This patch introduces a netlink interface for adding and disconnecting
nbd devices. This has the benefits of being easily extendable without
breaking older userspace applications, and allows us to configure a nbd
device without leaving a userspace app sitting waiting for the device to
disconnect.
With this interface we also gain the ability to configure more devices
than are preallocated at insmod time. We also have gained the ability
to not specify a particular device and be provided one for us so that
userspace doesn't need to find a free device to configure.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
In preparation for the upcoming netlink interface we need to not rely on
already having the bdev for the NBD device we are doing operations on.
Instead of passing the bdev around, just use it in places where we know
we already have the bdev.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
In order to properly refcount the various aspects of a NBD device we
need to separate out the configuration elements of the nbd device. The
configuration of a NBD device has a different lifetime from the actual
device, so it doesn't make sense to bundle these two concepts. Add a
config_refs to keep track of the configuration structure, that way we
can be sure that we never access it when we've torn down the device.
Add a new nbd_config structure to hold all of the transient
configuration information. Finally create this when we open the device
so that it is in place when we start to configure the device. This has
a nice side-effect of fixing a long standing problem where you could end
up with a half-configured nbd device that needed to be "disconnected" in
order to be usable again. Now once we close our device the
configuration will be discarded.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Currently if we have multiple connections and one of them goes down we will tear
down the whole device. However there's no reason we need to do this as we
could have other connections that are working fine. Deal with this by keeping
track of the state of the different connections, and if we lose one we mark it
as dead and send all IO destined for that socket to one of the other healthy
sockets. Any outstanding requests that were on the dead socket will timeout and
be re-submitted properly.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
When adding a new socket we look it up and then try to add it to our
configuration. If any of those steps fail we need to make sure we put
the socket so we don't leak them.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Saeed Mahameed says:
====================
Mellanox, mlx5 RDMA net device support
This series provides the lower level mlx5 support of RDMA netdevice
creation API [1] suggested and introduced by Intel's HFI OPA VNIC
netdevice driver [2], to enable IPoIB mlx5 RDMA netdevice creation.
mlx5 IPoIB RDMA netdev will serve as an acceleration netdevice for the current
IPoIB ULP generic netdevice, providing:
- mlx5 RSS support.
- mlx5 HW RX,TX offloads (checksum, TSO, LRO, etc ..).
- Full mlx5 HW features transparent to the ULP itself.
The idea here is to reuse and benefit from the already implemented mlx5e netdevice
management and channels API for both etherent and RDMA netdevices, since both IPoIB
and Ethernet netdevices share same common mlx5 HW resources (with some small
exceptions) and share most of the control/data path logic, it is more natural to
have them share the same code.
The differences between IPoIB and Ethernet netdevices can be summarized to:
Steering:
In mlx5, IPoIB traffic is sent and received from an underlay special QP, and in Ethernet
the traffic is handled by vports and vport steering is managed by e-switch or FW.
For IPoIB traffic to get steered correctly the only thing we need to do is to create RSS
HW contexts for RX and TX HW contexts for TX (similar to mlx5e) with the underlay QP attached to
them (underlay QP will be 0 in case of Ethernet).
RX,TX:
Since IPoIB traffic is different, slightly modified RX and TX handlers are required,
still we do some code reuse in data path via common helper functions.
All of the other generic netdevice and mlx5 aspects will be shared between mlx5 Ethernet
and IPoIB netdevices, e.g.
- Channels creation and handling (RQs,SQs,CQs, NAPI, interrupt moderation, etc..)
- Offloads, checksum, GRO, LRO, TSO, and more.
- netdevice logic and non Ethernet specific ndos (open/close, etc..)
In order to achieve what we want:
In patchet 1 to 3, Erez added the supported for underlay QP in mlx5_ifc and refactored
the mlx5 steering code to accept the underlay QP as a parameter for creating steering
objects and enabled flow steering for IB link.
Then we are going to use the mlx5e netdevice profile, which is already used to separate between
NIC and VF representors netdevices, to create new type of IPoIB netdevice profile.
For that, one small refactoring is required to make mlx5e netdevice profile management
more genetic and agnostic to link type which is done in patch #4.
In patch #5, we introduce ipoib.c to host all of mlx5 IPoIB (mlx5i) specific logic and a
skeleton for the IPoIB mlx5 netdevice profile, and we will start filling it in next patches,
using mlx5e already existing APIs.
Patch #6 and #7, Implement init/cleanup RX mlx5i netdev profile handlers to create mlx5 RSS
resources, same as mlx5e but without vlan and L2 steering tables.
Patch #8, Implement init/cleanup TX mlx5i netdev profile handlers, to create TX resources
same as mlx5e but with one TC (tc = 0) support.
Patch #9, Implement mlx5i open/close ndos, where we reuese the mlx5e channels API, to start/stop TX/RX channels.
Patch #10, Create the underlay QP and attach it to mlx5i RSS and TX HW contexts.
Patch #11 and #12, Break down the mlx5e xmit flow into smaller helper function and implement the
mlx5i IPoIB xmit routine.
Patch #13 and #14, Have an RX handler per netdevice profile. We already do this before this series
in a non clean way to separate between NIC netdev and VF representor RX handlers, in patch 13 we make
the RX handler generic and bound to a profile and in patch 14 we implement the IPoIB RX handlers.
Patch #15, Small cleanup to avoid e-switch with IPoIB netdev.
In order to enable mlx5 IPoIB, a merge between the IPoIB RDMA netdev offolad support [3]
- which was alread submitted to the rdma mailing list - and this series is required
plus an extra small patch [4] which will connect between both sides and actually enables the offload.
Once both patch-sets are merged into linux we will have to submit the extra small patch [4], to enable
the feature.
Thanks,
Saeed.
[1] https://patchwork.kernel.org/patch/9676637/
[2] https://lwn.net/Articles/715453/
https://patchwork.kernel.org/patch/9587815/
[3] https://patchwork.kernel.org/patch/9672069/
[4] https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git/commit/?id=0141db6a686e32294dee015b7d07706162ba48d8
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add check for bit IB_QP_CREATE_NETIF_QP while creating QP.
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently the driver support only ethernet eswitch, and we want to
protect downstream IPoIB netdev from trying to access it in IB link.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Implement IPoIB RX SKB handler.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In order to have different RX handler per profile, fix and refactor the
current code to take the rx handler directly from the netdevice profile
rather than computing it on runtime as it was done with the switchdev
mode representor rx handler.
This will also remove the current wrong assumption in mlx5e_alloc_rq
code that mlx5e_priv->ppriv is of the type vport_rep.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Implement mlx5e's IPoIB SKB transmit using the helper functions provided
by mlx5e ethernet tx flow, the only difference in the code between
mlx5e_xmit and mlx5i_xmit is that IPoIB has some extra fields to fill
(UD datagram segment) in the TX descriptor (WQE) and it doesn't need to
have any vlan handling.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Break current mlx5e xmit flow into smaller blocks (helper functions)
in order to reuse them for IPoIB SKB transmission.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|