Age | Commit message (Collapse) | Author |
|
This IOCTL provides a mechanism for userspace to trigger a sync object
directly. There are other ways that userspace can trigger a syncobj
such as submitting a dummy batch somewhere or hanging on to a triggered
sync_file and doing an import. This just provides an easy way to
manually trigger the sync object without weird hacks.
The motivation for this IOCTL is Vulkan fences. Vulkan lets you create
a fence already in the signaled state so that you can wait on it
immediatly without stalling. We could also handle this with a new
create flag to ask the driver to create a syncobj that is already
signaled but the IOCTL seemed a bit cleaner and more generic.
v2:
- Take an array of sync objects (Dave Airlie)
v3:
- Throw -EINVAL if pad != 0
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
This just resets the dma_fence to NULL so it looks like it's never been
signaled. This will be useful once we add the new wait API for allowing
wait on "submit and signal" behavior.
v2:
- Take an array of sync objects (Dave Airlie)
v3:
- Throw -EINVAL if pad != 0
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Christian König <christian.koenig@amd.com> (v1)
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
Michael Chan says:
====================
bnxt_en: Updates.
Various changes including updated firmware interface, improved TX ring
allocation scheme, improved out-of-memory logic in NAPI loop, reduced
default rings on multi-port devices, new PCI IDs. Of particular note,
CPU affinity hints from Vasundhara Volam.
TC Flower eswitch support from Sathya Perla.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds code to implement TC_CLSFLOWER_STATS TC-cmd and the
required FW code to query the stats from the HW.
Signed-off-by: Sathya Perla <sathya.perla@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds the hwrm_cfa_flow_alloc/free() routines
that are needed to issue the FW cmds needed for TC flower offload.
Signed-off-by: Sathya Perla <sathya.perla@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds support for offloading TC based flow
rules and actions for the 'flower' classifier in the bnxt_en driver.
It includes logic to parse flow rules and actions received from the
TC subsystem, store them and issue the corresponding
hwrm_cfa_flow_alloc/free FW cmds. L2/IPv4/IPv6 flows and drop,
redir, vlan push/pop actions are supported in this patch.
In this patch the hwrm_cfa_flow_xxx routines are just stubs.
The code for these routines is introduced in the next patch for easier
review. Also, the code to query the TC/flower action stats will
be introduced in a subsequent patch.
Signed-off-by: Sathya Perla <sathya.perla@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The routine bnxt_link_bp_to_dl() is used to set the devlink ptr
in bnxt struct (bp) and also to set the bnxt back ptr in
the devlink struct. If devlink_register() fails, bp->dl must
be cleared which is not happening currently. This patch fixes
bnxt_link_bp_to_dl() to clear bp->dl by passing a NULL dl ptr.
Fixes: 4ab0c6a8ffd7 ("bnxt_en: add support to enable VF-representors")
Signed-off-by: Sathya Perla <sathya.perla@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Reduce default rings from 8 to 4 on multi-port cards to reduce memory
usage.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If we cannot allocate RX buffers in the NAPI poll loop when processing
an RX event, the current code does not count that event towards the NAPI
budget. This can cause us to potentially loop forever in NAPI if we
consistently cannot allocate new buffers. Improve it by counting
-ENOMEM event as 1 towards the NAPI budget.
Cc: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reported-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
initialize board_info values with proper enums for defensive programming
purposes. This will avoid any errors of the enums being declared not
lining up with the board_info array.
Signed-off-by: Scott Branden <scott.branden@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add PCIe device ID for bcm58802 and bcm58808. Also add chip number
update to declare bcm588xx as chip class phase 4 and later
Signed-off-by: Ray Jui <ray.jui@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch provides hints to irqbalance to map bnxt_en device IRQs
to specific CPU cores. cpumask_local_spread() is used, which first
maps IRQs to near NUMA cores; when those cores are exhausted, IRQs
are mapped to far NUMA cores.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When the number of TX rings is changed (e.g. ethtool -L, enabling XDP TX
rings, etc), the current code tries to reserve the new number of TX rings
before closing and re-opening the NIC. If we are unable to reserve the
new TX rings, we abort the operation and keep the current TX rings.
The problem is that the firmware will disable the current TX rings even
when it cannot reserve the new set of TX rings. We fix it as follows:
1. Instead of reserving the new set of TX rings, just ask the firmware
to check if the new set of TX rings is available. There is a flag in
the firmware message to do that. If not available, abort and the
current TX rings will not be disabled.
2. Do the actual TX ring reservation in the path that opens the NIC.
We keep the number of TX rings currently successfully reserved. If the
number of TX rings is different than the reserved TX rings, we call
firmware and reserve again.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Flow APIs are added in this firmware interface.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Samuel Mendoza-Jonas says:
====================
NCSI VLAN Filtering Support
This series (mainly patch 2) adds VLAN filtering to the NCSI implementation.
A fair amount of code already exists in the NCSI stack for VLAN filtering but
none of it is actually hooked up. This goes the final mile and fixes a few
bugs in the existing code found along the way (patch 1).
Patch 3 adds the appropriate flag and callbacks to the ftgmac100 driver to
enable filtering as it's a large consumer of NCSI (and what I've been
testing on).
v3: - Add comment describing change to ncsi_find_filter()
- Catch NULL in clear_one_vid() from ncsi_get_filter()
- Simplify state changes when kicking updated channel
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Register the ndo_vlan_rx_{add,kill}_vid callbacks and set the
NETIF_F_HW_VLAN_CTAG_FILTER if NCSI is available.
This allows the VLAN core to notify the NCSI driver when changes occur
so that the remote NCSI channel can be properly configured to filter on
the set VLAN tags.
Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Make use of the ndo_vlan_rx_{add,kill}_vid callbacks to have the NCSI
stack process new VLAN tags and configure the channel VLAN filter
appropriately.
Several VLAN tags can be set and a "Set VLAN Filter" packet must be sent
for each one, meaning the ncsi_dev_state_config_svf state must be
repeated. An internal list of VLAN tags is maintained, and compared
against the current channel's ncsi_channel_filter in order to keep track
within the state. VLAN filters are removed in a similar manner, with the
introduction of the ncsi_dev_state_config_clear_vids state. The maximum
number of VLAN tag filters is determined by the "Get Capabilities"
response from the channel.
Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
40GbE Intel Wired LAN Driver Updates 2017-08-27
This series contains updates to i40e and i40evf only.
Sudheer updates code comments and state variable so that adminq_subtask
will have accutate information whenever it gets scheduled.
Mariusz stores information about FEC modes, to be used to printing link
states information, so that we do not need to call admin queue when
reporting link status. Adds VF support for controlling VLAN tag
stripping via ethtool.
Jake provides the majority of changes in this series, starting with
increasing the size of the prefix buffer so that it can hold enough
characters for every possible input, which prevents snprintf truncation.
Fixed other string truncation errors/warnings produced by GCC 7.x.
Removed an unnecessary workaround for resetting XPS. Fixed an issue
where there is a mismatched affinity mask value, so initialize the value
to cpu_possible_mask and invert the logic for checking incorrect CPU vs
IRQ affinity so that the exceptional case is handled at the check.
Removed ULTRA latency mode due to several issues found and will be
looking at better solution for small packet workloads.
Akeem fixes an issue where the incorrect flag was being used to set
promiscuous mode for unicast, which was enabling promiscuous mode only
for multicast instead of unicast.
Carolyn fixes an issue where an error return value is set, but this
value can be overwritten before we actually do exit the function. So
remove the error code assignment and add code comments for better
understanding on why we do not need to set and return the error.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit 3510ca20ece0 ("Minor page waitqueue cleanups") made the page
queue code always add new waiters to the back of the queue, which helps
upcoming patches to batch the wakeups for some horrid loads where the
wait queues grow to thousands of entries.
However, I forgot about the nasrt add_page_wait_queue() special case
code that is only used by the cachefiles code. That one still continued
to add the new wait queue entries at the beginning of the list.
Fix it, because any sane batched wakeup will require that we don't
suddenly start getting new entries at the beginning of the list that we
already handled in a previous batch.
[ The current code always does the whole list while holding the lock, so
wait queue ordering doesn't matter for correctness, but even then it's
better to add later entries at the end from a fairness standpoint ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Remove the search for index of constant buffer size
Signed-off-by: Aviad Krawczyk <aviad.krawczyk@huawei.com>
Signed-off-by: Zhao Chen <zhaochen6@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fix the hw MTU limitation by setting max_mtu
Signed-off-by: Aviad Krawczyk <aviad.krawczyk@huawei.com>
Signed-off-by: Zhao Chen <zhaochen6@huawei.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Greg Kroah-Hartman says:
====================
irda: move it to drivers/staging so we can delete it
The IRDA code has long been obsolete and broken. So, to keep people
from trying to use it, and to prevent people from having to maintain it,
let's move it to drivers/staging/ so that we can delete it entirely from
the kernel in a few releases.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The irda code will be deleted in a future kernel release, so no need to
have anyone do any new work on it.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
And finally, move the irda include files into
drivers/staging/irda/include/net/irda. Yes, it's a long path, but it
makes it easy for us to just add a Makefile directory path addition and
all of the net and drivers code "just works".
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Move the irda drivers from drivers/net/irda/ to
drivers/staging/irda/drivers as they will be deleted in a future kernel
release.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It's time to get rid of IRDA. It's long been broken, and no one seems
to use it anymore. So move it to staging and after a while, we can
delete it from there.
To start, move the network irda core from net/irda to
drivers/staging/irda/net/
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Convert to use acpi_match_platform_list() for the platform check.
There is no change in functionality.
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
ACPI OEM ID / OEM Table ID / Revision can be used to identify
a platform based on ACPI firmware info. acpi_blacklisted(),
intel_pstate_platform_pwr_mgmt_exists(), and some other funcs,
have been using similar check to detect a list of platforms
that require special handlings.
Move the platform check in acpi_blacklisted() to a new common
utility function, acpi_match_platform_list(), so that other
drivers do not have to implement their own version.
There is no change in functionality.
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Madalin Bucur says:
====================
Add RSS to DPAA 1.x Ethernet driver
This patch set introduces Receive Side Scaling for the DPAA Ethernet
driver. Documentation is updated with details related to the new
feature and limitations that apply.
Added also a small fix.
v2: removed a C++ style comment
v3: move struct fman to header file to avoid exporting a function
v4: addressed compilation issues introduced in v3
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Set the skb hash when then FMan Keygen hash result is available.
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Allow ethtool control of the Rx flow hashing. By default RSS is
enabled, this allows to turn it off by bypassing the FMan Keygen
block and sending all traffic on the default Rx frame queue.
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add a block of 128 Rx frame queues per port. The FMan hardware will
send traffic on one of these queues based on the FMan port Parse
Classify Distribute setup. The hash computed by the FMan Keygen
block will select the Rx FQ.
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add support for the FMan Keygen with a hardcoded scheme to spread
incoming traffic on a FQ range based on source and destination IPs
and ports.
Signed-off-by: Iordache Florinel <florinel.iordache@nxp.com>
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
current switchdev drivers dont seem to support offloading fdb
entries pointing to the bridge device which have fdb->dst
not set to any port. This patch adds a NULL fdb->dst check in
the switchdev notifier code.
This patch fixes the below NULL ptr dereference:
$bridge fdb add 00:02:00:00:00:33 dev br0 self
[ 69.953374] BUG: unable to handle kernel NULL pointer dereference at
0000000000000008
[ 69.954044] IP: br_switchdev_fdb_notify+0x29/0x80
[ 69.954044] PGD 66527067
[ 69.954044] P4D 66527067
[ 69.954044] PUD 7899c067
[ 69.954044] PMD 0
[ 69.954044]
[ 69.954044] Oops: 0000 [#1] SMP
[ 69.954044] Modules linked in:
[ 69.954044] CPU: 1 PID: 3074 Comm: bridge Not tainted 4.13.0-rc6+ #1
[ 69.954044] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.7.5.1-0-g8936dbb-20141113_115728-nilsson.home.kraxel.org
04/01/2014
[ 69.954044] task: ffff88007b827140 task.stack: ffffc90001564000
[ 69.954044] RIP: 0010:br_switchdev_fdb_notify+0x29/0x80
[ 69.954044] RSP: 0018:ffffc90001567918 EFLAGS: 00010246
[ 69.954044] RAX: 0000000000000000 RBX: ffff8800795e0880 RCX:
00000000000000c0
[ 69.954044] RDX: ffffc90001567920 RSI: 000000000000001c RDI:
ffff8800795d0600
[ 69.954044] RBP: ffffc90001567938 R08: ffff8800795d0600 R09:
0000000000000000
[ 69.954044] R10: ffffc90001567a88 R11: ffff88007b849400 R12:
ffff8800795e0880
[ 69.954044] R13: ffff8800795d0600 R14: ffffffff81ef8880 R15:
000000000000001c
[ 69.954044] FS: 00007f93d3085700(0000) GS:ffff88007fd00000(0000)
knlGS:0000000000000000
[ 69.954044] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 69.954044] CR2: 0000000000000008 CR3: 0000000066551000 CR4:
00000000000006e0
[ 69.954044] Call Trace:
[ 69.954044] fdb_notify+0x3f/0xf0
[ 69.954044] __br_fdb_add.isra.12+0x1a7/0x370
[ 69.954044] br_fdb_add+0x178/0x280
[ 69.954044] rtnl_fdb_add+0x10a/0x200
[ 69.954044] rtnetlink_rcv_msg+0x1b4/0x240
[ 69.954044] ? skb_free_head+0x21/0x40
[ 69.954044] ? rtnl_calcit.isra.18+0xf0/0xf0
[ 69.954044] netlink_rcv_skb+0xed/0x120
[ 69.954044] rtnetlink_rcv+0x15/0x20
[ 69.954044] netlink_unicast+0x180/0x200
[ 69.954044] netlink_sendmsg+0x291/0x370
[ 69.954044] ___sys_sendmsg+0x180/0x2e0
[ 69.954044] ? filemap_map_pages+0x2db/0x370
[ 69.954044] ? do_wp_page+0x11d/0x420
[ 69.954044] ? __handle_mm_fault+0x794/0xd80
[ 69.954044] ? vma_link+0xcb/0xd0
[ 69.954044] __sys_sendmsg+0x4c/0x90
[ 69.954044] SyS_sendmsg+0x12/0x20
[ 69.954044] do_syscall_64+0x63/0xe0
[ 69.954044] entry_SYSCALL64_slow_path+0x25/0x25
[ 69.954044] RIP: 0033:0x7f93d2bad690
[ 69.954044] RSP: 002b:00007ffc7217a638 EFLAGS: 00000246 ORIG_RAX:
000000000000002e
[ 69.954044] RAX: ffffffffffffffda RBX: 00007ffc72182eac RCX:
00007f93d2bad690
[ 69.954044] RDX: 0000000000000000 RSI: 00007ffc7217a670 RDI:
0000000000000003
[ 69.954044] RBP: 0000000059a1f7f8 R08: 0000000000000006 R09:
000000000000000a
[ 69.954044] R10: 00007ffc7217a400 R11: 0000000000000246 R12:
00007ffc7217a670
[ 69.954044] R13: 00007ffc72182a98 R14: 00000000006114c0 R15:
00007ffc72182aa0
[ 69.954044] Code: 1f 00 66 66 66 66 90 55 48 89 e5 48 83 ec 20 f6 47
20 04 74 0a 83 fe 1c 74 09 83 fe 1d 74 2c c9 66 90 c3 48 8b 47 10 48 8d
55 e8 <48> 8b 70 08 0f b7 47 1e 48 83 c7 18 48 89 7d f0 bf 03 00 00 00
[ 69.954044] RIP: br_switchdev_fdb_notify+0x29/0x80 RSP:
ffffc90001567918
[ 69.954044] CR2: 0000000000000008
[ 69.954044] ---[ end trace 03e9eec4a82c238b ]---
Fixes: 6b26b51b1d13 ("net: bridge: Add support for notifying devices about FDB add/del")
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When !NUMA, cpumask_of_node(@node) equals cpu_online_mask regardless of
@node. The assumption seems that if !NUMA, there shouldn't be more than
one node and thus reporting cpu_online_mask regardless of @node is
correct. However, that assumption was broken years ago to support
DISCONTIGMEM and whether a system has multiple nodes or not is
separately controlled by NEED_MULTIPLE_NODES.
This means that, on a system with !NUMA && NEED_MULTIPLE_NODES,
cpumask_of_node() will report cpu_online_mask for all possible nodes,
indicating that the CPUs are associated with multiple nodes which is an
impossible configuration.
This bug has been around forever but doesn't look like it has caused any
noticeable symptoms. However, it triggers a WARN recently added to
workqueue to verify NUMA affinity configuration.
Fix it by reporting empty cpumask on non-zero nodes if !NUMA.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-and-tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Without this fix, ports configured on top of ixgbe miss link up
notifications. ibv_query_port() will continue to return IBV_PORT_DOWN even
though the port is up and working.
Fixes: 8700e3e7c485 ("Soft RoCE driver")
Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
The current process is to first calculate the CRC and then copy the client
data into the packet. This leaves a window in which the packet contents and
CRC can get out of sync, if the client changes the data after the CRC is
calculated but before the data is copied.
By copying the data into the packet and then calculating the CRC directly
from the packet contents we eliminate the window.
This can be seen with qperf's ud_bi_bw test. This seems like very
strange/reckless client behavior, but whether the client has mangled its
data or not RXE should be able to transfer it reliably.
Fixes: 8700e3e7c485 ("Soft RoCE driver")
Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
This fixes another path in rxe_requester() that might overlook stale SKBs,
preventing cleanup.
Fixes: 1217197142d1 ("rxe: fix broken receive queue draining")
Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Fixes: 4ed6ad1eb30e ("IB/rxe: Cache dst in QP instead of getting it...")
Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Replace sk_dst_get()/dst_release() in rxe_qp_cleanup() with sk_dst_reset().
sk_dst_get() takes a new reference on dst, so the dst_release() doesn't
actually release the original reference, which was the design intent.
Fixes: 4ed6ad1eb30e ("IB/rxe: Cache dst in QP instead of getting it...")
Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Otherwise the reference count goes negative as IPv6 packets complete.
Fixes: 4ed6ad1eb30e ("IB/rxe: Cache dst in QP instead of getting it...")
Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
To successfully match an IPv6 path, the path cookie must match. Store it
in the QP so that the IPv6 path can be reused.
Replace open-coded version of dst_check() with the actual call, fixing the
logic. The open-coded version skips the check call if dst->obsolete is 0
(DST_OBSOLETE_NONE), proceeding to replace the route. DST_OBSOLETE_NONE
means that the route may continue to be used, though.
Fixes: 4ed6ad1eb30e ("IB/rxe: Cache dst in QP instead of getting it...")
Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
The resource array is sized by max_dest_rd_atomic, not max_rd_atomic.
Iterating over max_rd_atomic entries of qp->resp.resources[] will cause
incorrect behavior when the two attributes are different (or even
crash if max_rd_atomic is larger).
Fixes: 8700e3e7c485 ("Soft RoCE driver")
Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Fixes: 8700e3e7c485 ("Soft RoCE driver")
Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Acked-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
This prevents the stack from accessing userspace objects while they
are being torn down.
One possible sequence of events:
- Userspace program exits
- ib_uverbs_cleanup_ucontext() runs, calling ib_destroy_qp(),
ib_destroy_cq(), etc. and releasing/freeing the UCQ
- The QP still has tasklets running, so it isn't destroyed yet
- The CQ is referenced by the QP, so the CQ isn't destroyed yet
- The UCQ is kfree()'d anyway
- A send work request completes
- rxe_send_complete() calls cq->ibcq.comp_handler()
- ib_uverbs_comp_handler() runs and crashes; the event queue is checked
for is_closed, but it has no way to check the ib_ucq_object before
accessing it
The reference counting on the CQ doesn't protect against this since the CQ
hasn't been destroyed yet.
There's no available interface to deregister the UCQ from the CQ, and it
didn't appear that attempting to add reference counting to the UCQ was
going to be a good way to go since this solution is much simpler.
Fixes: 8700e3e7c485 ("Soft RoCE driver")
Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
The network stack will call nskb's destructor, rxe_skb_tx_dtor(), if the
packet gets dropped by ip_local_out()/ip6_local_out(). Thus we need to add
the QP ref before output to avoid extra dereferences during network
congestion. This could lead to unwanted destruction of the QP.
Fix up the skb_out accounting, too.
Fixes: fda85ce91240 ("IB/rxe: Fix kernel panic from skb destructor")
Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Acked-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|