summaryrefslogtreecommitdiff
path: root/drivers/infiniband/hw/hfi1/ipoib_tx.c
AgeCommit message (Collapse)Author
2022-07-17IB/hfi1: switch to netif_napi_add_tx()Jakub Kicinski
Switch to the new API not requiring the NAPI_POLL_WEIGHT argument. Link: https://lore.kernel.org/r/20220705230208.924408-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-01-28IB/hfi1: Fix alloc failure with larger txqueuelenMike Marciniszyn
The following allocation with large txqueuelen will result in the following warning: Call Trace: __alloc_pages_nodemask+0x283/0x2c0 kmalloc_large_node+0x3c/0xa0 __kmalloc_node+0x22a/0x2f0 hfi1_ipoib_txreq_init+0x19f/0x330 [hfi1] hfi1_ipoib_setup_rn+0xd3/0x1a0 [hfi1] rdma_init_netdev+0x5a/0x80 [ib_core] ipoib_intf_init+0x6c/0x350 [ib_ipoib] ipoib_intf_alloc+0x5c/0xc0 [ib_ipoib] ipoib_add_one+0xbe/0x300 [ib_ipoib] add_client_context+0x12c/0x1a0 [ib_core] ib_register_client+0x147/0x190 [ib_core] ipoib_init_module+0xdd/0x132 [ib_ipoib] do_one_initcall+0x46/0x1c3 do_init_module+0x5a/0x220 load_module+0x14c5/0x17f0 __do_sys_init_module+0x13b/0x180 do_syscall_64+0x5b/0x1a0 entry_SYSCALL_64_after_hwframe+0x65/0xca For ipoib, the txqueuelen is modified with the module parameter send_queue_size. Fix by changing to use kv versions of the same allocator to handle the large allocations. The allocation embeds a hdr struct that is dma mapped. Change that struct to a pointer to a kzalloced struct. Cc: stable@vger.kernel.org Fixes: d99dc602e2a5 ("IB/hfi1: Add functions to transmit datagram ipoib packets") Link: https://lore.kernel.org/r/1642287756-182313-3-git-send-email-mike.marciniszyn@cornelisnetworks.com Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-01-28IB/hfi1: Fix panic with larger ipoib send_queue_sizeMike Marciniszyn
When the ipoib send_queue_size is increased from the default the following panic happens: RIP: 0010:hfi1_ipoib_drain_tx_ring+0x45/0xf0 [hfi1] Code: 31 e4 eb 0f 8b 85 c8 02 00 00 41 83 c4 01 44 39 e0 76 60 8b 8d cc 02 00 00 44 89 e3 be 01 00 00 00 d3 e3 48 03 9d c0 02 00 00 <c7> 83 18 01 00 00 00 00 00 00 48 8b bb 30 01 00 00 e8 25 af a7 e0 RSP: 0018:ffffc9000798f4a0 EFLAGS: 00010286 RAX: 0000000000008000 RBX: ffffc9000aa0f000 RCX: 000000000000000f RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000 RBP: ffff88810ff08000 R08: ffff88889476d900 R09: 0000000000000101 R10: 0000000000000000 R11: ffffc90006590ff8 R12: 0000000000000200 R13: ffffc9000798fba8 R14: 0000000000000000 R15: 0000000000000001 FS: 00007fd0f79cc3c0(0000) GS:ffff88885fb00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffc9000aa0f118 CR3: 0000000889c84001 CR4: 00000000001706e0 Call Trace: <TASK> hfi1_ipoib_napi_tx_disable+0x45/0x60 [hfi1] hfi1_ipoib_dev_stop+0x18/0x80 [hfi1] ipoib_ib_dev_stop+0x1d/0x40 [ib_ipoib] ipoib_stop+0x48/0xc0 [ib_ipoib] __dev_close_many+0x9e/0x110 __dev_change_flags+0xd9/0x210 dev_change_flags+0x21/0x60 do_setlink+0x31c/0x10f0 ? __nla_validate_parse+0x12d/0x1a0 ? __nla_parse+0x21/0x30 ? inet6_validate_link_af+0x5e/0xf0 ? cpumask_next+0x1f/0x20 ? __snmp6_fill_stats64.isra.53+0xbb/0x140 ? __nla_validate_parse+0x47/0x1a0 __rtnl_newlink+0x530/0x910 ? pskb_expand_head+0x73/0x300 ? __kmalloc_node_track_caller+0x109/0x280 ? __nla_put+0xc/0x20 ? cpumask_next_and+0x20/0x30 ? update_sd_lb_stats.constprop.144+0xd3/0x820 ? _raw_spin_unlock_irqrestore+0x25/0x37 ? __wake_up_common_lock+0x87/0xc0 ? kmem_cache_alloc_trace+0x3d/0x3d0 rtnl_newlink+0x43/0x60 The issue happens when the shift that should have been a function of the txq item size mistakenly used the ring size. Fix by using the item size. Cc: stable@vger.kernel.org Fixes: d47dfc2b00e6 ("IB/hfi1: Remove cache and embed txreq in ring") Link: https://lore.kernel.org/r/1642287756-182313-2-git-send-email-mike.marciniszyn@cornelisnetworks.com Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-04Merge tag 'v5.15-rc4' into rdma.get for-nextJason Gunthorpe
Merged due to dependencies in following patches. Conflict in drivers/infiniband/hw/hfi1/ipoib_tx.c resolved by hand to take the %p change and txq stats rename together. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-09-27IB/hfi1: Add ring consumer and producers tracesMike Marciniszyn
These traces are used to debugging ring issues. The ipoib_txreq needed to be moved to a header file to allow access from the trace header file. The trace changes include: - new producer/consumer traces - new allocation deallocation traces - additional fidelity for SDMA engine prints Fixes: 4bd00b55c978 ("IB/hfi1: Add AIP tx traces") Link: https://lore.kernel.org/r/20210913132852.131370.9664.stgit@awfm-01.cornelisnetworks.com Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-09-27IB/hfi1: Remove atomic completion countMike Marciniszyn
The atomic is not needed. Fixes: d99dc602e2a5 ("IB/hfi1: Add functions to transmit datagram ipoib packets") Link: https://lore.kernel.org/r/20210913132847.131370.54250.stgit@awfm-01.cornelisnetworks.com Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-09-27IB/hfi1: Tune netdev xmit cachelinesMike Marciniszyn
This patch moves fields in the ring and creates a line for the producer and the consumer. The adds a consumer side variable that tracks the ring avail so that the code doesn't have the read the other cacheline to get a count for every packet. A read now only occurs when the avail is at 0. Fixes: d99dc602e2a5 ("IB/hfi1: Add functions to transmit datagram ipoib packets") Link: https://lore.kernel.org/r/20210913132842.131370.15636.stgit@awfm-01.cornelisnetworks.com Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-09-27IB/hfi1: Get rid of tx priv backpointerMike Marciniszyn
The txq has the backpointer, so this is a micro optimization for the tx path. Fixes: d99dc602e2a5 ("IB/hfi1: Add functions to transmit datagram ipoib packets") Link: https://lore.kernel.org/r/20210913132836.131370.89704.stgit@awfm-01.cornelisnetworks.com Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-09-27IB/hfi1: Get rid of hot path divideMike Marciniszyn
The pointer math in this statemet does a divide; struct hfi1_ipoib_txq *txq = &priv->txqs[napi - priv->tx_napis]; Elminate the divide by embedding the struct napi_strut in the txq and getting the txq with a container_of() using the newly embedded napi. Fixes: d99dc602e2a5 ("IB/hfi1: Add functions to transmit datagram ipoib packets") Link: https://lore.kernel.org/r/20210913132831.131370.3993.stgit@awfm-01.cornelisnetworks.com Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-09-27IB/hfi1: Remove cache and embed txreq in ringMike Marciniszyn
This patch removes kmem cache allocation and deallocation in favor of having the ipoib_txreq in the ring. The consumer is now the packet sending side allocating tx descriptors from ring and the producer is the napi interrupt handling freeing tx descriptors. The locks are now eliminated because the napi tx lock insures a single consumer and the napi handling insures a single producer. The napi poll is converted to memory poll looking for items that have been marked completed. Fixes: d99dc602e2a5 ("IB/hfi1: Add functions to transmit datagram ipoib packets") Link: https://lore.kernel.org/r/20210913132826.131370.4397.stgit@awfm-01.cornelisnetworks.com Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-09-27RDMA/hfi1: Fix kernel pointer leakGuo Zhi
Pointers should be printed with %p or %px rather than cast to 'unsigned long long' and printed with %llx. Change %llx to %p to print the secured pointer. Fixes: 042a00f93aad ("IB/{ipoib,hfi1}: Add a timeout handler for rdma_netdev") Link: https://lore.kernel.org/r/20210922134857.619602-1-qtxuning1999@sjtu.edu.cn Signed-off-by: Guo Zhi <qtxuning1999@sjtu.edu.cn> Acked-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-07-30IB/hfi1: Indicate DMA wait when txq is queued for wakeupMike Marciniszyn
There is no counter for dmawait in AIP, which hampers debugging performance issues. Add the counter increment when the txq is queued. Fixes: d99dc602e2a5 ("IB/hfi1: Add functions to transmit datagram ipoib packets") Link: https://lore.kernel.org/r/20210715160440.142451.8278.stgit@awfm-01.cornelisnetworks.com Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-07IB/hfi1: Remove indirect call to hfi1_ipoib_send_dma()Mike Marciniszyn
hfi1_ipoib_send() directly calls hfi1_ipoib_send_dma() with no value add. Fix by renaming hfi1_ipoib_send_dma() to hfi1_ipoib_send(). Link: https://lore.kernel.org/r/1617026056-50483-6-git-send-email-dennis.dalessandro@cornelisnetworks.com Reviewed-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-07IB/hfi1: Use napi_schedule_irqoff() for tx napiMike Marciniszyn
The call is from an ISR context and napi_schedule_irqoff() can be used. Change the call to the more efficient type. Link: https://lore.kernel.org/r/1617026056-50483-5-git-send-email-dennis.dalessandro@cornelisnetworks.com Reviewed-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-07IB/hfi1: Correct oversized ring allocationMike Marciniszyn
The completion ring for tx is using the wrong size to size the ring, oversizing the ring by two orders of magniture. Correct the allocation size and use kcalloc_node() to allocate the ring. Fix mistaken GFP defines in similar allocations. Link: https://lore.kernel.org/r/1617026056-50483-4-git-send-email-dennis.dalessandro@cornelisnetworks.com Reviewed-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-07IB/{ipoib,hfi1}: Add a timeout handler for rdma_netdevMike Marciniszyn
The current rdma_netdev handling in ipoib hooks the tx_timeout handler, but prints out a totally useless message that prevents effective debugging especially when multiple transmit queues are being used. Add a tx_timeout rdma_netdev hook and implement the callback in the hfi1 to print additional information. The existing non-helpful message is avoided when the driver has presented a callback. Link: https://lore.kernel.org/r/1617026056-50483-3-git-send-email-dennis.dalessandro@cornelisnetworks.com Reviewed-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-07IB/hfi1: Add AIP tx tracesMike Marciniszyn
Add traces to allow for debugging issues with AIP tx. Link: https://lore.kernel.org/r/1617026056-50483-2-git-send-email-dennis.dalessandro@cornelisnetworks.com Reviewed-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-12IB/hfi1: switch to core handling of rx/tx byte/packet countersHeiner Kallweit
Use netdev->tstats instead of a member of hfi1_ipoib_dev_priv for storing a pointer to the per-cpu counters. This allows us to use core functionality for statistics handling. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Acked-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-06-24IB/hfi1: Add atomic triggered sleep/wakeupMike Marciniszyn
When running iperf in a two host configuration the following trace can occur: [ 319.728730] NETDEV WATCHDOG: ib0 (hfi1): transmit queue 0 timed out The issue happens because the current implementation relies on the netif txq being stopped to control the flushing of the tx list. There are two resources that the transmit logic can wait on and stop the txq: - SDMA descriptors - Ring space to hold completions The ring space is tested on the sending side and relieved when the ring is consumed in the napi tx reaping. Unfortunately, that reaping can run conncurrently with the workqueue flushing of the txlist. If the txq is started just before the workitem executes, the txlist will never be flushed, leading to the txq being stuck. Fix by: - Adding sleep/wakeup wrappers * Use an atomic to control the call to the netif routines inside the wrappers - Use another atomic to record ring space exhaustion * Only wakeup when the a ring space exhaustion has happened and it relieved Add additional wrappers to clarify the ring space resource handling. Fixes: d99dc602e2a5 ("IB/hfi1: Add functions to transmit datagram ipoib packets") Link: https://lore.kernel.org/r/20200623204327.108092.4024.stgit@awfm-01.aw.intel.com Reviewed-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-06-24IB/hfi1: Correct -EBUSY handling in tx codeMike Marciniszyn
The current code mishandles -EBUSY in two ways: - The flow change doesn't test the return from the flush and runs on to process the current packet racing with the wakeup processing - The -EBUSY handling for a single packet inserts the tx into the txlist after the submit call, racing with the same wakeup processing Fix the first by dropping the skb and returning NETDEV_TX_OK. Fix the second by insuring the the list entry within the txreq is inited when allocated. This enables the sleep routine to detect that the txreq has used the non-list api and queue the packet to the txlist. Both flaws can lead to having the flushing thread executing in causing two threads to manipulate the txlist. Fixes: d99dc602e2a5 ("IB/hfi1: Add functions to transmit datagram ipoib packets") Link: https://lore.kernel.org/r/20200623204321.108092.83898.stgit@awfm-01.aw.intel.com Reviewed-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-06-15RDMA/hfi1: Fix trivial mis-spelling of 'descriptor'Kieran Bingham
The word 'descriptor' is misspelled throughout the tree. Fix it up accordingly: decriptors -> descriptors Link: https://lore.kernel.org/r/20200609124610.3445662-3-kieran.bingham+renesas@ideasonboard.com Link: https://lore.kernel.org/r/20200609124610.3445662-12-kieran.bingham+renesas@ideasonboard.com Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com> Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-21IB/hfi1: Add functions to transmit datagram ipoib packetsGary Leshner
This patch implements the mechanism to accelerate the transmit side of a multiple transmit queue RDMA netdev by submitting the packets to the SDMA engine directly instead of sending through the verbs layer. This patch also changes the UD/SEND_ONLY op to output the entropy value in byte 0 of deth[1]. UD/SEND_ONLY_WITH_IMMEDIATE uses the previous behavior with no entropy value being output. The code in the ipoib rdma netdev which submits tx requests upon successful submission will call trace_sdma_output_ibhdr to output the ibhdr to the trace buffer. Link: https://lore.kernel.org/r/20200511160548.173205.45616.stgit@awfm-01.aw.intel.com Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Gary Leshner <Gary.S.Leshner@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>