summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
AgeCommit message (Collapse)Author
2025-04-15ixgbe: wrap netdev_priv() usagePrzemek Kitszel
Wrap use of netdev_priv() in order to change the allocator of the device private structure from alloc_etherdev_mq() to the devlink in next commit. All but one netdev_priv() calls in the whole driver are replaced, the remaining one is called on MACVLAN (so not ixgbe) device. Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Bharath R <bharath.r@intel.com> Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-05-20Merge tag 'dma-mapping-6.10-2024-05-20' of ↵Linus Torvalds
git://git.infradead.org/users/hch/dma-mapping Pull dma-mapping updates from Christoph Hellwig: - optimize DMA sync calls when they are no-ops (Alexander Lobakin) - fix swiotlb padding for untrusted devices (Michael Kelley) - add documentation for swiotb (Michael Kelley) * tag 'dma-mapping-6.10-2024-05-20' of git://git.infradead.org/users/hch/dma-mapping: dma: fix DMA sync for drivers not calling dma_set_mask*() xsk: use generic DMA sync shortcut instead of a custom one page_pool: check for DMA sync shortcut earlier page_pool: don't use driver-set flags field directly page_pool: make sure frag API fields don't span between cachelines iommu/dma: avoid expensive indirect calls for sync operations dma: avoid redundant calls for sync operations dma: compile-out DMA sync op calls when not used iommu/dma: fix zeroing of bounce buffer padding used by untrusted devices swiotlb: remove alloc_size argument to swiotlb_tbl_map_single() Documentation/core-api: add swiotlb documentation
2024-05-08xsk: use generic DMA sync shortcut instead of a custom oneAlexander Lobakin
XSk infra's been using its own DMA sync shortcut to try avoiding redundant function calls. Now that there is a generic one, remove the custom implementation and rely on the generic helpers. xsk_buff_dma_sync_for_cpu() doesn't need the second argument anymore, remove it. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2024-03-28net: remove gfp_mask from napi_alloc_skb()Jakub Kicinski
__napi_alloc_skb() is napi_alloc_skb() with the added flexibility of choosing gfp_mask. This is a NAPI function, so GFP_ATOMIC is implied. The only practical choice the caller has is whether to set __GFP_NOWARN. But that's a false choice, too, allocation failures in atomic context will happen, and printing warnings in logs, effectively for a packet drop, is both too much and very likely non-actionable. This leads me to a conclusion that most uses of napi_alloc_skb() are simply misguided, and should use __GFP_NOWARN in the first place. We also have a "standard" way of reporting allocation failures via the queue stat API (qstats::rx-alloc-fail). The direct motivation for this patch is that one of the drivers used at Meta calls napi_alloc_skb() (so prior to this patch without __GFP_NOWARN), and the resulting OOM warning is the top networking warning in our fleet. Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240327040213.3153864-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-06ixgbe: pull out stats update to common routinesMaciej Fijalkowski
Introduce ixgbe_update_{r,t}x_ring_stats() that will be used by both standard and ZC datapath. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-10-03net: Tree wide: Replace xdp_do_flush_map() with xdp_do_flush().Sebastian Andrzej Siewior
xdp_do_flush_map() is deprecated and new code should use xdp_do_flush() instead. Replace xdp_do_flush_map() with xdp_do_flush(). Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Cc: Clark Wang <xiaoning.wang@nxp.com> Cc: Claudiu Manoil <claudiu.manoil@nxp.com> Cc: David Arinzon <darinzon@amazon.com> Cc: Edward Cree <ecree.xilinx@gmail.com> Cc: Felix Fietkau <nbd@nbd.name> Cc: Grygorii Strashko <grygorii.strashko@ti.com> Cc: Jassi Brar <jaswinder.singh@linaro.org> Cc: Jesse Brandeburg <jesse.brandeburg@intel.com> Cc: John Crispin <john@phrozen.org> Cc: Leon Romanovsky <leon@kernel.org> Cc: Lorenzo Bianconi <lorenzo@kernel.org> Cc: Louis Peens <louis.peens@corigine.com> Cc: Marcin Wojtas <mw@semihalf.com> Cc: Mark Lee <Mark-MC.Lee@mediatek.com> Cc: Matthias Brugger <matthias.bgg@gmail.com> Cc: NXP Linux Team <linux-imx@nxp.com> Cc: Noam Dagan <ndagan@amazon.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Saeed Bishara <saeedb@amazon.com> Cc: Saeed Mahameed <saeedm@nvidia.com> Cc: Sean Wang <sean.wang@mediatek.com> Cc: Shay Agroskin <shayagr@amazon.com> Cc: Shenwei Wang <shenwei.wang@nxp.com> Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com> Cc: Tony Nguyen <anthony.l.nguyen@intel.com> Cc: Vladimir Oltean <vladimir.oltean@nxp.com> Cc: Wei Fang <wei.fang@nxp.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Arthur Kiyanovski <akiyano@amazon.com> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Acked-by: Martin Habets <habetsm.xilinx@gmail.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Link: https://lore.kernel.org/r/20230908143215.869913-2-bigeasy@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-21ixgbe, xsk: Get rid of redundant 'fallthrough'Maciej Fijalkowski
Intel drivers translate actions returned from XDP programs to their own return codes that have the following mapping: XDP_REDIRECT -> IXGBE_XDP_{REDIR,CONSUMED} XDP_TX -> IXGBE_XDP_{TX,CONSUMED} XDP_DROP -> IXGBE_XDP_CONSUMED XDP_ABORTED -> IXGBE_XDP_CONSUMED XDP_PASS -> IXGBE_XDP_PASS Commit c7dd09fd4628 ("ixgbe, xsk: Terminate Rx side of NAPI when XSK Rx queue gets full") introduced new translation XDP_REDIRECT -> IXGBE_XDP_EXIT which is set when XSK RQ gets full and to indicate that driver should stop further Rx processing. This happens for unsuccessful xdp_do_redirect() so it is valuable to call trace_xdp_exception() for this case. In order to avoid IXGBE_XDP_EXIT -> IXGBE_XDP_CONSUMED overwrite, XDP_DROP case was moved above which in turn made the 'fallthrough' that is in XDP_ABORTED useless as it became the last label in the switch statement. Simply drop this leftover. Fixes: c7dd09fd4628 ("ixgbe, xsk: Terminate Rx side of NAPI when XSK Rx queue gets full") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220421132126.471515-2-maciej.fijalkowski@intel.com
2022-04-15ixgbe, xsk: Diversify return values from xsk_wakeup call pathsMaciej Fijalkowski
Currently, when debugging AF_XDP workloads, one can correlate the -ENXIO return code as the case that XSK is not in the bound state. Returning same code from ndo_xsk_wakeup can be misleading and simply makes it harder to follow what is going on. Change ENXIOs in ixgbe's ndo_xsk_wakeup() implementation to EINVALs, so that when probing it is clear that something is wrong on the driver side, not the xsk_{recv,send}msg. There is a -ENETDOWN that can happen from both kernel/driver sides though, but I don't have a correct replacement for this on one of the sides, so let's keep it that way. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220413153015.453864-11-maciej.fijalkowski@intel.com
2022-04-15ixgbe, xsk: Terminate Rx side of NAPI when XSK Rx queue gets fullMaciej Fijalkowski
When XSK pool uses need_wakeup feature, correlate -ENOBUFS that was returned from xdp_do_redirect() with a XSK Rx queue being full. In such case, terminate the Rx processing that is being done on the current HW Rx ring and let the user space consume descriptors from XSK Rx queue so that there is room that driver can use later on. Introduce new internal return code IXGBE_XDP_EXIT that will indicate case described above. Note that it does not affect Tx processing that is bound to the same NAPI context, nor the other Rx rings. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220413153015.453864-8-maciej.fijalkowski@intel.com
2022-04-15ixgbe, xsk: Decorate IXGBE_XDP_REDIR with likely()Maciej Fijalkowski
ixgbe_run_xdp_zc() suggests to compiler that XDP_REDIRECT is the most probable action returned from BPF program that AF_XDP has in its pipeline. Let's also bring this suggestion up to the callsite of ixgbe_run_xdp_zc() so that compiler will be able to generate more optimized code which in turn will make branch predictor happy. Suggested-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220413153015.453864-5-maciej.fijalkowski@intel.com
2022-03-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
net/batman-adv/hard-interface.c commit 690bb6fb64f5 ("batman-adv: Request iflink once in batadv-on-batadv check") commit 6ee3c393eeb7 ("batman-adv: Demote batadv-on-batadv skip error message") https://lore.kernel.org/all/20220302163049.101957-1-sw@simonwunderlich.de/ net/smc/af_smc.c commit 4d08b7b57ece ("net/smc: Fix cleanup when register ULP fails") commit 462791bbfa35 ("net/smc: add sysctl interface for SMC") https://lore.kernel.org/all/20220302112209.355def40@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-03ixgbe: xsk: change !netif_carrier_ok() handling in ixgbe_xmit_zc()Maciej Fijalkowski
Commit c685c69fba71 ("ixgbe: don't do any AF_XDP zero-copy transmit if netif is not OK") addressed the ring transient state when MEM_TYPE_XSK_BUFF_POOL was being configured which in turn caused the interface to through down/up. Maurice reported that when carrier is not ok and xsk_pool is present on ring pair, ksoftirqd will consume 100% CPU cycles due to the constant NAPI rescheduling as ixgbe_poll() states that there is still some work to be done. To fix this, do not set work_done to false for a !netif_carrier_ok(). Fixes: c685c69fba71 ("ixgbe: don't do any AF_XDP zero-copy transmit if netif is not OK") Reported-by: Maurice Baijens <maurice.baijens@ellips.com> Tested-by: Maurice Baijens <maurice.baijens@ellips.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-31ixgbe: respect metadata on XSK Rx to skbAlexander Lobakin
For now, if the XDP prog returns XDP_PASS on XSK, the metadata will be lost as it doesn't get copied to the skb. Copy it along with the frame headers. Account its size on skb allocation, and when copying just treat it as a part of the frame and do a pull after to "move" it to the "reserved" zone. net_prefetch() xdp->data_meta and align the copy size to speed-up memcpy() a little and better match ixgbe_construct_skb(). Fixes: d0bcacd0a130 ("ixgbe: add AF_XDP zero-copy Rx support") Suggested-by: Jesper Dangaard Brouer <brouer@redhat.com> Suggested-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-01-31ixgbe: don't reserve excessive XDP_PACKET_HEADROOM on XSK Rx to skbAlexander Lobakin
{__,}napi_alloc_skb() allocates and reserves additional NET_SKB_PAD + NET_IP_ALIGN for any skb. OTOH, ixgbe_construct_skb_zc() currently allocates and reserves additional `xdp->data - xdp->data_hard_start`, which is XDP_PACKET_HEADROOM for XSK frames. There's no need for that at all as the frame is post-XDP and will go only to the networking stack core. Pass the size of the actual data only to __napi_alloc_skb() and don't reserve anything. This will give enough headroom for stack processing. Fixes: d0bcacd0a130 ("ixgbe: add AF_XDP zero-copy Rx support") Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-01-31ixgbe: pass bi->xdp to ixgbe_construct_skb_zc() directlyAlexander Lobakin
To not dereference bi->xdp each time in ixgbe_construct_skb_zc(), pass bi->xdp as an argument instead of bi. We can also call xsk_buff_free() outside of the function as well as assign bi->xdp to NULL, which seems to make it closer to its name. Suggested-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-13bpf: Let bpf_warn_invalid_xdp_action() report more infoPaolo Abeni
In non trivial scenarios, the action id alone is not sufficient to identify the program causing the warning. Before the previous patch, the generated stack-trace pointed out at least the involved device driver. Let's additionally include the program name and id, and the relevant device name. If the user needs additional infos, he can fetch them via a kernel probe, leveraging the arguments added here. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/ddb96bb975cbfddb1546cf5da60e77d5100b533c.1638189075.git.pabeni@redhat.com
2021-09-30ixgbe: let the xdpdrv work with more than 64 cpusJason Xing
Originally, ixgbe driver doesn't allow the mounting of xdpdrv if the server is equipped with more than 64 cpus online. So it turns out that the loading of xdpdrv causes the "NOMEM" failure. Actually, we can adjust the algorithm and then make it work through mapping the current cpu to some xdp ring with the protect of @tx_lock. Here are some numbers before/after applying this patch with xdp-example loaded on the eth0X: As client (tx path): Before After TCP_STREAM send-64 734.14 714.20 TCP_STREAM send-128 1401.91 1395.05 TCP_STREAM send-512 5311.67 5292.84 TCP_STREAM send-1k 9277.40 9356.22 (not stable) TCP_RR send-1 22559.75 21844.22 TCP_RR send-128 23169.54 22725.13 TCP_RR send-512 21670.91 21412.56 As server (rx path): Before After TCP_STREAM send-64 1416.49 1383.12 TCP_STREAM send-128 3141.49 3055.50 TCP_STREAM send-512 9488.73 9487.44 TCP_STREAM send-1k 9491.17 9356.22 (not stable) TCP_RR send-1 23617.74 23601.60 ... Notice: the TCP_RR mode is unstable as the official document explains. I tested many times with different parameters combined through netperf. Though the result is not that accurate, I cannot see much influence on this patch. The static key is places on the hot path, but it actually shouldn't cause a huge regression theoretically. Co-developed-by: Shujin Li <lishujin@kuaishou.com> Signed-off-by: Shujin Li <lishujin@kuaishou.com> Signed-off-by: Jason Xing <xingwanli@kuaishou.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-17ixgbe, xsk: clean up the resources in ixgbe_xsk_pool_enable error pathWang Hai
In ixgbe_xsk_pool_enable(), if ixgbe_xsk_wakeup() fails, We should restore the previous state and clean up the resources. Add the missing clear af_xdp_zc_qps and unmap dma to fix this bug. Fixes: d49e286d354e ("ixgbe: add tracking of AF_XDP zero-copy state for each queue pair") Fixes: 4a9b32f30f80 ("ixgbe: fix potential RX buffer starvation for AF_XDP") Signed-off-by: Wang Hai <wanghai38@huawei.com> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20210817203736.3529939-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-06-24intel: Remove rcu_read_lock() around XDP program invocationToke Høiland-Jørgensen
The Intel drivers all have rcu_read_lock()/rcu_read_unlock() pairs around XDP program invocations. However, the actual lifetime of the objects referred by the XDP program invocation is longer, all the way through to the call to xdp_do_flush(), making the scope of the rcu_read_lock() too small. This turns out to be harmless because it all happens in a single NAPI poll cycle (and thus under local_bh_disable()), but it makes the rcu_read_lock() misleading. Rather than extend the scope of the rcu_read_lock(), just get rid of it entirely. With the addition of RCU annotations to the XDP_REDIRECT map types that take bh execution into account, lockdep even understands this to be safe, so there's really no reason to keep it around. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Jesper Dangaard Brouer <brouer@redhat.com> # i40e Cc: Jesse Brandeburg <jesse.brandeburg@intel.com> Cc: Tony Nguyen <anthony.l.nguyen@intel.com> Cc: intel-wired-lan@lists.osuosl.org Link: https://lore.kernel.org/bpf/20210624160609.292325-12-toke@redhat.com
2021-06-03ixgbe: add correct exception tracing for XDPMagnus Karlsson
Add missing exception tracing to XDP when a number of different errors can occur. The support was only partial. Several errors where not logged which would confuse the user quite a lot not knowing where and why the packets disappeared. Fixes: 33fdc82f0883 ("ixgbe: add support for XDP_TX action") Fixes: d0bcacd0a130 ("ixgbe: add AF_XDP zero-copy Rx support") Reported-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Tested-by: Vishakha Jambekar <vishakha.jambekar@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-03-15ixgbe: optimize for XDP_REDIRECT in xsk pathMagnus Karlsson
Optimize ixgbe_run_xdp_zc() for the XDP program verdict being XDP_REDIRECT in the xsk zero-copy path. This path is only used when having AF_XDP zero-copy on and in that case most packets will be directed to user space. This provides a little under 100k extra packets in throughput on my server when running l2fwd in xdpsock. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Tested-by: Vishakha Jambekar <vishakha.jambekar@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-08-31xsk: i40e: ice: ixgbe: mlx5: Test for dma_need_sync earlier for better ↵Magnus Karlsson
performance Test for dma_need_sync earlier to increase performance. xsk_buff_dma_sync_for_cpu() takes an xdp_buff as parameter and from that the xsk_buff_pool reference is dug out. Perf shows that this dereference causes a lot of cache misses. But as the buffer pool is now sent down to the driver at zero-copy initialization time, we might as well use this pointer directly, instead of going via the xsk_buff and we can do so already in xsk_buff_dma_sync_for_cpu() instead of in xp_dma_sync_for_cpu. This gets rid of these cache misses. Throughput increases with 3% for the xdpsock l2fwd sample application on my machine. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-11-git-send-email-magnus.karlsson@intel.com
2020-08-31xsk: i40e: ice: ixgbe: mlx5: Rename xsk zero-copy driver interfacesMagnus Karlsson
Rename the AF_XDP zero-copy driver interface functions to better reflect what they do after the replacement of umems with buffer pools in the previous commit. Mostly it is about replacing the umem name from the function names with xsk_buff and also have them take the a buffer pool pointer instead of a umem. The various ring functions have also been renamed in the process so that they have the same naming convention as the internal functions in xsk_queue.h. This so that it will be clearer what they do and also for consistency. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-3-git-send-email-magnus.karlsson@intel.com
2020-08-31xsk: i40e: ice: ixgbe: mlx5: Pass buffer pool to driver instead of umemMagnus Karlsson
Replace the explicit umem reference passed to the driver in AF_XDP zero-copy mode with the buffer pool instead. This in preparation for extending the functionality of the zero-copy mode so that umems can be shared between queues on the same netdev and also between netdevs. In this commit, only an umem reference has been added to the buffer pool struct. But later commits will add other entities to it. These are going to be entities that are different between different queue ids and netdevs even though the umem is shared between them. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-2-git-send-email-magnus.karlsson@intel.com
2020-07-01ethernet/intel: Convert fallthrough code commentsJeff Kirsher
Convert all the remaining 'fall through" code comments to the newer 'fallthrough;' keyword. Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-06-01xdp: Rename convert_to_xdp_frame in xdp_convert_buff_to_frameLorenzo Bianconi
In order to use standard 'xdp' prefix, rename convert_to_xdp_frame utility routine in xdp_convert_buff_to_frame and replace all the occurrences Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Link: https://lore.kernel.org/bpf/6344f739be0d1a08ab2b9607584c4d5478c8c083.1590698295.git.lorenzo@kernel.org
2020-05-21ixgbe, xsk: Migrate to new MEM_TYPE_XSK_BUFF_POOLBjörn Töpel
Remove MEM_TYPE_ZERO_COPY in favor of the new MEM_TYPE_XSK_BUFF_POOL APIs. v1->v2: Fixed xdp_buff data_end update. (Björn) Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: intel-wired-lan@lists.osuosl.org Link: https://lore.kernel.org/bpf/20200520192103.355233-11-bjorn.topel@gmail.com
2020-05-21xsk: Move driver interface to xdp_sock_drv.hMagnus Karlsson
Move the AF_XDP zero-copy driver interface to its own include file called xdp_sock_drv.h. This, hopefully, will make it more clear for NIC driver implementors to know what functions to use for zero-copy support. v4->v5: Fix -Wmissing-prototypes by include header file. (Jakub) Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200520192103.355233-4-bjorn.topel@gmail.com
2020-05-14xdp: For Intel AF_XDP drivers add XDP frame_szJesper Dangaard Brouer
Intel drivers implement native AF_XDP zerocopy in separate C-files, that have its own invocation of bpf_prog_run_xdp(). The setup of xdp_buff is also handled in separately from normal code path. This patch update XDP frame_sz for AF_XDP zerocopy drivers i40e, ice and ixgbe, as the code changes needed are very similar. Introduce a helper function xsk_umem_xdp_frame_sz() for calculating frame size. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Björn Töpel <bjorn.topel@intel.com> Cc: intel-wired-lan@lists.osuosl.org Cc: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/158945347511.97035.8536753731329475655.stgit@firesoul
2019-12-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf-next 2019-12-27 The following pull-request contains BPF updates for your *net-next* tree. We've added 127 non-merge commits during the last 17 day(s) which contain a total of 110 files changed, 6901 insertions(+), 2721 deletions(-). There are three merge conflicts. Conflicts and resolution looks as follows: 1) Merge conflict in net/bpf/test_run.c: There was a tree-wide cleanup c593642c8be0 ("treewide: Use sizeof_field() macro") which gets in the way with b590cb5f802d ("bpf: Switch to offsetofend in BPF_PROG_TEST_RUN"): <<<<<<< HEAD if (!range_is_zero(__skb, offsetof(struct __sk_buff, priority) + sizeof_field(struct __sk_buff, priority), ======= if (!range_is_zero(__skb, offsetofend(struct __sk_buff, priority), >>>>>>> 7c8dce4b166113743adad131b5a24c4acc12f92c There are a few occasions that look similar to this. Always take the chunk with offsetofend(). Note that there is one where the fields differ in here: <<<<<<< HEAD if (!range_is_zero(__skb, offsetof(struct __sk_buff, tstamp) + sizeof_field(struct __sk_buff, tstamp), ======= if (!range_is_zero(__skb, offsetofend(struct __sk_buff, gso_segs), >>>>>>> 7c8dce4b166113743adad131b5a24c4acc12f92c Just take the one with offsetofend() /and/ gso_segs. Latter is correct due to 850a88cc4096 ("bpf: Expose __sk_buff wire_len/gso_segs to BPF_PROG_TEST_RUN"). 2) Merge conflict in arch/riscv/net/bpf_jit_comp.c: (I'm keeping Bjorn in Cc here for a double-check in case I got it wrong.) <<<<<<< HEAD if (is_13b_check(off, insn)) return -1; emit(rv_blt(tcc, RV_REG_ZERO, off >> 1), ctx); ======= emit_branch(BPF_JSLT, RV_REG_T1, RV_REG_ZERO, off, ctx); >>>>>>> 7c8dce4b166113743adad131b5a24c4acc12f92c Result should look like: emit_branch(BPF_JSLT, tcc, RV_REG_ZERO, off, ctx); 3) Merge conflict in arch/riscv/include/asm/pgtable.h: <<<<<<< HEAD ======= #define VMALLOC_SIZE (KERN_VIRT_SIZE >> 1) #define VMALLOC_END (PAGE_OFFSET - 1) #define VMALLOC_START (PAGE_OFFSET - VMALLOC_SIZE) #define BPF_JIT_REGION_SIZE (SZ_128M) #define BPF_JIT_REGION_START (PAGE_OFFSET - BPF_JIT_REGION_SIZE) #define BPF_JIT_REGION_END (VMALLOC_END) /* * Roughly size the vmemmap space to be large enough to fit enough * struct pages to map half the virtual address space. Then * position vmemmap directly below the VMALLOC region. */ #define VMEMMAP_SHIFT \ (CONFIG_VA_BITS - PAGE_SHIFT - 1 + STRUCT_PAGE_MAX_SHIFT) #define VMEMMAP_SIZE BIT(VMEMMAP_SHIFT) #define VMEMMAP_END (VMALLOC_START - 1) #define VMEMMAP_START (VMALLOC_START - VMEMMAP_SIZE) #define vmemmap ((struct page *)VMEMMAP_START) >>>>>>> 7c8dce4b166113743adad131b5a24c4acc12f92c Only take the BPF_* defines from there and move them higher up in the same file. Remove the rest from the chunk. The VMALLOC_* etc defines got moved via 01f52e16b868 ("riscv: define vmemmap before pfn_to_page calls"). Result: [...] #define __S101 PAGE_READ_EXEC #define __S110 PAGE_SHARED_EXEC #define __S111 PAGE_SHARED_EXEC #define VMALLOC_SIZE (KERN_VIRT_SIZE >> 1) #define VMALLOC_END (PAGE_OFFSET - 1) #define VMALLOC_START (PAGE_OFFSET - VMALLOC_SIZE) #define BPF_JIT_REGION_SIZE (SZ_128M) #define BPF_JIT_REGION_START (PAGE_OFFSET - BPF_JIT_REGION_SIZE) #define BPF_JIT_REGION_END (VMALLOC_END) /* * Roughly size the vmemmap space to be large enough to fit enough * struct pages to map half the virtual address space. Then * position vmemmap directly below the VMALLOC region. */ #define VMEMMAP_SHIFT \ (CONFIG_VA_BITS - PAGE_SHIFT - 1 + STRUCT_PAGE_MAX_SHIFT) #define VMEMMAP_SIZE BIT(VMEMMAP_SHIFT) #define VMEMMAP_END (VMALLOC_START - 1) #define VMEMMAP_START (VMALLOC_START - VMEMMAP_SIZE) [...] Let me know if there are any other issues. Anyway, the main changes are: 1) Extend bpftool to produce a struct (aka "skeleton") tailored and specific to a provided BPF object file. This provides an alternative, simplified API compared to standard libbpf interaction. Also, add libbpf extern variable resolution for .kconfig section to import Kconfig data, from Andrii Nakryiko. 2) Add BPF dispatcher for XDP which is a mechanism to avoid indirect calls by generating a branch funnel as discussed back in bpfconf'19 at LSF/MM. Also, add various BPF riscv JIT improvements, from Björn Töpel. 3) Extend bpftool to allow matching BPF programs and maps by name, from Paul Chaignon. 4) Support for replacing cgroup BPF programs attached with BPF_F_ALLOW_MULTI flag for allowing updates without service interruption, from Andrey Ignatov. 5) Cleanup and simplification of ring access functions for AF_XDP with a bonus of 0-5% performance improvement, from Magnus Karlsson. 6) Enable BPF JITs for x86-64 and arm64 by default. Also, final version of audit support for BPF, from Daniel Borkmann and latter with Jiri Olsa. 7) Move and extend test_select_reuseport into BPF program tests under BPF selftests, from Jakub Sitnicki. 8) Various BPF sample improvements for xdpsock for customizing parameters to set up and benchmark AF_XDP, from Jay Jayatheerthan. 9) Improve libbpf to provide a ulimit hint on permission denied errors. Also change XDP sample programs to attach in driver mode by default, from Toke Høiland-Jørgensen. 10) Extend BPF test infrastructure to allow changing skb mark from tc BPF programs, from Nikita V. Shirokov. 11) Optimize prologue code sequence in BPF arm32 JIT, from Russell King. 12) Fix xdp_redirect_cpu BPF sample to manually attach to tracepoints after libbpf conversion, from Jesper Dangaard Brouer. 13) Minor misc improvements from various others. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-20xsk: ixgbe: i40e: ice: mlx5: Xsk_umem_discard_addr to xsk_umem_release_addrMagnus Karlsson
Change the name of xsk_umem_discard_addr to xsk_umem_release_addr to better reflect the new naming of the AF_XDP queue manipulation functions. As this functions is used by drivers implementing support for AF_XDP zero-copy, it requires a name change to these drivers. The function xsk_umem_release_addr_rq has also changed name in the same fashion. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/1576759171-28550-10-git-send-email-magnus.karlsson@intel.com
2019-12-19net/ixgbe: Fix concurrency issues between config flow and XSKMaxim Mikityanskiy
Use synchronize_rcu to wait until the XSK wakeup function finishes before destroying the resources it uses: 1. ixgbe_down already calls synchronize_rcu after setting __IXGBE_DOWN. 2. After switching the XDP program, call synchronize_rcu to let ixgbe_xsk_wakeup exit before the XDP program is freed. 3. Changing the number of channels brings the interface down. 4. Disabling UMEM sets __IXGBE_TX_DISABLED before closing hardware resources and resetting xsk_umem. Check that bit in ixgbe_xsk_wakeup to avoid using the XDP ring when it's already destroyed. synchronize_rcu is called from ixgbe_txrx_ring_disable. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191217162023.16011-5-maximmi@mellanox.com
2019-11-08ixgbe: need_wakeup flag might not be set for TxMagnus Karlsson
The need_wakeup flag for Tx might not be set for AF_XDP sockets that are only used to send packets. This happens if there is at least one outstanding packet that has not been completed by the hardware and we get that corresponding completion (which will not generate an interrupt since interrupts are disabled in the napi poll loop) between the time we stopped processing the Tx completions and interrupts are enabled again. In this case, the need_wakeup flag will have been cleared at the end of the Tx completion processing as we believe we will get an interrupt from the outstanding completion at a later point in time. But if this completion interrupt occurs before interrupts are enable, we lose it and should at that point really have set the need_wakeup flag since there are no more outstanding completions that can generate an interrupt to continue the processing. When this happens, user space will see a Tx queue need_wakeup of 0 and skip issuing a syscall, which means will never get into the Tx processing again and we have a deadlock. This patch introduces a quick fix for this issue by just setting the need_wakeup flag for Tx to 1 all the time. I am working on a proper fix for this that will toggle the flag appropriately, but it is more challenging than I anticipated and I am afraid that this patch will not be completed before the merge window closes, therefore this easier fix for now. This fix has a negative performance impact in the range of 0% to 4%. Towards the higher end of the scale if you have driver and application on the same core and issue a lot of packets, and towards no negative impact if you use two cores, lower transmission speeds and/or a workload that also receives packets. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-09-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf-next 2019-09-16 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Now that initial BPF backend for gcc has been merged upstream, enable BPF kselftest suite for bpf-gcc. Also fix a BE issue with access to bpf_sysctl.file_pos, from Ilya. 2) Follow-up fix for link-vmlinux.sh to remove bash-specific extensions related to recent work on exposing BTF info through sysfs, from Andrii. 3) AF_XDP zero copy fixes for i40e and ixgbe driver which caused umem headroom to be added twice, from Ciara. 4) Refactoring work to convert sock opt tests into test_progs framework in BPF kselftests, from Stanislav. 5) Fix a general protection fault in dev_map_hash_update_elem(), from Toke. 6) Cleanup to use BPF_PROG_RUN() macro in KCM, from Sami. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-16ixgbe: fix xdp handle calculationsCiara Loftus
Commit 7cbbf9f1fa23 ("ixgbe: fix xdp handle calculations") reintroduced the addition of the umem headroom to the xdp handle in the ixgbe_zca_free, ixgbe_alloc_buffer_slow_zc and ixgbe_alloc_buffer_zc functions. However, the headroom is already added to the handle in the function ixgbe_run_xdp_zc. This commit removes the latter addition and fixes the case where the headroom is non-zero. Fixes: 7cbbf9f1fa23 ("ixgbe: fix xdp handle calculations") Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-09-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
Minor overlapping changes in the btusb and ixgbe drivers. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-11ixgbe: fix double clean of Tx descriptors with xdpIlya Maximets
Tx code doesn't clear the descriptors' status after cleaning. So, if the budget is larger than number of used elems in a ring, some descriptors will be accounted twice and xsk_umem_complete_tx will move prod_tail far beyond the prod_head breaking the completion queue ring. Fix that by limiting the number of descriptors to clean by the number of used descriptors in the Tx ring. 'ixgbe_clean_xdp_tx_irq()' function refactored to look more like 'ixgbe_xsk_clean_tx_ring()' since we're allowed to directly use 'next_to_clean' and 'next_to_use' indexes. CC: stable@vger.kernel.org Fixes: 8221c5eba8c1 ("ixgbe: add AF_XDP zero-copy Tx support") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Tested-by: William Tu <u9012063@gmail.com> Tested-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-09-05ixgbe: fix xdp handle calculationsKevin Laatz
Currently, we don't add headroom to the handle in ixgbe_zca_free, ixgbe_alloc_buffer_slow_zc and ixgbe_alloc_buffer_zc. The addition of the headroom to the handle was removed in commit d8c3061e5edd ("ixgbe: modify driver for handling offsets"), which will break things when headroom isvnon-zero. This patch fixes this and uses xsk_umem_adjust_offset to add it appropritely based on the mode being run. Fixes: d8c3061e5edd ("ixgbe: modify driver for handling offsets") Reported-by: Bjorn Topel <bjorn.topel@intel.com> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com> Acked-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-31ixgbe: modify driver for handling offsetsKevin Laatz
With the addition of the unaligned chunks option, we need to make sure we handle the offsets accordingly based on the mode we are currently running in. This patch modifies the driver to appropriately mask the address for each case. Signed-off-by: Kevin Laatz <kevin.laatz@intel.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-31ixgbe: simplify Rx buffer recycleKevin Laatz
Currently, the dma, addr and handle are modified when we reuse Rx buffers in zero-copy mode. However, this is not required as the inputs to the function are copies, not the original values themselves. As we use the copies within the function, we can use the original 'obi' values directly without having to mask and add the headroom. Signed-off-by: Kevin Laatz <kevin.laatz@intel.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-17ixgbe: add support for AF_XDP need_wakeup featureMagnus Karlsson
This patch adds support for the need_wakeup feature of AF_XDP. If the application has told the kernel that it might sleep using the new bind flag XDP_USE_NEED_WAKEUP, the driver will then set this flag if it has no more buffers on the NIC Rx ring and yield to the application. For Tx, it will set the flag if it has no outstanding Tx completion interrupts and return to the application. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-17xsk: replace ndo_xsk_async_xmit with ndo_xsk_wakeupMagnus Karlsson
This commit replaces ndo_xsk_async_xmit with ndo_xsk_wakeup. This new ndo provides the same functionality as before but with the addition of a new flags field that is used to specifiy if Rx, Tx or both should be woken up. The previous ndo only woke up Tx, as implied by the name. The i40e and ixgbe drivers (which are all the supported ones) are updated with this new interface. This new ndo will be used by the new need_wakeup functionality of XDP sockets that need to be able to wake up both Rx and Tx driver processing. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-27xsk: Return the whole xdp_desc from xsk_umem_consume_txMaxim Mikityanskiy
Some drivers want to access the data transmitted in order to implement acceleration features of the NICs. It is also useful in AF_XDP TX flow. Change the xsk_umem_consume_tx API to return the whole xdp_desc, that contains the data pointer, length and DMA address, instead of only the latter two. Adapt the implementation of i40e and ixgbe to this change. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Cc: Björn Töpel <bjorn.topel@intel.com> Cc: Magnus Karlsson <magnus.karlsson@intel.com> Acked-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-05ixgbe: fix AF_XDP tx packet countWilliam Tu
The total_packets count at ixgbe_clean_xdp_tx_irq is always zero when testing with xdpsock -t -N. Set the gso_segs to 1 to make the tx packet count correct. Signed-off-by: William Tu <u9012063@gmail.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-06-05ixgbe: fix AF_XDP tx byte countWilliam Tu
The tx bytecount is done twice. When running './xdpsock -t -N -i eth3' and 'ip -s link show dev eth3' The avg packet size is 120 instead of 60. So remove the extra one. Signed-off-by: William Tu <u9012063@gmail.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-06-05ixgbe: remove umem from adapterJan Sokolowski
As current implementation of netdev already contains and provides umems for us, we no longer have the need to contain these structures in ixgbe_adapter. Refactor the code to operate on netdev-provided umems. Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-06-05ixgbe: add tracking of AF_XDP zero-copy state for each queue pairJan Sokolowski
Here, we add a bitmap to the ixgbe_adapter that tracks if a certain queue pair has been "zero-copy enabled" via the ndo_bpf. The bitmap is used in ixgbe_xsk_umem, and enables zero-copy if and only if XDP is enabled, the corresponding qid in the bitmap is set, and the umem is non-NULL; Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-02-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Three conflicts, one of which, for marvell10g.c is non-trivial and requires some follow-up from Heiner or someone else. The issue is that Heiner converted the marvell10g driver over to use the generic c45 code as much as possible. However, in 'net' a bug fix appeared which makes sure that a new local mask (MDIO_AN_10GBT_CTRL_ADV_NBT_MASK) with value 0x01e0 is cleared. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-21ixgbe: don't do any AF_XDP zero-copy transmit if netif is not OKJan Sokolowski
An issue has been found while testing zero-copy XDP that causes a reset to be triggered. As it takes some time to turn the carrier on after setting zc, and we already start trying to transmit some packets, watchdog considers this as an erroneous state and triggers a reset. Don't do any work if netif carrier is not OK. Fixes: 8221c5eba8c13 (ixgbe: add AF_XDP zero-copy Tx support) Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-02-21ixgbe: fix potential RX buffer starvation for AF_XDPMagnus Karlsson
When the RX rings are created they are also populated with buffers so that packets can be received. Usually these are kernel buffers, but for AF_XDP in zero-copy mode, these are user-space buffers and in this case the application might not have sent down any buffers to the driver at this point. And if no buffers are allocated at ring creation time, no packets can be received and no interrupts will be generated so the NAPI poll function that allocates buffers to the rings will never get executed. To rectify this, we kick the NAPI context of any queue with an attached AF_XDP zero-copy socket in two places in the code. Once after an XDP program has loaded and once after the umem is registered. This take care of both cases: XDP program gets loaded first then AF_XDP socket is created, and the reverse, AF_XDP socket is created first, then XDP program is loaded. Fixes: d0bcacd0a130 ("ixgbe: add AF_XDP zero-copy Rx support") Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>