diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2023-04-26 16:07:23 -0700 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2023-04-26 16:07:23 -0700 |
commit | 6e98b09da931a00bf4e0477d0fa52748bf28fcce (patch) | |
tree | 9c658ed95add5693f42f29f63df80a2ede3f6ec2 /drivers/net/ethernet/intel/i40e/i40e_txrx.c | |
parent | b68ee1c6131c540a62ecd443be89c406401df091 (diff) | |
parent | 9b78d919632b7149d311aaad5a977e4b48b10321 (diff) |
Merge tag 'net-next-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Paolo Abeni:
"Core:
- Introduce a config option to tweak MAX_SKB_FRAGS. Increasing the
default value allows for better BIG TCP performances
- Reduce compound page head access for zero-copy data transfers
- RPS/RFS improvements, avoiding unneeded NET_RX_SOFTIRQ when
possible
- Threaded NAPI improvements, adding defer skb free support and
unneeded softirq avoidance
- Address dst_entry reference count scalability issues, via false
sharing avoidance and optimize refcount tracking
- Add lockless accesses annotation to sk_err[_soft]
- Optimize again the skb struct layout
- Extends the skb drop reasons to make it usable by multiple
subsystems
- Better const qualifier awareness for socket casts
BPF:
- Add skb and XDP typed dynptrs which allow BPF programs for more
ergonomic and less brittle iteration through data and
variable-sized accesses
- Add a new BPF netfilter program type and minimal support to hook
BPF programs to netfilter hooks such as prerouting or forward
- Add more precise memory usage reporting for all BPF map types
- Adds support for using {FOU,GUE} encap with an ipip device
operating in collect_md mode and add a set of BPF kfuncs for
controlling encap params
- Allow BPF programs to detect at load time whether a particular
kfunc exists or not, and also add support for this in light
skeleton
- Bigger batch of BPF verifier improvements to prepare for upcoming
BPF open-coded iterators allowing for less restrictive looping
capabilities
- Rework RCU enforcement in the verifier, add kptr_rcu and enforce
BPF programs to NULL-check before passing such pointers into kfunc
- Add support for kptrs in percpu hashmaps, percpu LRU hashmaps and
in local storage maps
- Enable RCU semantics for task BPF kptrs and allow referenced kptr
tasks to be stored in BPF maps
- Add support for refcounted local kptrs to the verifier for allowing
shared ownership, useful for adding a node to both the BPF list and
rbtree
- Add BPF verifier support for ST instructions in
convert_ctx_access() which will help new -mcpu=v4 clang flag to
start emitting them
- Add ARM32 USDT support to libbpf
- Improve bpftool's visual program dump which produces the control
flow graph in a DOT format by adding C source inline annotations
Protocols:
- IPv4: Allow adding to IPv4 address a 'protocol' tag. Such value
indicates the provenance of the IP address
- IPv6: optimize route lookup, dropping unneeded R/W lock acquisition
- Add the handshake upcall mechanism, allowing the user-space to
implement generic TLS handshake on kernel's behalf
- Bridge: support per-{Port, VLAN} neighbor suppression, increasing
resilience to nodes failures
- SCTP: add support for Fair Capacity and Weighted Fair Queueing
schedulers
- MPTCP: delay first subflow allocation up to its first usage. This
will allow for later better LSM interaction
- xfrm: Remove inner/outer modes from input/output path. These are
not needed anymore
- WiFi:
- reduced neighbor report (RNR) handling for AP mode
- HW timestamping support
- support for randomized auth/deauth TA for PASN privacy
- per-link debugfs for multi-link
- TC offload support for mac80211 drivers
- mac80211 mesh fast-xmit and fast-rx support
- enable Wi-Fi 7 (EHT) mesh support
Netfilter:
- Add nf_tables 'brouting' support, to force a packet to be routed
instead of being bridged
- Update bridge netfilter and ovs conntrack helpers to handle IPv6
Jumbo packets properly, i.e. fetch the packet length from
hop-by-hop extension header. This is needed for BIT TCP support
- The iptables 32bit compat interface isn't compiled in by default
anymore
- Move ip(6)tables builtin icmp matches to the udptcp one. This has
the advantage that icmp/icmpv6 match doesn't load the
iptables/ip6tables modules anymore when iptables-nft is used
- Extended netlink error report for netdevice in flowtables and
netdev/chains. Allow for incrementally add/delete devices to netdev
basechain. Allow to create netdev chain without device
Driver API:
- Remove redundant Device Control Error Reporting Enable, as PCI core
has already error reporting enabled at enumeration time
- Move Multicast DB netlink handlers to core, allowing devices other
then bridge to use them
- Allow the page_pool to directly recycle the pages from safely
localized NAPI
- Implement lockless TX queue stop/wake combo macros, allowing for
further code de-duplication and sanitization
- Add YNL support for user headers and struct attrs
- Add partial YNL specification for devlink
- Add partial YNL specification for ethtool
- Add tc-mqprio and tc-taprio support for preemptible traffic classes
- Add tx push buf len param to ethtool, specifies the maximum number
of bytes of a transmitted packet a driver can push directly to the
underlying device
- Add basic LED support for switch/phy
- Add NAPI documentation, stop relaying on external links
- Convert dsa_master_ioctl() to netdev notifier. This is a
preparatory work to make the hardware timestamping layer selectable
by user space
- Add transceiver support and improve the error messages for CAN-FD
controllers
New hardware / drivers:
- Ethernet:
- AMD/Pensando core device support
- MediaTek MT7981 SoC
- MediaTek MT7988 SoC
- Broadcom BCM53134 embedded switch
- Texas Instruments CPSW9G ethernet switch
- Qualcomm EMAC3 DWMAC ethernet
- StarFive JH7110 SoC
- NXP CBTX ethernet PHY
- WiFi:
- Apple M1 Pro/Max devices
- RealTek rtl8710bu/rtl8188gu
- RealTek rtl8822bs, rtl8822cs and rtl8821cs SDIO chipset
- Bluetooth:
- Realtek RTL8821CS, RTL8851B, RTL8852BS
- Mediatek MT7663, MT7922
- NXP w8997
- Actions Semi ATS2851
- QTI WCN6855
- Marvell 88W8997
- Can:
- STMicroelectronics bxcan stm32f429
Drivers:
- Ethernet NICs:
- Intel (1G, icg):
- add tracking and reporting of QBV config errors
- add support for configuring max SDU for each Tx queue
- Intel (100G, ice):
- refactor mailbox overflow detection to support Scalable IOV
- GNSS interface optimization
- Intel (i40e):
- support XDP multi-buffer
- nVidia/Mellanox:
- add the support for linux bridge multicast offload
- enable TC offload for egress and engress MACVLAN over bond
- add support for VxLAN GBP encap/decap flows offload
- extend packet offload to fully support libreswan
- support tunnel mode in mlx5 IPsec packet offload
- extend XDP multi-buffer support
- support MACsec VLAN offload
- add support for dynamic msix vectors allocation
- drop RX page_cache and fully use page_pool
- implement thermal zone to report NIC temperature
- Netronome/Corigine:
- add support for multi-zone conntrack offload
- Solarflare/Xilinx:
- support offloading TC VLAN push/pop actions to the MAE
- support TC decap rules
- support unicast PTP
- Other NICs:
- Broadcom (bnxt): enforce software based freq adjustments only on
shared PHC NIC
- RealTek (r8169): refactor to addess ASPM issues during NAPI poll
- Micrel (lan8841): add support for PTP_PF_PEROUT
- Cadence (macb): enable PTP unicast
- Engleder (tsnep): add XDP socket zero-copy support
- virtio-net: implement exact header length guest feature
- veth: add page_pool support for page recycling
- vxlan: add MDB data path support
- gve: add XDP support for GQI-QPL format
- geneve: accept every ethertype
- macvlan: allow some packets to bypass broadcast queue
- mana: add support for jumbo frame
- Ethernet high-speed switches:
- Microchip (sparx5): Add support for TC flower templates
- Ethernet embedded switches:
- Broadcom (b54):
- configure 6318 and 63268 RGMII ports
- Marvell (mv88e6xxx):
- faster C45 bus scan
- Microchip:
- lan966x:
- add support for IS1 VCAP
- better TX/RX from/to CPU performances
- ksz9477: add ETS Qdisc support
- ksz8: enhance static MAC table operations and error handling
- sama7g5: add PTP capability
- NXP (ocelot):
- add support for external ports
- add support for preemptible traffic classes
- Texas Instruments:
- add CPSWxG SGMII support for J7200 and J721E
- Intel WiFi (iwlwifi):
- preparation for Wi-Fi 7 EHT and multi-link support
- EHT (Wi-Fi 7) sniffer support
- hardware timestamping support for some devices/firwmares
- TX beacon protection on newer hardware
- Qualcomm 802.11ax WiFi (ath11k):
- MU-MIMO parameters support
- ack signal support for management packets
- RealTek WiFi (rtw88):
- SDIO bus support
- better support for some SDIO devices (e.g. MAC address from
efuse)
- RealTek WiFi (rtw89):
- HW scan support for 8852b
- better support for 6 GHz scanning
- support for various newer firmware APIs
- framework firmware backwards compatibility
- MediaTek WiFi (mt76):
- P2P support
- mesh A-MSDU support
- EHT (Wi-Fi 7) support
- coredump support"
* tag 'net-next-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2078 commits)
net: phy: hide the PHYLIB_LEDS knob
net: phy: marvell-88x2222: remove unnecessary (void*) conversions
tcp/udp: Fix memleaks of sk and zerocopy skbs with TX timestamp.
net: amd: Fix link leak when verifying config failed
net: phy: marvell: Fix inconsistent indenting in led_blink_set
lan966x: Don't use xdp_frame when action is XDP_TX
tsnep: Add XDP socket zero-copy TX support
tsnep: Add XDP socket zero-copy RX support
tsnep: Move skb receive action to separate function
tsnep: Add functions for queue enable/disable
tsnep: Rework TX/RX queue initialization
tsnep: Replace modulo operation with mask
net: phy: dp83867: Add led_brightness_set support
net: phy: Fix reading LED reg property
drivers: nfc: nfcsim: remove return value check of `dev_dir`
net: phy: dp83867: Remove unnecessary (void*) conversions
net: ethtool: coalesce: try to make user settings stick twice
net: mana: Check if netdev/napi_alloc_frag returns single page
net: mana: Rename mana_refill_rxoob and remove some empty lines
net: veth: add page_pool stats
...
Diffstat (limited to 'drivers/net/ethernet/intel/i40e/i40e_txrx.c')
-rw-r--r-- | drivers/net/ethernet/intel/i40e/i40e_txrx.c | 422 |
1 files changed, 242 insertions, 180 deletions
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c index 72b091f2509d..8b8bf4880faa 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c @@ -1477,9 +1477,6 @@ void i40e_clean_rx_ring(struct i40e_ring *rx_ring) if (!rx_ring->rx_bi) return; - dev_kfree_skb(rx_ring->skb); - rx_ring->skb = NULL; - if (rx_ring->xsk_pool) { i40e_xsk_clean_rx_ring(rx_ring); goto skip_free; @@ -1524,6 +1521,7 @@ skip_free: rx_ring->next_to_alloc = 0; rx_ring->next_to_clean = 0; + rx_ring->next_to_process = 0; rx_ring->next_to_use = 0; } @@ -1576,6 +1574,7 @@ int i40e_setup_rx_descriptors(struct i40e_ring *rx_ring) rx_ring->next_to_alloc = 0; rx_ring->next_to_clean = 0; + rx_ring->next_to_process = 0; rx_ring->next_to_use = 0; /* XDP RX-queue info only needed for RX rings exposed to XDP */ @@ -1617,21 +1616,19 @@ void i40e_release_rx_desc(struct i40e_ring *rx_ring, u32 val) writel(val, rx_ring->tail); } +#if (PAGE_SIZE >= 8192) static unsigned int i40e_rx_frame_truesize(struct i40e_ring *rx_ring, unsigned int size) { unsigned int truesize; -#if (PAGE_SIZE < 8192) - truesize = i40e_rx_pg_size(rx_ring) / 2; /* Must be power-of-2 */ -#else truesize = rx_ring->rx_offset ? SKB_DATA_ALIGN(size + rx_ring->rx_offset) + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) : SKB_DATA_ALIGN(size); -#endif return truesize; } +#endif /** * i40e_alloc_mapped_page - recycle or make a new page @@ -1970,7 +1967,6 @@ static bool i40e_cleanup_headers(struct i40e_ring *rx_ring, struct sk_buff *skb, * i40e_can_reuse_rx_page - Determine if page can be reused for another Rx * @rx_buffer: buffer containing the page * @rx_stats: rx stats structure for the rx ring - * @rx_buffer_pgcnt: buffer page refcount pre xdp_do_redirect() call * * If page is reusable, we have a green light for calling i40e_reuse_rx_page, * which will assign the current buffer to the buffer that next_to_alloc is @@ -1981,8 +1977,7 @@ static bool i40e_cleanup_headers(struct i40e_ring *rx_ring, struct sk_buff *skb, * or busy if it could not be reused. */ static bool i40e_can_reuse_rx_page(struct i40e_rx_buffer *rx_buffer, - struct i40e_rx_queue_stats *rx_stats, - int rx_buffer_pgcnt) + struct i40e_rx_queue_stats *rx_stats) { unsigned int pagecnt_bias = rx_buffer->pagecnt_bias; struct page *page = rx_buffer->page; @@ -1995,7 +1990,7 @@ static bool i40e_can_reuse_rx_page(struct i40e_rx_buffer *rx_buffer, #if (PAGE_SIZE < 8192) /* if we are only owner of page we can reuse it */ - if (unlikely((rx_buffer_pgcnt - pagecnt_bias) > 1)) { + if (unlikely((rx_buffer->page_count - pagecnt_bias) > 1)) { rx_stats->page_busy_count++; return false; } @@ -2021,33 +2016,14 @@ static bool i40e_can_reuse_rx_page(struct i40e_rx_buffer *rx_buffer, } /** - * i40e_add_rx_frag - Add contents of Rx buffer to sk_buff - * @rx_ring: rx descriptor ring to transact packets on - * @rx_buffer: buffer containing page to add - * @skb: sk_buff to place the data into - * @size: packet length from rx_desc - * - * This function will add the data contained in rx_buffer->page to the skb. - * It will just attach the page as a frag to the skb. - * - * The function will then update the page offset. + * i40e_rx_buffer_flip - adjusted rx_buffer to point to an unused region + * @rx_buffer: Rx buffer to adjust + * @truesize: Size of adjustment **/ -static void i40e_add_rx_frag(struct i40e_ring *rx_ring, - struct i40e_rx_buffer *rx_buffer, - struct sk_buff *skb, - unsigned int size) +static void i40e_rx_buffer_flip(struct i40e_rx_buffer *rx_buffer, + unsigned int truesize) { #if (PAGE_SIZE < 8192) - unsigned int truesize = i40e_rx_pg_size(rx_ring) / 2; -#else - unsigned int truesize = SKB_DATA_ALIGN(size + rx_ring->rx_offset); -#endif - - skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, rx_buffer->page, - rx_buffer->page_offset, size, truesize); - - /* page is being used so we must update the page offset */ -#if (PAGE_SIZE < 8192) rx_buffer->page_offset ^= truesize; #else rx_buffer->page_offset += truesize; @@ -2058,19 +2034,17 @@ static void i40e_add_rx_frag(struct i40e_ring *rx_ring, * i40e_get_rx_buffer - Fetch Rx buffer and synchronize data for use * @rx_ring: rx descriptor ring to transact packets on * @size: size of buffer to add to skb - * @rx_buffer_pgcnt: buffer page refcount * * This function will pull an Rx buffer from the ring and synchronize it * for use by the CPU. */ static struct i40e_rx_buffer *i40e_get_rx_buffer(struct i40e_ring *rx_ring, - const unsigned int size, - int *rx_buffer_pgcnt) + const unsigned int size) { struct i40e_rx_buffer *rx_buffer; - rx_buffer = i40e_rx_bi(rx_ring, rx_ring->next_to_clean); - *rx_buffer_pgcnt = + rx_buffer = i40e_rx_bi(rx_ring, rx_ring->next_to_process); + rx_buffer->page_count = #if (PAGE_SIZE < 8192) page_count(rx_buffer->page); #else @@ -2092,25 +2066,82 @@ static struct i40e_rx_buffer *i40e_get_rx_buffer(struct i40e_ring *rx_ring, } /** - * i40e_construct_skb - Allocate skb and populate it + * i40e_put_rx_buffer - Clean up used buffer and either recycle or free * @rx_ring: rx descriptor ring to transact packets on * @rx_buffer: rx buffer to pull data from + * + * This function will clean up the contents of the rx_buffer. It will + * either recycle the buffer or unmap it and free the associated resources. + */ +static void i40e_put_rx_buffer(struct i40e_ring *rx_ring, + struct i40e_rx_buffer *rx_buffer) +{ + if (i40e_can_reuse_rx_page(rx_buffer, &rx_ring->rx_stats)) { + /* hand second half of page back to the ring */ + i40e_reuse_rx_page(rx_ring, rx_buffer); + } else { + /* we are not reusing the buffer so unmap it */ + dma_unmap_page_attrs(rx_ring->dev, rx_buffer->dma, + i40e_rx_pg_size(rx_ring), + DMA_FROM_DEVICE, I40E_RX_DMA_ATTR); + __page_frag_cache_drain(rx_buffer->page, + rx_buffer->pagecnt_bias); + /* clear contents of buffer_info */ + rx_buffer->page = NULL; + } +} + +/** + * i40e_process_rx_buffs- Processing of buffers post XDP prog or on error + * @rx_ring: Rx descriptor ring to transact packets on + * @xdp_res: Result of the XDP program + * @xdp: xdp_buff pointing to the data + **/ +static void i40e_process_rx_buffs(struct i40e_ring *rx_ring, int xdp_res, + struct xdp_buff *xdp) +{ + u32 next = rx_ring->next_to_clean; + struct i40e_rx_buffer *rx_buffer; + + xdp->flags = 0; + + while (1) { + rx_buffer = i40e_rx_bi(rx_ring, next); + if (++next == rx_ring->count) + next = 0; + + if (!rx_buffer->page) + continue; + + if (xdp_res == I40E_XDP_CONSUMED) + rx_buffer->pagecnt_bias++; + else + i40e_rx_buffer_flip(rx_buffer, xdp->frame_sz); + + /* EOP buffer will be put in i40e_clean_rx_irq() */ + if (next == rx_ring->next_to_process) + return; + + i40e_put_rx_buffer(rx_ring, rx_buffer); + } +} + +/** + * i40e_construct_skb - Allocate skb and populate it + * @rx_ring: rx descriptor ring to transact packets on * @xdp: xdp_buff pointing to the data + * @nr_frags: number of buffers for the packet * * This function allocates an skb. It then populates it with the page * data from the current receive descriptor, taking care to set up the * skb correctly. */ static struct sk_buff *i40e_construct_skb(struct i40e_ring *rx_ring, - struct i40e_rx_buffer *rx_buffer, - struct xdp_buff *xdp) + struct xdp_buff *xdp, + u32 nr_frags) { unsigned int size = xdp->data_end - xdp->data; -#if (PAGE_SIZE < 8192) - unsigned int truesize = i40e_rx_pg_size(rx_ring) / 2; -#else - unsigned int truesize = SKB_DATA_ALIGN(size); -#endif + struct i40e_rx_buffer *rx_buffer; unsigned int headlen; struct sk_buff *skb; @@ -2150,48 +2181,60 @@ static struct sk_buff *i40e_construct_skb(struct i40e_ring *rx_ring, memcpy(__skb_put(skb, headlen), xdp->data, ALIGN(headlen, sizeof(long))); + rx_buffer = i40e_rx_bi(rx_ring, rx_ring->next_to_clean); /* update all of the pointers */ size -= headlen; if (size) { + if (unlikely(nr_frags >= MAX_SKB_FRAGS)) { + dev_kfree_skb(skb); + return NULL; + } skb_add_rx_frag(skb, 0, rx_buffer->page, rx_buffer->page_offset + headlen, - size, truesize); - + size, xdp->frame_sz); /* buffer is used by skb, update page_offset */ -#if (PAGE_SIZE < 8192) - rx_buffer->page_offset ^= truesize; -#else - rx_buffer->page_offset += truesize; -#endif + i40e_rx_buffer_flip(rx_buffer, xdp->frame_sz); } else { /* buffer is unused, reset bias back to rx_buffer */ rx_buffer->pagecnt_bias++; } + if (unlikely(xdp_buff_has_frags(xdp))) { + struct skb_shared_info *sinfo, *skinfo = skb_shinfo(skb); + + sinfo = xdp_get_shared_info_from_buff(xdp); + memcpy(&skinfo->frags[skinfo->nr_frags], &sinfo->frags[0], + sizeof(skb_frag_t) * nr_frags); + + xdp_update_skb_shared_info(skb, skinfo->nr_frags + nr_frags, + sinfo->xdp_frags_size, + nr_frags * xdp->frame_sz, + xdp_buff_is_frag_pfmemalloc(xdp)); + + /* First buffer has already been processed, so bump ntc */ + if (++rx_ring->next_to_clean == rx_ring->count) + rx_ring->next_to_clean = 0; + + i40e_process_rx_buffs(rx_ring, I40E_XDP_PASS, xdp); + } + return skb; } /** * i40e_build_skb - Build skb around an existing buffer * @rx_ring: Rx descriptor ring to transact packets on - * @rx_buffer: Rx buffer to pull data from * @xdp: xdp_buff pointing to the data + * @nr_frags: number of buffers for the packet * * This function builds an skb around an existing Rx buffer, taking care * to set up the skb correctly and avoid any memcpy overhead. */ static struct sk_buff *i40e_build_skb(struct i40e_ring *rx_ring, - struct i40e_rx_buffer *rx_buffer, - struct xdp_buff *xdp) + struct xdp_buff *xdp, + u32 nr_frags) { unsigned int metasize = xdp->data - xdp->data_meta; -#if (PAGE_SIZE < 8192) - unsigned int truesize = i40e_rx_pg_size(rx_ring) / 2; -#else - unsigned int truesize = SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) + - SKB_DATA_ALIGN(xdp->data_end - - xdp->data_hard_start); -#endif struct sk_buff *skb; /* Prefetch first cache line of first page. If xdp->data_meta @@ -2202,7 +2245,7 @@ static struct sk_buff *i40e_build_skb(struct i40e_ring *rx_ring, net_prefetch(xdp->data_meta); /* build an skb around the page buffer */ - skb = napi_build_skb(xdp->data_hard_start, truesize); + skb = napi_build_skb(xdp->data_hard_start, xdp->frame_sz); if (unlikely(!skb)) return NULL; @@ -2212,42 +2255,25 @@ static struct sk_buff *i40e_build_skb(struct i40e_ring *rx_ring, if (metasize) skb_metadata_set(skb, metasize); - /* buffer is used by skb, update page_offset */ -#if (PAGE_SIZE < 8192) - rx_buffer->page_offset ^= truesize; -#else - rx_buffer->page_offset += truesize; -#endif + if (unlikely(xdp_buff_has_frags(xdp))) { + struct skb_shared_info *sinfo; - return skb; -} + sinfo = xdp_get_shared_info_from_buff(xdp); + xdp_update_skb_shared_info(skb, nr_frags, + sinfo->xdp_frags_size, + nr_frags * xdp->frame_sz, + xdp_buff_is_frag_pfmemalloc(xdp)); -/** - * i40e_put_rx_buffer - Clean up used buffer and either recycle or free - * @rx_ring: rx descriptor ring to transact packets on - * @rx_buffer: rx buffer to pull data from - * @rx_buffer_pgcnt: rx buffer page refcount pre xdp_do_redirect() call - * - * This function will clean up the contents of the rx_buffer. It will - * either recycle the buffer or unmap it and free the associated resources. - */ -static void i40e_put_rx_buffer(struct i40e_ring *rx_ring, - struct i40e_rx_buffer *rx_buffer, - int rx_buffer_pgcnt) -{ - if (i40e_can_reuse_rx_page(rx_buffer, &rx_ring->rx_stats, rx_buffer_pgcnt)) { - /* hand second half of page back to the ring */ - i40e_reuse_rx_page(rx_ring, rx_buffer); + i40e_process_rx_buffs(rx_ring, I40E_XDP_PASS, xdp); } else { - /* we are not reusing the buffer so unmap it */ - dma_unmap_page_attrs(rx_ring->dev, rx_buffer->dma, - i40e_rx_pg_size(rx_ring), - DMA_FROM_DEVICE, I40E_RX_DMA_ATTR); - __page_frag_cache_drain(rx_buffer->page, - rx_buffer->pagecnt_bias); - /* clear contents of buffer_info */ - rx_buffer->page = NULL; + struct i40e_rx_buffer *rx_buffer; + + rx_buffer = i40e_rx_bi(rx_ring, rx_ring->next_to_clean); + /* buffer is used by skb, update page_offset */ + i40e_rx_buffer_flip(rx_buffer, xdp->frame_sz); } + + return skb; } /** @@ -2333,25 +2359,6 @@ xdp_out: } /** - * i40e_rx_buffer_flip - adjusted rx_buffer to point to an unused region - * @rx_ring: Rx ring - * @rx_buffer: Rx buffer to adjust - * @size: Size of adjustment - **/ -static void i40e_rx_buffer_flip(struct i40e_ring *rx_ring, - struct i40e_rx_buffer *rx_buffer, - unsigned int size) -{ - unsigned int truesize = i40e_rx_frame_truesize(rx_ring, size); - -#if (PAGE_SIZE < 8192) - rx_buffer->page_offset ^= truesize; -#else - rx_buffer->page_offset += truesize; -#endif -} - -/** * i40e_xdp_ring_update_tail - Updates the XDP Tx ring tail register * @xdp_ring: XDP Tx ring * @@ -2409,16 +2416,65 @@ void i40e_finalize_xdp_rx(struct i40e_ring *rx_ring, unsigned int xdp_res) } /** - * i40e_inc_ntc: Advance the next_to_clean index + * i40e_inc_ntp: Advance the next_to_process index * @rx_ring: Rx ring **/ -static void i40e_inc_ntc(struct i40e_ring *rx_ring) +static void i40e_inc_ntp(struct i40e_ring *rx_ring) +{ + u32 ntp = rx_ring->next_to_process + 1; + + ntp = (ntp < rx_ring->count) ? ntp : 0; + rx_ring->next_to_process = ntp; + prefetch(I40E_RX_DESC(rx_ring, ntp)); +} + +/** + * i40e_add_xdp_frag: Add a frag to xdp_buff + * @xdp: xdp_buff pointing to the data + * @nr_frags: return number of buffers for the packet + * @rx_buffer: rx_buffer holding data of the current frag + * @size: size of data of current frag + */ +static int i40e_add_xdp_frag(struct xdp_buff *xdp, u32 *nr_frags, + struct i40e_rx_buffer *rx_buffer, u32 size) { - u32 ntc = rx_ring->next_to_clean + 1; + struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp); + + if (!xdp_buff_has_frags(xdp)) { + sinfo->nr_frags = 0; + sinfo->xdp_frags_size = 0; + xdp_buff_set_frags_flag(xdp); + } else if (unlikely(sinfo->nr_frags >= MAX_SKB_FRAGS)) { + /* Overflowing packet: All frags need to be dropped */ + return -ENOMEM; + } + + __skb_fill_page_desc_noacc(sinfo, sinfo->nr_frags++, rx_buffer->page, + rx_buffer->page_offset, size); + + sinfo->xdp_frags_size += size; - ntc = (ntc < rx_ring->count) ? ntc : 0; - rx_ring->next_to_clean = ntc; - prefetch(I40E_RX_DESC(rx_ring, ntc)); + if (page_is_pfmemalloc(rx_buffer->page)) + xdp_buff_set_frag_pfmemalloc(xdp); + *nr_frags = sinfo->nr_frags; + + return 0; +} + +/** + * i40e_consume_xdp_buff - Consume all the buffers of the packet and update ntc + * @rx_ring: rx descriptor ring to transact packets on + * @xdp: xdp_buff pointing to the data + * @rx_buffer: rx_buffer of eop desc + */ +static void i40e_consume_xdp_buff(struct i40e_ring *rx_ring, + struct xdp_buff *xdp, + struct i40e_rx_buffer *rx_buffer) +{ + i40e_process_rx_buffs(rx_ring, I40E_XDP_CONSUMED, xdp); + i40e_put_rx_buffer(rx_ring, rx_buffer); + rx_ring->next_to_clean = rx_ring->next_to_process; + xdp->data = NULL; } /** @@ -2437,38 +2493,36 @@ static void i40e_inc_ntc(struct i40e_ring *rx_ring) static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget, unsigned int *rx_cleaned) { - unsigned int total_rx_bytes = 0, total_rx_packets = 0, frame_sz = 0; + unsigned int total_rx_bytes = 0, total_rx_packets = 0; u16 cleaned_count = I40E_DESC_UNUSED(rx_ring); + u16 clean_threshold = rx_ring->count / 2; unsigned int offset = rx_ring->rx_offset; - struct sk_buff *skb = rx_ring->skb; + struct xdp_buff *xdp = &rx_ring->xdp; unsigned int xdp_xmit = 0; struct bpf_prog *xdp_prog; bool failure = false; - struct xdp_buff xdp; int xdp_res = 0; -#if (PAGE_SIZE < 8192) - frame_sz = i40e_rx_frame_truesize(rx_ring, 0); -#endif - xdp_init_buff(&xdp, frame_sz, &rx_ring->xdp_rxq); - xdp_prog = READ_ONCE(rx_ring->xdp_prog); while (likely(total_rx_packets < (unsigned int)budget)) { + u16 ntp = rx_ring->next_to_process; struct i40e_rx_buffer *rx_buffer; union i40e_rx_desc *rx_desc; - int rx_buffer_pgcnt; + struct sk_buff *skb; unsigned int size; + u32 nfrags = 0; + bool neop; u64 qword; /* return some buffers to hardware, one at a time is too slow */ - if (cleaned_count >= I40E_RX_BUFFER_WRITE) { + if (cleaned_count >= clean_threshold) { failure = failure || i40e_alloc_rx_buffers(rx_ring, cleaned_count); cleaned_count = 0; } - rx_desc = I40E_RX_DESC(rx_ring, rx_ring->next_to_clean); + rx_desc = I40E_RX_DESC(rx_ring, ntp); /* status_error_len will always be zero for unused descriptors * because it's cleared in cleanup, and overlaps with hdr_addr @@ -2487,8 +2541,8 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget, i40e_clean_programming_status(rx_ring, rx_desc->raw.qword[0], qword); - rx_buffer = i40e_rx_bi(rx_ring, rx_ring->next_to_clean); - i40e_inc_ntc(rx_ring); + rx_buffer = i40e_rx_bi(rx_ring, ntp); + i40e_inc_ntp(rx_ring); i40e_reuse_rx_page(rx_ring, rx_buffer); cleaned_count++; continue; @@ -2499,76 +2553,84 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget, if (!size) break; - i40e_trace(clean_rx_irq, rx_ring, rx_desc, skb); - rx_buffer = i40e_get_rx_buffer(rx_ring, size, &rx_buffer_pgcnt); - + i40e_trace(clean_rx_irq, rx_ring, rx_desc, xdp); /* retrieve a buffer from the ring */ - if (!skb) { + rx_buffer = i40e_get_rx_buffer(rx_ring, size); + + neop = i40e_is_non_eop(rx_ring, rx_desc); + i40e_inc_ntp(rx_ring); + + if (!xdp->data) { unsigned char *hard_start; hard_start = page_address(rx_buffer->page) + rx_buffer->page_offset - offset; - xdp_prepare_buff(&xdp, hard_start, offset, size, true); - xdp_buff_clear_frags_flag(&xdp); + xdp_prepare_buff(xdp, hard_start, offset, size, true); #if (PAGE_SIZE > 4096) /* At larger PAGE_SIZE, frame_sz depend on len size */ - xdp.frame_sz = i40e_rx_frame_truesize(rx_ring, size); + xdp->frame_sz = i40e_rx_frame_truesize(rx_ring, size); #endif - xdp_res = i40e_run_xdp(rx_ring, &xdp, xdp_prog); + } else if (i40e_add_xdp_frag(xdp, &nfrags, rx_buffer, size) && + !neop) { + /* Overflowing packet: Drop all frags on EOP */ + i40e_consume_xdp_buff(rx_ring, xdp, rx_buffer); + break; } + if (neop) + continue; + + xdp_res = i40e_run_xdp(rx_ring, xdp, xdp_prog); + if (xdp_res) { - if (xdp_res & (I40E_XDP_TX | I40E_XDP_REDIR)) { - xdp_xmit |= xdp_res; - i40e_rx_buffer_flip(rx_ring, rx_buffer, size); + xdp_xmit |= xdp_res & (I40E_XDP_TX | I40E_XDP_REDIR); + + if (unlikely(xdp_buff_has_frags(xdp))) { + i40e_process_rx_buffs(rx_ring, xdp_res, xdp); + size = xdp_get_buff_len(xdp); + } else if (xdp_res & (I40E_XDP_TX | I40E_XDP_REDIR)) { + i40e_rx_buffer_flip(rx_buffer, xdp->frame_sz); } else { rx_buffer->pagecnt_bias++; } total_rx_bytes += size; - total_rx_packets++; - } else if (skb) { - i40e_add_rx_frag(rx_ring, rx_buffer, skb, size); - } else if (ring_uses_build_skb(rx_ring)) { - skb = i40e_build_skb(rx_ring, rx_buffer, &xdp); } else { - skb = i40e_construct_skb(rx_ring, rx_buffer, &xdp); - } + if (ring_uses_build_skb(rx_ring)) + skb = i40e_build_skb(rx_ring, xdp, nfrags); + else + skb = i40e_construct_skb(rx_ring, xdp, nfrags); + + /* drop if we failed to retrieve a buffer */ + if (!skb) { + rx_ring->rx_stats.alloc_buff_failed++; + i40e_consume_xdp_buff(rx_ring, xdp, rx_buffer); + break; + } - /* exit if we failed to retrieve a buffer */ - if (!xdp_res && !skb) { - rx_ring->rx_stats.alloc_buff_failed++; - rx_buffer->pagecnt_bias++; - break; - } + if (i40e_cleanup_headers(rx_ring, skb, rx_desc)) + goto process_next; - i40e_put_rx_buffer(rx_ring, rx_buffer, rx_buffer_pgcnt); - cleaned_count++; + /* probably a little skewed due to removing CRC */ + total_rx_bytes += skb->len; - i40e_inc_ntc(rx_ring); - if (i40e_is_non_eop(rx_ring, rx_desc)) - continue; + /* populate checksum, VLAN, and protocol */ + i40e_process_skb_fields(rx_ring, rx_desc, skb); - if (xdp_res || i40e_cleanup_headers(rx_ring, skb, rx_desc)) { - skb = NULL; - continue; + i40e_trace(clean_rx_irq_rx, rx_ring, rx_desc, xdp); + napi_gro_receive(&rx_ring->q_vector->napi, skb); } - /* probably a little skewed due to removing CRC */ - total_rx_bytes += skb->len; - - /* populate checksum, VLAN, and protocol */ - i40e_process_skb_fields(rx_ring, rx_desc, skb); - - i40e_trace(clean_rx_irq_rx, rx_ring, rx_desc, skb); - napi_gro_receive(&rx_ring->q_vector->napi, skb); - skb = NULL; - /* update budget accounting */ total_rx_packets++; +process_next: + cleaned_count += nfrags + 1; + i40e_put_rx_buffer(rx_ring, rx_buffer); + rx_ring->next_to_clean = rx_ring->next_to_process; + + xdp->data = NULL; } i40e_finalize_xdp_rx(rx_ring, xdp_xmit); - rx_ring->skb = skb; i40e_update_rx_stats(rx_ring, total_rx_bytes, total_rx_packets); @@ -3001,7 +3063,7 @@ static inline int i40e_tx_prepare_vlan_flags(struct sk_buff *skb, rc = skb_cow_head(skb, 0); if (rc < 0) return rc; - vhdr = (struct vlan_ethhdr *)skb->data; + vhdr = skb_vlan_eth_hdr(skb); vhdr->h_vlan_TCI = htons(tx_flags >> I40E_TX_FLAGS_VLAN_SHIFT); } else { |