summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/intel/ice
AgeCommit message (Collapse)Author
2022-02-08Merge branch 'iwl-next' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/linux Nguyen, Anthony L says: ==================== iwl-next Intel Wired LAN Driver Updates 2022-02-07 Dave adds support for ice driver to provide DSCP QoS mappings to irdma driver. [1] https://lore.kernel.org/netdev/20220202191921.1638-1-shiraz.saleem@intel.com/ * 'iwl-next' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/linux: ice: add support for DSCP QoS for IDC ==================== Link: https://lore.kernel.org/r/20220207235921.1303522-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-03ice: add support for DSCP QoS for IDCDave Ertman
The ice driver provides QoS information to auxiliary drivers through the exported function ice_get_qos_params. This function doesn't currently support L3 DSCP QoS. Add the necessary defines, structure elements and code to support DSCP QoS through the IIDC functions. Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-01-31ice: respect metadata on XSK Rx to skbAlexander Lobakin
For now, if the XDP prog returns XDP_PASS on XSK, the metadata will be lost as it doesn't get copied to the skb. Copy it along with the frame headers. Account its size on skb allocation, and when copying just treat it as a part of the frame and do a pull after to "move" it to the "reserved" zone. net_prefetch() xdp->data_meta and align the copy size to speed-up memcpy() a little and better match ice_construct_skb(). Fixes: 2d4238f55697 ("ice: Add support for AF_XDP") Suggested-by: Jesper Dangaard Brouer <brouer@redhat.com> Suggested-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-01-31ice: don't reserve excessive XDP_PACKET_HEADROOM on XSK Rx to skbAlexander Lobakin
{__,}napi_alloc_skb() allocates and reserves additional NET_SKB_PAD + NET_IP_ALIGN for any skb. OTOH, ice_construct_skb_zc() currently allocates and reserves additional `xdp->data - xdp->data_hard_start`, which is XDP_PACKET_HEADROOM for XSK frames. There's no need for that at all as the frame is post-XDP and will go only to the networking stack core. Pass the size of the actual data only to __napi_alloc_skb() and don't reserve anything. This will give enough headroom for stack processing. Fixes: 2d4238f55697 ("ice: Add support for AF_XDP") Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-01-31ice: respect metadata in legacy-rx/ice_construct_skb()Alexander Lobakin
In "legacy-rx" mode represented by ice_construct_skb(), we can still use XDP (and XDP metadata), but after XDP_PASS the metadata will be lost as it doesn't get copied to the skb. Copy it along with the frame headers. Account its size on skb allocation, and when copying just treat it as a part of the frame and do a pull after to "move" it to the "reserved" zone. Point net_prefetch() to xdp->data_meta instead of data. This won't change anything when the meta is not here, but will save some cache misses otherwise. Suggested-by: Jesper Dangaard Brouer <brouer@redhat.com> Suggested-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-01-27ice: Remove useless DMA-32 fallback configurationChristophe JAILLET
As stated in [1], dma_set_mask() with a 64-bit mask never fails if dev->dma_mask is non-NULL. So, if it fails, the 32 bits case will also fail for the same reason. Simplify code and remove some dead code accordingly. [1]: https://lkml.org/lkml/2021/6/7/398 Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-01-06ice: Use bitmap_free() to free bitmapChristophe JAILLET
kfree() and bitmap_free() are the same. But using the latter is more consistent when freeing memory allocated with bitmap_zalloc(). Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-01-06ice: Optimize a few bitmap operationsChristophe JAILLET
When a bitmap is local to a function, it is safe to use the non-atomic __[set|clear]_bit(). No concurrent accesses can occur. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-01-06ice: Slightly simply ice_find_free_recp_res_idxChristophe JAILLET
The 'possible_idx' bitmap is set just after it is zeroed, so we can save the first step. The 'free_idx' bitmap is used only at the end of the function as the result of a bitmap xor operation. So there is no need to explicitly zero it before. So, slightly simply the code and remove 2 useless 'bitmap_zero()' call Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-01-06ice: improve switchdev's slow-pathWojciech Drewek
In current switchdev implementation, every VF PR is assigned to individual ring on switchdev ctrl VSI. For slow-path traffic, there is a mapping VF->ring done in software based on src_vsi value (by calling ice_eswitch_get_target_netdev function). With this change, HW solution is introduced which is more efficient. For each VF, src MAC (VF's MAC) filter will be created, which forwards packets to the corresponding switchdev ctrl VSI queue based on src MAC address. This filter has to be removed and then replayed in case of resetting one VF. Keep information about this rule in repr->mac_rule, thanks to that we know which rule has to be removed and replayed for a given VF. In case of CORE/GLOBAL all rules are removed automatically. We have to take care of readding them. This is done by ice_replay_vsi_adv_rule. When driver leaves switchdev mode, remove all advanced rules from switchdev ctrl VSI. This is done by ice_rem_adv_rule_for_vsi. Flag repr->rule_added is needed because in some cases reset might be triggered before VF sends request to add MAC. Co-developed-by: Grzegorz Nitka <grzegorz.nitka@intel.com> Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-01-06ice: replay advanced rules after resetVictor Raj
ice_replay_vsi_adv_rule will replay advanced rules for a given VSI. Exit this function when list of rules for given recipe is empty. Do not add rule when given vsi_handle does not match vsi_handle from the rule info. Use ICE_MAX_NUM_RECIPES instead of ICE_SW_LKUP_LAST in order to find advanced rules as well. Signed-off-by: Victor Raj <victor.raj@intel.com> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-01-04net: fixup build after bpf header changesJakub Kicinski
Recent bpf-next merge brought in header changes which uncovered includes missing in net-next which were not present in bpf-next. Build problems happen only on less-popular arches like hppa, sparc, alpha etc. I could repro the build problem with ice but not the mlx5 problem Abdul was reporting. mlx5 does look like it should include filter.h, anyway. Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com> Fixes: e63a02348958 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next") Link: https://lore.kernel.org/all/7c03768d-d948-c935-a7ab-b1f963ac7eed@linux.vnet.ibm.com/ Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Alexei Starovoitov says: ==================== pull-request: bpf-next 2021-12-30 The following pull-request contains BPF updates for your *net-next* tree. We've added 72 non-merge commits during the last 20 day(s) which contain a total of 223 files changed, 3510 insertions(+), 1591 deletions(-). The main changes are: 1) Automatic setrlimit in libbpf when bpf is memcg's in the kernel, from Andrii. 2) Beautify and de-verbose verifier logs, from Christy. 3) Composable verifier types, from Hao. 4) bpf_strncmp helper, from Hou. 5) bpf.h header dependency cleanup, from Jakub. 6) get_func_[arg|ret|arg_cnt] helpers, from Jiri. 7) Sleepable local storage, from KP. 8) Extend kfunc with PTR_TO_CTX, PTR_TO_MEM argument support, from Kumar. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-30ice: Add flow director support for channel modeKiran Patil
Add support to enable flow-director filter when multiple TCs are configured. Flow director filter can be configured using ethtool (--config-ntuple option). When multiple TCs are configured, each TC is mapped to an unique HW VSI. So VSI corresponding to queue used in filter is identified and flow director context is updated with correct VSI while configuring ntuple filter in HW. Signed-off-by: Kiran Patil <kiran.patil@intel.com> Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com> Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com> Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-29net: Don't include filter.h from net/sock.hJakub Kicinski
sock.h is pretty heavily used (5k objects rebuilt on x86 after it's touched). We can drop the include of filter.h from it and add a forward declaration of struct sk_filter instead. This decreases the number of rebuilt objects when bpf.h is touched from ~5k to ~1k. There's a lot of missing includes this was masking. Primarily in networking tho, this time. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/bpf/20211229004913.513372-1-kuba@kernel.org
2021-12-28ice: switch to napi_build_skb()Alexander Lobakin
napi_build_skb() reuses per-cpu NAPI skbuff_head cache in order to save some cycles on freeing/allocating skbuff_heads on every new Rx or completed Tx. ice driver runs Tx completion polling cycle right before the Rx one and uses napi_consume_skb() to feed the cache with skbuff_heads of completed entries, so it's never empty and always warm at that moment. Switch to the napi_build_skb() to relax mm pressure on heavy Rx. Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
include/net/sock.h commit 8f905c0e7354 ("inet: fully convert sk->sk_rx_dst to RCU rules") commit 43f51df41729 ("net: move early demux fields close to sk_refcnt") https://lore.kernel.org/all/20211222141641.0caa0ab3@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-22ice: trivial: fix odd indentingJesse Brandeburg
Fix an odd indent where some code was left indented, and causes smatch to warn: ice_log_pkg_init() warn: inconsistent indenting While here, for consistency, add a break after the default case. This commit has a Fixes: but we caught this while it was only in net-next. Fixes: 247dd97d713c ("ice: Refactor status flow for DDP load") Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Link: https://lore.kernel.org/r/20211221230538.2546315-1-jesse.brandeburg@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-21ice: support crosstimestamping on E822 devices if supportedJacob Keller
E822 devices on supported platforms can generate a cross timestamp between the platform ART and the device time. This process allows for very precise measurement of the difference between the PTP hardware clock and the platform time. This is only supported if we know the TSC frequency relative to ART, so we do not enable this unless the boot CPU has a known TSC frequency (as required by convert_art_ns_to_tsc). Because PCIe PTM support is not available on all platforms, introduce CONFIG_ICE_HWTS and make it depend on X86 where we know the support exists. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-21ice: exit bypass mode once hardware finishes timestamp calibrationJacob Keller
Once the E822 device has sent and received one packet, the hardware computes the internal delay of the PHY using a process known as Vernier calibration. This calibration calculates a more accurate offset for the Tx and Rx timestamps. To make use of this offset, we need to exit the bypass mode. This cannot be done until the PHY has completed offset calibration, as indicated by the offset valid bits. To handle this, introduce a kthread work item which will poll the offset valid bits every few milliseconds seeing if it is safe to exit bypass mode. Once we have finished calibrating the offsets, we can program the total Tx and Rx offset registers and turn off the bypass bit. This allows the hardware to include the more precise vernier calibration offset, and improves the timestamp precision. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-21ice: ensure the hardware Clock Generation Unit is configuredJacob Keller
The E822 device has a Clock Generation Unit (CGU) responsible for determining the clock frequency that drives the timers. Ensure this function is initialized when bringing up the PTP support, so that the clock has a known frequency. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-21ice: implement basic E822 PTP supportJacob Keller
Implement support for the basic operations needed to enable the PTP hardware clock on E822 devices. This includes implementations for the various PHY access functions, as well as the ability to start and stop the PHY timers. This is different from the E810 device because the configuration depends on link speed, so we cannot just start the PHYs immediately. We must wait until the link is up to get proper values for the speed based initialization. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-21ice: convert clk_freq capability into time_refJacob Keller
Convert the clk_freq value into the associated time_ref frequency value for E822 devices. This simplifies determining the time reference value for the clock. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-21ice: introduce ice_ptp_init_phc functionJacob Keller
When we enable support for E822 devices, there are some additional steps required to initialize the PTP hardware clock. To make this easier to implement as device-specific behavior, refactor the register setups in ice_ptp_init_owner to a new ice_ptp_init_phc function defined in ice_ptp_hw.c This function will have a common section, and an e810 specific sub-implementation. This will enable easily extending the functionality to cover the E822 specific setup required to initialize the hardware clock generation unit. It also makes it clear which steps are E810 specific vs which ones are necessary for all ice devices. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-21ice: use 'int err' instead of 'int status' in ice_ptp_hw.cJacob Keller
The ice_ptp_hw.c file introduced a bunch of uses of "int status" instead of the more traditional "int err" or "int ret". These are actually traditional Linux error codes (as opposed to the recently removed ice_status enumeration values). We're about to add a bunch of new functions to ice_ptp_hw.c. It's normally preferred in the ice driver to use "int ret" or "int err" when dealing with error code values. Instead of making the new functions use "int status" lets just fix all of ice_ptp_hw.c to use "int err". This will match the new functions and ensures a consistent style across at least the PTP related files. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-21ice: PTP: move setting of tstamp_configJacob Keller
The tstamp_config structure is being set inside of ice_ptp_cfg_timestamp, which is the function used to set Tx and Rx timestamping during initialization. This function is also used in order to set the PHY port timestamping status. However, it makes sense to always set the tstamp_config directly whenever the ice_set_tx_tstamp or ice_set_rx_tstamp functions are called. Move assignment of tstamp_config into the related functions and out of ice_ptp_cfg_timestamp. Now that we assign the timestamp mode in the relevant functions, we no longer modify the config value in ice_set_timestamp_mode. In turn, we no longer want to copy that config value into the PF cached structure. Instead, this is now the source of truth for actual configuration. On success of ice_set_timestamp_mode, copy the real configured mode back to report it out to userspace. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-21ice: introduce ice_base_incval functionJacob Keller
A future change will add additional possible increment values for the E822 device support. To handle this, we want to look up the increment value to use instead of hard coding it to the nominal value for E810 devices. Introduce ice_base_incval as a function to get the best nominal increment value to use. For now, it just returns the E810 value, but will be refactored in the future to look up the value based on the device type and configured clock frequency. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-21ice: Fix E810 PTP reset flowKarol Kolacinski
The PF reset does not reset PHC and PHY clocks so it's unnecessary to stop them and reinitialize after the reset. Configuring timestamping changes the VSI fields so it needs to be performed after VSIs are initialized, which was not done in case of a reset. Suggested-by: Patrick Talbert <ptalbert@redhat.com> Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Tested-by: Pasi Vaananen <pvaanane@redhat.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-17ice: xsk: fix cleaned_count settingMaciej Fijalkowski
Currently cleaned_count is initialized to ICE_DESC_UNUSED(rx_ring) and later on during the Rx processing it is incremented per each frame that driver consumed. This can result in excessive buffers requested from xsk pool based on that value. To address this, just drop cleaned_count and pass ICE_DESC_UNUSED(rx_ring) directly as a function argument to ice_alloc_rx_bufs_zc(). Idea is to ask for buffers as many as consumed. Let us also call ice_alloc_rx_bufs_zc unconditionally at the end of ice_clean_rx_irq_zc. This has been changed in that way for corresponding ice_clean_rx_irq, but not here. Fixes: 2d4238f55697 ("ice: Add support for AF_XDP") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-17ice: xsk: allow empty Rx descriptors on XSK ZC data pathMaciej Fijalkowski
Commit ac6f733a7bd5 ("ice: allow empty Rx descriptors") stated that ice HW can produce empty descriptors that are valid and they should be processed. Add this support to xsk ZC path to avoid potential processing problems. Fixes: 2d4238f55697 ("ice: Add support for AF_XDP") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-17ice: xsk: do not clear status_error0 for ntu + nb_buffs descriptorMaciej Fijalkowski
The descriptor that ntu is pointing at when we exit ice_alloc_rx_bufs_zc() should not have its corresponding DD bit cleared as descriptor is not allocated in there and it is not valid for HW usage. The allocation routine at the entry will fill the descriptor that ntu points to after it was set to ntu + nb_buffs on previous call. Even the spec says: "The tail pointer should be set to one descriptor beyond the last empty descriptor in host descriptor ring." Therefore, step away from clearing the status_error0 on ntu + nb_buffs descriptor. Fixes: db804cfc21e9 ("ice: Use the xsk batched rx allocation interface") Reported-by: Elza Mathew <elza.mathew@intel.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-17ice: remove dead store on XSK hotpathAlexander Lobakin
The 'if (ntu == rx_ring->count)' block in ice_alloc_rx_buffers_zc() was previously residing in the loop, but after introducing the batched interface it is used only to wrap-around the NTU descriptor, thus no more need to assign 'xdp'. Fixes: db804cfc21e9 ("ice: Use the xsk batched rx allocation interface") Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-17ice: xsk: allocate separate memory for XDP SW ringMaciej Fijalkowski
Currently, the zero-copy data path is reusing the memory region that was initially allocated for an array of struct ice_rx_buf for its own purposes. This is error prone as it is based on the ice_rx_buf struct always being the same size or bigger than what the zero-copy path needs. There can also be old values present in that array giving rise to errors when the zero-copy path uses it. Fix this by freeing the ice_rx_buf region and allocating a new array for the zero-copy path that has the right length and is initialized to zero. Fixes: 57f7f8b6bc0b ("ice: Use xdp_buf instead of rx_buf for xsk zero-copy") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-17ice: xsk: return xsk buffers back to pool when cleaning the ringMaciej Fijalkowski
Currently we only NULL the xdp_buff pointer in the internal SW ring but we never give it back to the xsk buffer pool. This means that buffers can be leaked out of the buff pool and never be used again. Add missing xsk_buff_free() call to the routine that is supposed to clean the entries that are left in the ring so that these buffers in the umem can be used by other sockets. Also, only go through the space that is actually left to be cleaned instead of a whole ring. Fixes: 2d4238f55697 ("ice: Add support for AF_XDP") Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-15ice: use modern kernel API for kickJesse Brandeburg
The kernel gained a new interface for drivers to use to combine tail bump (doorbell) and BQL updates, attempt to use those new interfaces. Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-15ice: tighter control over VSI_DOWN stateJesse Brandeburg
The driver had comments to the effect of: This flag should be set before calling this function. While reviewing code it was found that there were several violations of this policy, which could introduce hard to find bugs or races. Fix the violations of the "VSI DOWN state must be set before calling ice_down" and make checking the state into code with a WARN_ON. Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-15ice: use prefetch methodsJesse Brandeburg
The kernel provides some prefetch mechanisms to speed up commonly cold cache line accesses during receive processing. Since these are software structures it helps to have these strategically placed prefetches. Be careful to call BQL prefetch complete only for non XDP queues. Co-developed-by: Piotr Raczynski <piotr.raczynski@intel.com> Signed-off-by: Piotr Raczynski <piotr.raczynski@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-15ice: update to newer kernel APIJesse Brandeburg
Use the netif_tx_* API from netdevice.h which has simpler parameters. Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-15ice: support immediate firmware activation via devlink reloadJacob Keller
The ice hardware contains an embedded chip with firmware which can be updated using devlink flash. The firmware which runs on this chip is referred to as the Embedded Management Processor firmware (EMP firmware). Activating the new firmware image currently requires that the system be rebooted. This is not ideal as rebooting the system can cause unwanted downtime. In practical terms, activating the firmware does not always require a full system reboot. In many cases it is possible to activate the EMP firmware immediately. There are a couple of different scenarios to cover. * The EMP firmware itself can be reloaded by issuing a special update to the device called an Embedded Management Processor reset (EMP reset). This reset causes the device to reset and reload the EMP firmware. * PCI configuration changes are only reloaded after a cold PCIe reset. Unfortunately there is no generic way to trigger this for a PCIe device without a system reboot. When performing a flash update, firmware is capable of responding with some information about the specific update requirements. The driver updates the flash by programming a secondary inactive bank with the contents of the new image, and then issuing a command to request to switch the active bank starting from the next load. The response to the final command for updating the inactive NVM flash bank includes an indication of the minimum reset required to fully update the device. This can be one of the following: * A full power on is required * A cold PCIe reset is required * An EMP reset is required The response to the command to switch flash banks includes an indication of whether or not the firmware will allow an EMP reset request. For most updates, an EMP reset is sufficient to load the new EMP firmware without issues. In some cases, this reset is not sufficient because the PCI configuration space has changed. When this could cause incompatibility with the new EMP image, the firmware is capable of rejecting the EMP reset request. Add logic to ice_fw_update.c to handle the response data flash update AdminQ commands. For the reset level, issue a devlink status notification informing the user of how to complete the update with a simple suggestion like "Activate new firmware by rebooting the system". Cache the status of whether or not firmware will restrict the EMP reset for use in implementing devlink reload. Implement support for devlink reload with the "fw_activate" flag. This allows user space to request the firmware be activated immediately. For the .reload_down handler, we will issue a request for the EMP reset using the appropriate firmware AdminQ command. If we know that the firmware will not allow an EMP reset, simply exit with a suitable netlink extended ACK message indicating that the EMP reset is not available. For the .reload_up handler, simply wait until the driver has finished resetting. Logic to handle processing of an EMP reset already exists in the driver as part of its reset and rebuild flows. Implement support for the devlink reload interface with the "fw_activate" action. This allows userspace to request activation of firmware without a reboot. Note that support for indicating the required reset and EMP reset restriction is not supported on old versions of firmware. The driver can determine if the two features are supported by checking the device capabilities report. I confirmed support has existed since at least version 5.5.2 as reported by the 'fw.mgmt' version. Support to issue the EMP reset request has existed in all version of the EMP firmware for the ice hardware. Check the device capabilities report to determine whether or not the indications are reported by the running firmware. If the reset requirement indication is not supported, always assume a full power on is necessary. If the reset restriction capability is not supported, always assume the EMP reset is available. Users can verify if the EMP reset has activated the firmware by using the devlink info report to check that the 'running' firmware version has updated. For example a user might do the following: # Check current version $ devlink dev info # Update the device $ devlink dev flash pci/0000:af:00.0 file firmware.bin # Confirm stored version updated $ devlink dev info # Reload to activate new firmware $ devlink dev reload pci/0000:af:00.0 action fw_activate # Confirm running version updated $ devlink dev info Finally, this change does *not* implement basic driver-only reload support. I did look into trying to do this. However, it requires significant refactor of how the ice driver probes and loads everything. The ice driver probe and allocation flows were not designed with such a reload in mind. Refactoring the flow to support this is beyond the scope of this change. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-15ice: reduce time to read Option ROM CIVD dataJacob Keller
During probe and device reset, the ice driver reads some data from the NVM image as part of ice_init_nvm. Part of this data includes a section of the Option ROM which contains version information. The function ice_get_orom_civd_data is used to locate the '$CIV' data section of the Option ROM. Timing of ice_probe and ice_rebuild indicate that the ice_get_orom_civd_data function takes about 10 seconds to finish executing. The function locates the section by scanning the Option ROM every 512 bytes. This requires a significant number of NVM read accesses, since the Option ROM bank is 500KB. In the worst case it would take about 1000 reads. Worse, all PFs serialize this operation during reload because of acquiring the NVM semaphore. The CIVD section is located at the end of the Option ROM image data. Unfortunately, the driver has no easy method to determine the offset manually. Practical experiments have shown that the data could be at a variety of locations, so simply reversing the scanning order is not sufficient to reduce the overall read time. Instead, copy the entire contents of the Option ROM into memory. This allows reading the data using 4Kb pages instead of 512 bytes at a time. This reduces the total number of firmware commands by a factor of 8. In addition, reading the whole section together at once allows better indication to firmware of when we're "done". Re-write ice_get_orom_civd_data to allocate virtual memory to store the Option ROM data. Copy the entire OptionROM contents at once using ice_read_flash_module. Finally, use this memory copy to scan for the '$CIV' section. This change significantly reduces the time to read the Option ROM CIVD section from ~10 seconds down to ~1 second. This has a significant impact on the total time to complete a driver rebuild or probe. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-15ice: move ice_devlink_flash_update and merge with ice_flash_pldm_imageJacob Keller
The ice_devlink_flash_update function performs a few upfront checks and then calls ice_flash_pldm_image. Most if these checks make more sense in the context of code within ice_flash_pldm_image. Merge ice_devlink_flash_update and ice_flash_pldm_image into one function, placing it in ice_fw_update.c Since this is still the entry point for devlink, call the function ice_devlink_flash_update instead of ice_flash_pldm_image. This leaves a single function which handles the devlink parameters and then initiates a PLDM update. With this change, the ice_devlink_flash_update function in ice_fw_update.c becomes the main entry point for flash update. It elimintes some unnecessary boiler plate code between the two previous functions. The ultimate motivation for this is that it eases supporting a dry run with the PLDM library in a future change. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-15ice: move and rename ice_check_for_pending_updateJacob Keller
The ice_devlink_flash_update function performs a few checks and then calls ice_flash_pldm_image. One of these checks is to call ice_check_for_pending_update. This function checks if the device has a pending update, and cancels it if so. This is necessary to allow a new flash update to proceed. We want to refactor the ice code to eliminate ice_devlink_flash_update, moving its checks into ice_flash_pldm_image. To do this, ice_check_for_pending_update will become static, and only called by ice_flash_pldm_image. To make this change easier to review, first just move the function up within the ice_fw_update.c file. While at it, note that the function has a misleading name. Its primary action is to cancel a pending update. Using the verb "check" does not imply this. Rename it to ice_cancel_pending_update. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-15ice: devlink: add shadow-ram region to snapshot Shadow RAMJacob Keller
We have a region for reading the contents of the NVM flash as a snapshot. This region does not allow reading the Shadow RAM, as it always passes the FLASH_ONLY bit to the low level firmware interface. Add a separate shadow-ram region which will allow snapshot of the current contents of the Shadow RAM. This data is built from the NVM contents but is distinct as the device builds up the Shadow RAM during initialization, so being able to snapshot its contents can be useful when attempting to debug flash related issues. Fix the comment description of the nvm-flash region which incorrectly stated that it filled the shadow-ram region, and add a comment explaining that the nvm-flash region does not actually read the Shadow RAM. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-14ice: Don't put stale timestamps in the skbKarol Kolacinski
The driver has to check if it does not accidentally put the timestamp in the SKB before previous timestamp gets overwritten. Timestamp values in the PHY are read only and do not get cleared except at hardware reset or when a new timestamp value is captured. The cached_tstamp field is used to detect the case where a new timestamp has not yet been captured, ensuring that we avoid sending stale timestamp data to the stack. Fixes: ea9b847cda64 ("ice: enable transmit timestamps for E810 devices") Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-14ice: Use div64_u64 instead of div_u64 in adjfineKarol Kolacinski
Change the division in ice_ptp_adjfine from div_u64 to div64_u64. div_u64 is used when the divisor is 32 bit but in this case incval is 64 bit and it caused incorrect calculations and incval adjustments. Fixes: 06c16d89d2cb ("ice: register 1588 PTP clock device object for E810 devices") Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-14ice: Remove unused ICE_FLOW_SEG_HDRS_L2_MASKTony Nguyen
Remove the unused define ICE_FLOW_SEG_HDRS_L2_MASK. Reported-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Acked-by: Paul Menzel <pmenzel@molgen.mpg.de> Tested-by: Gurucharan G <gurucharanx.g@intel.com>
2021-12-14ice: Remove unnecessary castsDan Carpenter
The "bitmap" variable is already an unsigned long so there is no need for this cast. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-14ice: Propagate error codesTony Nguyen
As all functions now return standard error codes, propagate the values being returned instead of converting them to generic values. Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com>
2021-12-14ice: Remove excess error variablesTony Nguyen
ice_status previously had a variable to contain these values where other error codes had a variable as well. With ice_status now being an int, there is no need for two variables to hold error values. In cases where this occurs, remove one of the excess variables and use a single one. Some initialization of variables are no longer needed and have been removed. Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com>