summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/intel/ice/ice_main.c
AgeCommit message (Collapse)Author
2023-02-03ice: move RDMA init to ice_idc.cMichal Swiatkowski
Simplify probe flow by moving all RDMA related code to ice_init_rdma(). Unroll irq allocation if RDMA initialization fails. Implement ice_deinit_rdma() and use it in remove flow. Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Acked-by: Dave Ertman <david.m.ertman@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-02-02drivers: net: turn on XDP featuresMarek Majtyka
A summary of the flags being set for various drivers is given below. Note that XDP_F_REDIRECT_TARGET and XDP_F_FRAG_TARGET are features that can be turned off and on at runtime. This means that these flags may be set and unset under RTNL lock protection by the driver. Hence, READ_ONCE must be used by code loading the flag value. Also, these flags are not used for synchronization against the availability of XDP resources on a device. It is merely a hint, and hence the read may race with the actual teardown of XDP resources on the device. This may change in the future, e.g. operations taking a reference on the XDP resources of the driver, and in turn inhibiting turning off this flag. However, for now, it can only be used as a hint to check whether device supports becoming a redirection target. Turn 'hw-offload' feature flag on for: - netronome (nfp) - netdevsim. Turn 'native' and 'zerocopy' features flags on for: - intel (i40e, ice, ixgbe, igc) - mellanox (mlx5). - stmmac - netronome (nfp) Turn 'native' features flags on for: - amazon (ena) - broadcom (bnxt) - freescale (dpaa, dpaa2, enetc) - funeth - intel (igb) - marvell (mvneta, mvpp2, octeontx2) - mellanox (mlx4) - mtk_eth_soc - qlogic (qede) - sfc - socionext (netsec) - ti (cpsw) - tap - tsnep - veth - xen - virtio_net. Turn 'basic' (tx, pass, aborted and drop) features flags on for: - netronome (nfp) - cavium (thunder) - hyperv. Turn 'redirect_target' feature flag on for: - amanzon (ena) - broadcom (bnxt) - freescale (dpaa, dpaa2) - intel (i40e, ice, igb, ixgbe) - ti (cpsw) - marvell (mvneta, mvpp2) - sfc - socionext (netsec) - qlogic (qede) - mellanox (mlx5) - tap - veth - virtio_net - xen Reviewed-by: Gerhard Engleder <gerhard@engleder-embedded.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Stanislav Fomichev <sdf@google.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Co-developed-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Co-developed-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Marek Majtyka <alardam@gmail.com> Link: https://lore.kernel.org/r/3eca9fafb308462f7edb1f58e451d59209aa07eb.1675245258.git.lorenzo@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
net/core/gro.c 7d2c89b32587 ("skb: Do mix page pool and page referenced frags in GRO") b1a78b9b9886 ("net: add support for ipv4 big tcp") https://lore.kernel.org/all/20230203094454.5766f160@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-01ice: Remove next_{dd,rs} fields from ice_tx_ringMaciej Fijalkowski
Now that both ZC and standard XDP data paths stopped using Tx logic based on next_dd and next_rs fields, we can safely remove these fields and shrink Tx ring structure. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Link: https://lore.kernel.org/bpf/20230131204506.219292-13-maciej.fijalkowski@intel.com
2023-02-01ice: Add support for XDP multi-buffer on Rx sideMaciej Fijalkowski
Ice driver needs to be a bit reworked on Rx data path in order to support multi-buffer XDP. For skb path, it currently works in a way that Rx ring carries pointer to skb so if driver didn't manage to combine fragmented frame at current NAPI instance, it can restore the state on next instance and keep looking for last fragment (so descriptor with EOP bit set). What needs to be achieved is that xdp_buff needs to be combined in such way (linear + frags part) in the first place. Then skb will be ready to go in case of XDP_PASS or BPF program being not present on interface. If BPF program is there, it would work on multi-buffer XDP. At this point xdp_buff resides directly on Rx ring, so given the fact that skb will be built straight from xdp_buff, there will be no further need to carry skb on Rx ring. Besides removing skb pointer from Rx ring, lots of members have been moved around within ice_rx_ring. First and foremost reason was to place rx_buf with xdp_buff on the same cacheline. This means that once we touch rx_buf (which is a preceding step before touching xdp_buff), xdp_buff will already be hot in cache. Second thing was that xdp_rxq is used rather rarely and it occupies a separate cacheline, so maybe it is better to have it at the end of ice_rx_ring. Other change that affects ice_rx_ring is the introduction of ice_rx_ring::first_desc. Its purpose is twofold - first is to propagate rx_buf->act to all the parts of current xdp_buff after running XDP program, so that ice_put_rx_buf() that got moved out of the main Rx processing loop will be able to tak an appriopriate action on each buffer. Second is for ice_construct_skb(). ice_construct_skb() has a copybreak mechanism which had an explicit impact on xdp_buff->skb conversion in the new approach when legacy Rx flag is toggled. It works in a way that linear part is 256 bytes long, if frame is bigger than that, remaining bytes are going as a frag to skb_shared_info. This means while memcpying frags from xdp_buff to newly allocated skb, care needs to be taken when picking the destination frag array entry. Upon the time ice_construct_skb() is called, when dealing with fragmented frame, current rx_buf points to the *last* fragment, but copybreak needs to be done against the first one. That's where ice_rx_ring::first_desc helps. When frame building spans across NAPI polls (DD bit is not set on current descriptor and xdp->data is not NULL) with current Rx buffer handling state there might be some problems. Since calls to ice_put_rx_buf() were pulled out of the main Rx processing loop and were scoped from cached_ntc to current ntc, remember that now mentioned function relies on rx_buf->act, which is set within ice_run_xdp(). ice_run_xdp() is called when EOP bit was found, so currently we could put Rx buffer with rx_buf->act being *uninitialized*. To address this, change scoping to rely on first_desc on both boundaries instead. This also implies that cleaned_count which is used as an input to ice_alloc_rx_buffers() and tells how many new buffers should be refilled has to be adjusted. If it stayed as is, what could happen is a case where ntc would go over ntu. Therefore, remove cleaned_count altogether and use against allocing routine newly introduced ICE_RX_DESC_UNUSED() macro which is an equivalent of ICE_DESC_UNUSED() dedicated for Rx side and based on struct ice_rx_ring::first_desc instead of next_to_clean. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Link: https://lore.kernel.org/bpf/20230131204506.219292-11-maciej.fijalkowski@intel.com
2023-02-01ice: Use ice_max_xdp_frame_size() in ice_xdp_setup_prog()Maciej Fijalkowski
This should have been used in there from day 1, let us address that before introducing XDP multi-buffer support for Rx side. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Link: https://lore.kernel.org/bpf/20230131204506.219292-8-maciej.fijalkowski@intel.com
2023-02-01ice: Prepare legacy-rx for upcoming XDP multi-buffer supportMaciej Fijalkowski
Rx path is going to be modified in a way that fragmented frame will be gathered within xdp_buff in the first place. This approach implies that underlying buffer has to provide tailroom for skb_shared_info. This is currently the case when ring uses build_skb but not when legacy-rx knob is turned on. This case configures 2k Rx buffers and has no way to provide either headroom or tailroom - FWIW it currently has XDP_PACKET_HEADROOM which is broken and in here it is removed. 2k Rx buffers were used so driver in this setting was able to support 9k MTU as it can chain up to 5 Rx buffers. With offset configuring HW writing 2k of a data was passing the half of the page which broke the assumption of our internal page recycling tricks. Now if above got fixed and legacy-rx path would be left as is, when referring to skb_shared_info via xdp_get_shared_info_from_buff(), packet's content would be corrupted again. Hence size of Rx buffer needs to be lowered and therefore supported MTU. This operation will allow us to keep the unified data path and with 8k MTU users (if any of legacy-rx) would still be good to go. However, tendency is to drop the support for this code path at some point. Add ICE_RXBUF_1664 as vsi::rx_buf_len and ICE_MAX_FRAME_LEGACY_RX (8320) as vsi::max_frame for legacy-rx. For bigger page sizes configure 3k Rx buffers, not 2k. Since headroom support is removed, disable data_meta support on legacy-rx. When preparing XDP buff, rely on ice_rx_ring::rx_offset setting when deciding whether to support data_meta or not. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Link: https://lore.kernel.org/bpf/20230131204506.219292-2-maciej.fijalkowski@intel.com
2023-01-30ice: Remove redundant pci_enable_pcie_error_reporting()Bjorn Helgaas
pci_enable_pcie_error_reporting() enables the device to send ERR_* Messages. Since f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is native"), the PCI core does this for all devices during enumeration. Remove the redundant pci_enable_pcie_error_reporting() call from the driver. Also remove the corresponding pci_disable_pcie_error_reporting() from the driver .remove() path. Note that this doesn't control interrupt generation by the Root Port; that is controlled by the AER Root Error Command register, which is managed by the AER service driver. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Jesse Brandeburg <jesse.brandeburg@intel.com> Cc: Tony Nguyen <anthony.l.nguyen@intel.com> Cc: intel-wired-lan@lists.osuosl.org Cc: netdev@vger.kernel.org Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-01-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Conflicts: drivers/net/ethernet/intel/ice/ice_main.c 418e53401e47 ("ice: move devlink port creation/deletion") 643ef23bd9dd ("ice: Introduce local var for readability") https://lore.kernel.org/all/20230127124025.0dacef40@canb.auug.org.au/ https://lore.kernel.org/all/20230124005714.3996270-1-anthony.l.nguyen@intel.com/ drivers/net/ethernet/engleder/tsnep_main.c 3d53aaef4332 ("tsnep: Fix TX queue stop/wake for multiple queues") 25faa6a4c5ca ("tsnep: Replace TX spin_lock with __netif_tx_lock") https://lore.kernel.org/all/20230127123604.36bb3e99@canb.auug.org.au/ net/netfilter/nf_conntrack_proto_sctp.c 13bd9b31a969 ("Revert "netfilter: conntrack: add sctp DATA_SENT state"") a44b7651489f ("netfilter: conntrack: unify established states for SCTP paths") f71cb8f45d09 ("netfilter: conntrack: sctp: use nf log infrastructure for invalid packets") https://lore.kernel.org/all/20230127125052.674281f9@canb.auug.org.au/ https://lore.kernel.org/all/d36076f3-6add-a442-6d4b-ead9f7ffff86@tessares.net/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-27ice: Prevent set_channel from changing queues while RDMA activeDave Ertman
The PF controls the set of queues that the RDMA auxiliary_driver requests resources from. The set_channel command will alter that pool and trigger a reconfiguration of the VSI, which breaks RDMA functionality. Prevent set_channel from executing when RDMA driver bound to auxiliary device. Adding a locked variable to pass down the call chain to avoid double locking the device_lock. Fixes: 348048e724a0 ("ice: Implement iidc operations") Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-01-24ice: move devlink port creation/deletionPaul M Stillwell Jr
Commit a286ba738714 ("ice: reorder PF/representor devlink port register/unregister flows") moved the code to create and destroy the devlink PF port. This was fine, but created a corner case issue in the case of ice_register_netdev() failing. In that case, the driver would end up calling ice_devlink_destroy_pf_port() twice. Additionally, it makes no sense to tie creation of the devlink PF port to the creation of the netdev so separate out the code to create/destroy the devlink PF port from the netdev code. This makes it a cleaner interface. Fixes: a286ba738714 ("ice: reorder PF/representor devlink port register/unregister flows") Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20230124005714.3996270-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-19ice: Introduce local var for readabilityTony Nguyen
Based on previous feedback[1], introduce a local var to make things more readable. [1] https://lore.kernel.org/netdev/20220315203218.607f612b@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com/ Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
2023-01-19ice: add missing checks for PF vsi typeJesse Brandeburg
There were a few places we had missed checking the VSI type to make sure it was definitely a PF VSI, before calling setup functions intended only for the PF VSI. This doesn't fix any explicit bugs but cleans up the code in a few places and removes one explicit != vsi->type check that can be superseded by this code (it's a super set) Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-01-19ice: remove redundant non-null check in ice_setup_pf_sw()Anirudh Venkataramanan
Remove a redundant null check, as vsi could not be null at this point. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-12-08ice: always call ice_ptp_link_change and make it voidJacob Keller
The ice_ptp_link_change function is currently only called for E822 based hardware. Future changes are going to extend this function to perform additional tasks on link change. Always call this function, moving the E810 check from the callers down to just before we call the E822-specific function required to restart the PHY. This function also returns an error value, but none of the callers actually check it. In general, the errors it produces are more likely systemic problems such as invalid or corrupt port numbers. No caller checks these, and so no warning is logged. Re-order the flag checks so that ICE_FLAG_PTP is checked first. Drop the unnecessary check for ICE_FLAG_PTP_SUPPORTED, as ICE_FLAG_PTP will not be set except when ICE_FLAG_PTP_SUPPORTED is set. Convert the port checks to WARN_ON_ONCE, in order to generate a kernel stack trace when they are hit. Convert the function to void since no caller actually checks these return values. Co-developed-by: Dave Ertman <david.m.ertman@intel.com> Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-11-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
tools/lib/bpf/ringbuf.c 927cbb478adf ("libbpf: Handle size overflow for ringbuf mmap") b486d19a0ab0 ("libbpf: checkpatch: Fixed code alignments in ringbuf.c") https://lore.kernel.org/all/20221121122707.44d1446a@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-23ice: Accumulate ring statistics over resetBenjamin Mikailenko
Resets may occur with or without user interaction. For example, a TX hang or reconfiguration of parameters will result in a reset. During reset, the VSI is freed, freeing any statistics structures inside as well. This would create an issue for the user where a reset happens in the background, statistics set to zero, and the user checks ring statistics expecting them to be populated. To ensure this doesn't happen, accumulate ring statistics over reset. Define a new ring statistics structure, ice_ring_stats. The new structure lives in the VSI's parent, preserving ring statistics when VSI is freed. 1. Define a new structure vsi_ring_stats in the PF scope 2. Allocate/free stats only during probe, unload, or change in ring size 3. Replace previous ring statistics functionality with new structure Signed-off-by: Benjamin Mikailenko <benjamin.mikailenko@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-11-23ice: Accumulate HW and Netdev statistics over resetBenjamin Mikailenko
Resets happen with or without user interaction. For example, incidents such as TX hang or a reconfiguration of parameters will result in a reset. During reset, hardware and software statistics were set to zero. This created an issue for the user where a reset happens in the background, statistics set to zero, and the user checks statistics expecting them to be populated. To ensure this doesn't happen, keep accumulating stats over reset. 1. Remove function calls which reset hardware and netdev statistics. 2. Do not rollover statistics in ice_stat_update40 during reset. Signed-off-by: Benjamin Mikailenko <benjamin.mikailenko@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-11-21ice: fix handling of burst Tx timestampsJacob Keller
Commit 1229b33973c7 ("ice: Add low latency Tx timestamp read") refactored PTP timestamping logic to use a threaded IRQ instead of a separate kthread. This implementation introduced ice_misc_intr_thread_fn and redefined the ice_ptp_process_ts function interface to return a value of whether or not the timestamp processing was complete. ice_misc_intr_thread_fn would take the return value from ice_ptp_process_ts and convert it into either IRQ_HANDLED if there were no more timestamps to be processed, or IRQ_WAKE_THREAD if the thread should continue processing. This is not correct, as the kernel does not re-schedule threaded IRQ functions automatically. IRQ_WAKE_THREAD can only be used by the main IRQ function. This results in the ice_ptp_process_ts function (and in turn the ice_ptp_tx_tstamp function) from only being called exactly once per interrupt. If an application sends a burst of Tx timestamps without waiting for a response, the interrupt will trigger for the first timestamp. However, later timestamps may not have arrived yet. This can result in dropped or discarded timestamps. Worse, on E822 hardware this results in the interrupt logic getting stuck such that no future interrupts will be triggered. The result is complete loss of Tx timestamp functionality. Fix this by modifying the ice_misc_intr_thread_fn to perform its own polling of the ice_ptp_process_ts function. We sleep for a few microseconds between attempts to avoid wasting significant CPU time. The value was chosen to allow time for the Tx timestamps to complete without wasting so much time that we overrun application wait budgets in the worst case. The ice_ptp_process_ts function also currently returns false in the event that the Tx tracker is not initialized. This would result in the threaded IRQ handler never exiting if it gets started while the tracker is not initialized. Fix the function to appropriately return true when the tracker is not initialized. Note that this will not reproduce with default ptp4l behavior, as the program always synchronously waits for a timestamp response before sending another timestamp request. Reported-by: Siddaraju DH <siddaraju.dh@intel.com> Fixes: 1229b33973c7 ("ice: Add low latency Tx timestamp read") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20221118222729.1565317-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-17ice: Prevent ADQ, DCB coexistence with Custom Tx schedulerMichal Wilczynski
ADQ, DCB might interfere with Custom Tx Scheduler changes that user might introduce using devlink-rate API. Check if ADQ, DCB is active, when user tries to change any setting in exported Tx scheduler tree. If any of those are active block the user from doing so, and log an appropriate message. Remove the exported hierarchy if user enable ADQ or DCB. Prevent ADQ or DCB from getting configured if user already made some changes using devlink-rate API. Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-03net: remove unused ndo_get_devlink_portJiri Pirko
Remove ndo_get_devlink_port which is no longer used alongside with the implementations in drivers. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-03net: make drivers to use SET_NETDEV_DEVLINK_PORT to set devlink_portJiri Pirko
Benefit from the previously implemented tracking of netdev events in devlink code and instead of calling devlink_port_type_eth_set() and devlink_port_type_clear() to set devlink port type and link to related netdev, use SET_NETDEV_DEVLINK_PORT() macro to assign devlink_port pointer to netdevice which is about to be registered. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28net: Remove the obsolte u64_stats_fetch_*_irq() users (drivers).Thomas Gleixner
Now that the 32bit UP oddity is gone and 32bit uses always a sequence count, there is no need for the fetch_irq() variants anymore. Convert to the regular interface. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-25ice: Enable RX queue selection using skbedit actionAmritha Nambiar
This patch uses TC skbedit queue_mapping action to support forwarding packets to a device queue. Such filters with action forward to queue will be the highest priority switch filter in HW. Example: $ tc filter add dev ens4f0 protocol ip ingress flower\ dst_ip 192.168.1.12 ip_proto tcp dst_port 5001\ action skbedit queue_mapping 5 skip_sw The above command adds an ingress filter, incoming packets qualifying the match will be accepted into queue 5. The queue number is in decimal format. Refactored ice_add_tc_flower_adv_fltr() to consolidate code with action FWD_TO_VSI and FWD_TO QUEUE. Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Reviewed-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-28net: drop the weight argument from netif_napi_addJakub Kicinski
We tell driver developers to always pass NAPI_POLL_WEIGHT as the weight to netif_napi_add(). This may be confusing to newcomers, drop the weight argument, those who really need to tweak the weight can use netif_napi_add_weight(). Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> # for CAN Link: https://lore.kernel.org/r/20220927132753.750069-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-27ice: reorder PF/representor devlink port register/unregister flowsJiri Pirko
Make sure that netdevice is registered/unregistered while devlink port is registered. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
drivers/net/ethernet/freescale/fec.h 7b15515fc1ca ("Revert "fec: Restart PPS after link state change"") 40c79ce13b03 ("net: fec: add stop mode support for imx8 platform") https://lore.kernel.org/all/20220921105337.62b41047@canb.auug.org.au/ drivers/pinctrl/pinctrl-ocelot.c c297561bc98a ("pinctrl: ocelot: Fix interrupt controller") 181f604b33cd ("pinctrl: ocelot: add ability to be used in a non-mmio configuration") https://lore.kernel.org/all/20220921110032.7cd28114@canb.auug.org.au/ tools/testing/selftests/drivers/net/bonding/Makefile bbb774d921e2 ("net: Add tests for bonding and team address list management") 152e8ec77640 ("selftests/bonding: add a test for bonding lladdr target") https://lore.kernel.org/all/20220921110437.5b7dbd82@canb.auug.org.au/ drivers/net/can/usb/gs_usb.c 5440428b3da6 ("can: gs_usb: gs_can_open(): fix race dev->can.state condition") 45dfa45f52e6 ("can: gs_usb: add RX and TX hardware timestamp support") https://lore.kernel.org/all/84f45a7d-92b6-4dc5-d7a1-072152fab6ff@tessares.net/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-20ice: Add low latency Tx timestamp readKarol Kolacinski
E810 products can support low latency Tx timestamp register read. This requires usage of threaded IRQ instead of kthread to reduce the kthread start latency (spikes up to 20 ms). Add a check for the device capability and use the new method if supported. Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20220916201728.241510-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-20ice: Fix interface being down after reset with link-down-on-close flag onMateusz Palczewski
When performing a reset on ice driver with link-down-on-close flag on interface would always stay down. Fix this by moving a check of this flag to ice_stop() that is called only when user wants to bring interface down. Fixes: ab4ab73fc1ec ("ice: Add ethtool private flag to make forcing link down optional") Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Petr Oros <poros@redhat.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-09-20ice: config netdev tc before setting queues numberMichal Swiatkowski
After lowering number of tx queues the warning appears: "Number of in use tx queues changed invalidating tc mappings. Priority traffic classification disabled!" Example command to reproduce: ethtool -L enp24s0f0 tx 36 rx 36 Fix this by setting correct tc mapping before setting real number of queues on netdev. Fixes: 0754d65bd4be5 ("ice: Add infrastructure for mqprio support via ndo_setup_tc") Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-09-08ice: Don't double unplug aux on peer initiated resetDave Ertman
In the IDC callback that is accessed when the aux drivers request a reset, the function to unplug the aux devices is called. This function is also called in the ice_prepare_for_reset function. This double call is causing a "scheduling while atomic" BUG. [ 662.676430] ice 0000:4c:00.0 rocep76s0: cqp opcode = 0x1 maj_err_code = 0xffff min_err_code = 0x8003 [ 662.676609] ice 0000:4c:00.0 rocep76s0: [Modify QP Cmd Error][op_code=8] status=-29 waiting=1 completion_err=1 maj=0xffff min=0x8003 [ 662.815006] ice 0000:4c:00.0 rocep76s0: ICE OICR event notification: oicr = 0x10000003 [ 662.815014] ice 0000:4c:00.0 rocep76s0: critical PE Error, GLPE_CRITERR=0x00011424 [ 662.815017] ice 0000:4c:00.0 rocep76s0: Requesting a reset [ 662.815475] BUG: scheduling while atomic: swapper/37/0/0x00010002 [ 662.815475] BUG: scheduling while atomic: swapper/37/0/0x00010002 [ 662.815477] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs rfkill 8021q garp mrp stp llc vfat fat rpcrdma intel_rapl_msr intel_rapl_common sunrpc i10nm_edac rdma_ucm nfit ib_srpt libnvdimm ib_isert iscsi_target_mod x86_pkg_temp_thermal intel_powerclamp coretemp target_core_mod snd_hda_intel ib_iser snd_intel_dspcfg libiscsi snd_intel_sdw_acpi scsi_transport_iscsi kvm_intel iTCO_wdt rdma_cm snd_hda_codec kvm iw_cm ipmi_ssif iTCO_vendor_support snd_hda_core irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hwdep snd_seq snd_seq_device rapl snd_pcm snd_timer isst_if_mbox_pci pcspkr isst_if_mmio irdma intel_uncore idxd acpi_ipmi joydev isst_if_common snd mei_me idxd_bus ipmi_si soundcore i2c_i801 mei ipmi_devintf i2c_smbus i2c_ismt ipmi_msghandler acpi_power_meter acpi_pad rv(OE) ib_uverbs ib_cm ib_core xfs libcrc32c ast i2c_algo_bit drm_vram_helper drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm_ttm_helpe r ttm [ 662.815546] nvme nvme_core ice drm crc32c_intel i40e t10_pi wmi pinctrl_emmitsburg dm_mirror dm_region_hash dm_log dm_mod fuse [ 662.815557] Preemption disabled at: [ 662.815558] [<0000000000000000>] 0x0 [ 662.815563] CPU: 37 PID: 0 Comm: swapper/37 Kdump: loaded Tainted: G S OE 5.17.1 #2 [ 662.815566] Hardware name: Intel Corporation D50DNP/D50DNP, BIOS SE5C6301.86B.6624.D18.2111021741 11/02/2021 [ 662.815568] Call Trace: [ 662.815572] <IRQ> [ 662.815574] dump_stack_lvl+0x33/0x42 [ 662.815581] __schedule_bug.cold.147+0x7d/0x8a [ 662.815588] __schedule+0x798/0x990 [ 662.815595] schedule+0x44/0xc0 [ 662.815597] schedule_preempt_disabled+0x14/0x20 [ 662.815600] __mutex_lock.isra.11+0x46c/0x490 [ 662.815603] ? __ibdev_printk+0x76/0xc0 [ib_core] [ 662.815633] device_del+0x37/0x3d0 [ 662.815639] ice_unplug_aux_dev+0x1a/0x40 [ice] [ 662.815674] ice_schedule_reset+0x3c/0xd0 [ice] [ 662.815693] irdma_iidc_event_handler.cold.7+0xb6/0xd3 [irdma] [ 662.815712] ? bitmap_find_next_zero_area_off+0x45/0xa0 [ 662.815719] ice_send_event_to_aux+0x54/0x70 [ice] [ 662.815741] ice_misc_intr+0x21d/0x2d0 [ice] [ 662.815756] __handle_irq_event_percpu+0x4c/0x180 [ 662.815762] handle_irq_event_percpu+0xf/0x40 [ 662.815764] handle_irq_event+0x34/0x60 [ 662.815766] handle_edge_irq+0x9a/0x1c0 [ 662.815770] __common_interrupt+0x62/0x100 [ 662.815774] common_interrupt+0xb4/0xd0 [ 662.815779] </IRQ> [ 662.815780] <TASK> [ 662.815780] asm_common_interrupt+0x1e/0x40 [ 662.815785] RIP: 0010:cpuidle_enter_state+0xd6/0x380 [ 662.815789] Code: 49 89 c4 0f 1f 44 00 00 31 ff e8 65 d7 95 ff 45 84 ff 74 12 9c 58 f6 c4 02 0f 85 64 02 00 00 31 ff e8 ae c5 9c ff fb 45 85 f6 <0f> 88 12 01 00 00 49 63 d6 4c 2b 24 24 48 8d 04 52 48 8d 04 82 49 [ 662.815791] RSP: 0018:ff2c2c4f18edbe80 EFLAGS: 00000202 [ 662.815793] RAX: ff280805df140000 RBX: 0000000000000002 RCX: 000000000000001f [ 662.815795] RDX: 0000009a52da2d08 RSI: ffffffff93f8240b RDI: ffffffff93f53ee7 [ 662.815796] RBP: ff5e2bd11ff41928 R08: 0000000000000000 R09: 000000000002f8c0 [ 662.815797] R10: 0000010c3f18e2cf R11: 000000000000000f R12: 0000009a52da2d08 [ 662.815798] R13: ffffffff94ad7e20 R14: 0000000000000002 R15: 0000000000000000 [ 662.815801] cpuidle_enter+0x29/0x40 [ 662.815803] do_idle+0x261/0x2b0 [ 662.815807] cpu_startup_entry+0x19/0x20 [ 662.815809] start_secondary+0x114/0x150 [ 662.815813] secondary_startup_64_no_verify+0xd5/0xdb [ 662.815818] </TASK> [ 662.815846] bad: scheduling from the idle thread! [ 662.815849] CPU: 37 PID: 0 Comm: swapper/37 Kdump: loaded Tainted: G S W OE 5.17.1 #2 [ 662.815852] Hardware name: Intel Corporation D50DNP/D50DNP, BIOS SE5C6301.86B.6624.D18.2111021741 11/02/2021 [ 662.815853] Call Trace: [ 662.815855] <IRQ> [ 662.815856] dump_stack_lvl+0x33/0x42 [ 662.815860] dequeue_task_idle+0x20/0x30 [ 662.815863] __schedule+0x1c3/0x990 [ 662.815868] schedule+0x44/0xc0 [ 662.815871] schedule_preempt_disabled+0x14/0x20 [ 662.815873] __mutex_lock.isra.11+0x3a8/0x490 [ 662.815876] ? __ibdev_printk+0x76/0xc0 [ib_core] [ 662.815904] device_del+0x37/0x3d0 [ 662.815909] ice_unplug_aux_dev+0x1a/0x40 [ice] [ 662.815937] ice_schedule_reset+0x3c/0xd0 [ice] [ 662.815961] irdma_iidc_event_handler.cold.7+0xb6/0xd3 [irdma] [ 662.815979] ? bitmap_find_next_zero_area_off+0x45/0xa0 [ 662.815985] ice_send_event_to_aux+0x54/0x70 [ice] [ 662.816011] ice_misc_intr+0x21d/0x2d0 [ice] [ 662.816033] __handle_irq_event_percpu+0x4c/0x180 [ 662.816037] handle_irq_event_percpu+0xf/0x40 [ 662.816039] handle_irq_event+0x34/0x60 [ 662.816042] handle_edge_irq+0x9a/0x1c0 [ 662.816045] __common_interrupt+0x62/0x100 [ 662.816048] common_interrupt+0xb4/0xd0 [ 662.816052] </IRQ> [ 662.816053] <TASK> [ 662.816054] asm_common_interrupt+0x1e/0x40 [ 662.816057] RIP: 0010:cpuidle_enter_state+0xd6/0x380 [ 662.816060] Code: 49 89 c4 0f 1f 44 00 00 31 ff e8 65 d7 95 ff 45 84 ff 74 12 9c 58 f6 c4 02 0f 85 64 02 00 00 31 ff e8 ae c5 9c ff fb 45 85 f6 <0f> 88 12 01 00 00 49 63 d6 4c 2b 24 24 48 8d 04 52 48 8d 04 82 49 [ 662.816063] RSP: 0018:ff2c2c4f18edbe80 EFLAGS: 00000202 [ 662.816065] RAX: ff280805df140000 RBX: 0000000000000002 RCX: 000000000000001f [ 662.816067] RDX: 0000009a52da2d08 RSI: ffffffff93f8240b RDI: ffffffff93f53ee7 [ 662.816068] RBP: ff5e2bd11ff41928 R08: 0000000000000000 R09: 000000000002f8c0 [ 662.816070] R10: 0000010c3f18e2cf R11: 000000000000000f R12: 0000009a52da2d08 [ 662.816071] R13: ffffffff94ad7e20 R14: 0000000000000002 R15: 0000000000000000 [ 662.816075] cpuidle_enter+0x29/0x40 [ 662.816077] do_idle+0x261/0x2b0 [ 662.816080] cpu_startup_entry+0x19/0x20 [ 662.816083] start_secondary+0x114/0x150 [ 662.816087] secondary_startup_64_no_verify+0xd5/0xdb [ 662.816091] </TASK> [ 662.816169] bad: scheduling from the idle thread! The correct place to unplug the aux devices for a reset is in the prepare_for_reset function, as this is a common place for all reset flows. It also has built in protection from being called twice in a single reset instance before the aux devices are replugged. Fixes: f9f5301e7e2d4 ("ice: Register auxiliary device to provide RDMA") Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Tested-by: Helena Anna Dubel <helena.anna.dubel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-09-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netPaolo Abeni
drivers/net/ethernet/freescale/fec.h 7d650df99d52 ("net: fec: add pm_qos support on imx6q platform") 40c79ce13b03 ("net: fec: add stop mode support for imx8 platform") Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-06ice: Allow operation with reduced device MSI-XTony Nguyen
The driver currently takes an all or nothing approach for device MSI-X vectors. Meaning if it does not get its full allocation, it will fail and not load. There is no reason it can't work with a reduced number of MSI-X vectors. Take a similar approach as commit 741106f7bd8d ("ice: Improve MSI-X fallback logic") and, instead, adjust the MSI-X request to make use of what is available. Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Petr Oros <poros@redhat.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
2022-09-02ice: use bitmap_free instead of devm_kfreeMichal Swiatkowski
pf->avail_txqs was allocated using bitmap_zalloc, bitmap_free should be used to free this memory. Fixes: 78b5713ac1241 ("ice: Alloc queue management bitmaps and arrays dynamically") Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-09-02ice: Fix DMA mappings leakPrzemyslaw Patynowski
Fix leak, when user changes ring parameters. During reallocation of RX buffers, new DMA mappings are created for those buffers. New buffers with different RX ring count should substitute older ones, but those buffers were freed in ice_vsi_cfg_rxq and reallocated again with ice_alloc_rx_buf. kfree on rx_buf caused leak of already mapped DMA. Reallocate ZC with xdp_buf struct, when BPF program loads. Reallocate back to rx_buf, when BPF program unloads. If BPF program is loaded/unloaded and XSK pools are created, reallocate RX queues accordingly in XDP_SETUP_XSK_POOL handler. Steps for reproduction: while : do for ((i=0; i<=8160; i=i+32)) do ethtool -G enp130s0f0 rx $i tx $i sleep 0.5 ethtool -g enp130s0f0 done done Fixes: 617f3e1b588c ("ice: xsk: allocate separate memory for XDP SW ring") Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Chandan <chandanx.rout@intel.com> (A Contingent Worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-08-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
drivers/net/ethernet/mellanox/mlx5/core/en_fs.c 21234e3a84c7 ("net/mlx5e: Fix use after free in mlx5e_fs_init()") c7eafc5ed068 ("net/mlx5e: Convert ethtool_steering member of flow_steering struct to pointer") https://lore.kernel.org/all/20220825104410.67d4709c@canb.auug.org.au/ https://lore.kernel.org/all/20220823055533.334471-1-saeed@kernel.org/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-22Merge branch '100GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2022-08-18 (ice) This series contains updates to ice driver only. Jesse and Anatolii add support for controlling FCS/CRC stripping via ethtool. Anirudh allows for 100M speeds on devices which support it. Sylwester removes ucast_shared field and the associated dead code related to it. Mikael removes non-inclusive language from the driver. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: ice: remove non-inclusive language ice: Remove ucast_shared ice: Allow 100M speeds for some devices ice: Implement FCS/CRC and VLAN stripping co-existence policy ice: Implement control of FCS/CRC stripping ==================== Link: https://lore.kernel.org/r/20220818155207.996297-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-22ice: xsk: use Rx ring's XDP ring when picking NAPI contextMaciej Fijalkowski
Ice driver allocates per cpu XDP queues so that redirect path can safely use smp_processor_id() as an index to the array. At the same time though, XDP rings are used to pick NAPI context to call napi_schedule() or set NAPIF_STATE_MISSED. When user reduces queue count, say to 8, and num_possible_cpus() of underlying platform is 44, then this means queue vectors with correlated NAPI contexts will carry several XDP queues. This in turn can result in a broken behavior where NAPI context of interest will never be scheduled and AF_XDP socket will not process any traffic. To fix this, let us change the way how XDP rings are assigned to Rx rings and use this information later on when setting ice_tx_ring::xsk_pool pointer. For each Rx ring, grab the associated queue vector and walk through Tx ring's linked list. Once we stumble upon XDP ring in it, assign this ring to ice_rx_ring::xdp_ring. Previous [0] approach of fixing this issue was for txonly scenario because of the described grouping of XDP rings across queue vectors. So, relying on Rx ring meant that NAPI context could be scheduled with a queue vector without XDP ring with associated XSK pool. [0]: https://lore.kernel.org/netdev/20220707161128.54215-1-maciej.fijalkowski@intel.com/ Fixes: 2d4238f55697 ("ice: Add support for AF_XDP") Fixes: 22bf877e528f ("ice: introduce XDP_TX fallback path") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-08-18ice: Remove ucast_sharedSylwester Dziedziuch
Remove ucast_shared as it was always true. Remove the code depending on ucast_shared from ice_add_mac and ice_remove_mac. Remove ice_find_ucast_rule_entry function as it was only used when ucast_shared was set to false. Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com> Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-08-18ice: Implement FCS/CRC and VLAN stripping co-existence policyAnatolii Gerasymenko
Make sure that only the valid combinations of FCS/CRC stripping and VLAN stripping offloads are allowed. You cannot have FCS/CRC stripping disabled while VLAN stripping is enabled - this breaks the correctness of the FCS/CRC. If administrator tries to enable VLAN stripping when FCS/CRC stripping is disabled, the request should be rejected. If administrator tries to disable FCS/CRC stripping when VLAN stripping is enabled, the request should be rejected if VLANs are configured. If there is no VLAN configured, then both FCS/CRC and VLAN stripping should be disabled. Testing Hints: The default settings after driver load are: - VLAN C-Tag offloads are enabled - VLAN S-Tag offloads are disabled - FCS/CRC stripping is enabled Restore the default settings before each test with the command: ethtool -K eth0 rx-fcs off rxvlan on txvlan on rx-vlan-stag-hw-parse off tx-vlan-stag-hw-insert off Test 1: Disable FCS/CRC and VLAN stripping: ethtool -K eth0 rx-fcs on rxvlan off Try to enable VLAN stripping: ethtool -K eth0 rxvlan on Expected: VLAN stripping request is rejected Test 2: Try to disable FCS/CRC stripping: ethtool -K eth0 rx-fcs on Expected: VLAN stripping is also disabled, as there are no VLAN configured Test 3: Add a VLAN: ip link add link eth0 eth0.42 type vlan id 42 ip link set eth0 up Try to disable FCS/CRC stripping: ethtool -K eth0 rx-fcs on Expected: FCS/CRC stripping request is rejected Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-08-18ice: Implement control of FCS/CRC strippingJesse Brandeburg
The driver can allow the user to configure whether the CRC aka the FCS (Frame Check Sequence) is DMA'd to the host as part of the receive buffer. The driver usually wants this feature disabled so that the hardware checks the FCS and strips it in order to save PCI bandwidth. Control the reception of FCS to the host using the command: ethtool -K eth0 rx-fcs <on|off> The default shown in ethtool -k eth0 | grep fcs; should be "off", as the hardware will drop any frame with a bad checksum, and DMA of the checksum is useless overhead especially for small packets. Testing Hints: test the FCS/CRC arrives with received packets using tcpdump -nnpi eth0 -xxxx and it should show crc data as the last 4 bytes of the packet. Can also use wireshark to turn on CRC checking and check the data is correct. Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Co-developed-by: Grzegorz Nitka <grzegorz.nitka@intel.com> Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com> Co-developed-by: Benjamin Mikailenko <benjamin.mikailenko@intel.com> Signed-off-by: Benjamin Mikailenko <benjamin.mikailenko@intel.com> Co-developed-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com> Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-08-17ice: Fix clearing of promisc mode with bridge over bondGrzegorz Siwik
When at least two interfaces are bonded and a bridge is enabled on the bond, an error can occur when the bridge is removed and re-added. The reason for the error is because promiscuous mode was not fully cleared from the VLAN VSI in the hardware. With this change, promiscuous mode is properly removed when the bridge disconnects from bonding. [ 1033.676359] bond1: link status definitely down for interface enp95s0f0, disabling it [ 1033.676366] bond1: making interface enp175s0f0 the new active one [ 1033.676369] device enp95s0f0 left promiscuous mode [ 1033.676522] device enp175s0f0 entered promiscuous mode [ 1033.676901] ice 0000:af:00.0 enp175s0f0: Error setting Multicast promiscuous mode on VSI 6 [ 1041.795662] ice 0000:af:00.0 enp175s0f0: Error setting Multicast promiscuous mode on VSI 6 [ 1041.944826] bond1: link status definitely down for interface enp175s0f0, disabling it [ 1041.944874] device enp175s0f0 left promiscuous mode [ 1041.944918] bond1: now running without any active interface! Fixes: c31af68a1b94 ("ice: Add outer_vlan_ops and VSI specific VLAN ops implementations") Co-developed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Grzegorz Siwik <grzegorz.siwik@intel.com> Link: https://lore.kernel.org/all/CAK8fFZ7m-KR57M_rYX6xZN39K89O=LGooYkKsu6HKt0Bs+x6xQ@mail.gmail.com/ Tested-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com> Tested-by: Igor Raits <igor@gooddata.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-08-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netPaolo Abeni
Conflicts: net/ax25/af_ax25.c d7c4c9e075f8c ("ax25: fix incorrect dev_tracker usage") d62607c3fe459 ("net: rename reference+tracking helpers") drivers/net/netdevsim/fib.c 180a6a3ee60a ("netdevsim: fib: Fix reference count leak on route deletion failure") 012ec02ae441 ("netdevsim: convert driver to use unlocked devlink API during init/fini") Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-08-01net: ice: fix error NETIF_F_HW_VLAN_CTAG_FILTER check in ice_vsi_sync_fltr()Jian Shen
vsi->current_netdev_flags is used store the current net device flags, not the active netdevice features. So it should use vsi->netdev->featurs, rather than vsi->current_netdev_flags to check NETIF_F_HW_VLAN_CTAG_FILTER. Fixes: 1babaf77f49d ("ice: Advertise 802.1ad VLAN filtering and offloads for PF netdev") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Acked-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-29Merge branch '100GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== 100GbE Intel Wired LAN Driver Updates 2022-07-28 This series contains updates to ice driver only. Michal allows for VF true promiscuous mode to be set for multiple VFs and adds clearing of promiscuous filters when VF trust is removed. Maciej refactors ice_set_features() to track/check changed features instead of constantly checking against netdev features and adds support for NETIF_F_LOOPBACK. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: ice: allow toggling loopback mode via ndo_set_features callback ice: compress branches in ice_set_features() ice: Fix promiscuous mode not turning off ice: Introduce enabling promiscuous mode on multiple VF's ==================== Link: https://lore.kernel.org/r/20220728195538.3391360-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-28ice: allow toggling loopback mode via ndo_set_features callbackMaciej Fijalkowski
Add support for NETIF_F_LOOPBACK. This feature can be set via: $ ethtool -K eth0 loopback <on|off> Feature can be useful for local data path tests. Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-07-28ice: compress branches in ice_set_features()Maciej Fijalkowski
Instead of rather verbose comparison of current netdev->features bits vs the incoming ones from user, let us compress them by a helper features set that will be the result of netdev->features XOR features. This way, current, extensive branches: if (features & NETIF_F_BIT && !(netdev->features & NETIF_F_BIT)) set_feature(true); else if (!(features & NETIF_F_BIT) && netdev->features & NETIF_F_BIT) set_feature(false); can become: netdev_features_t changed = netdev->features ^ features; if (changed & NETIF_F_BIT) set_feature(!!(features & NETIF_F_BIT)); This is nothing new as currently several other drivers use this approach, which I find much more convenient. Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-07-28ice: Introduce enabling promiscuous mode on multiple VF'sMichal Wilczynski
In current implementation default VSI switch filter is only able to forward traffic to a single VSI. This limits promiscuous mode with private flag 'vf-true-promisc-support' to a single VF. Enabling it on the second VF won't work. Also allmulticast support doesn't seem to be properly implemented when vf-true-promisc-support is true. Use standard ice_add_rule_internal() function that already implements forwarding to multiple VSI's instead of constructing AQ call manually. Add switch filter for allmulticast mode when vf-true-promisc-support is enabled. The same filter is added regardless of the flag - it doesn't matter for this case. Remove unnecessary fields in switch structure. From now on book keeping will be done by ice_add_rule_internal(). Refactor unnecessarily passed function arguments. To test: 1) Create 2 VM's, and two VF's. Attach VF's to VM's. 2) Enable promiscuous mode on both of them and check if traffic is seen on both of them. Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com> Tested-by: Marek Szlosek <marek.szlosek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-07-26ice: do not setup vlan for loopback VSIMaciej Fijalkowski
Currently loopback test is failiing due to the error returned from ice_vsi_vlan_setup(). Skip calling it when preparing loopback VSI. Fixes: 0e674aeb0b77 ("ice: Add handler for ethtool selftest") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>