linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2021-11-01	netdevsim: take rtnl_lock when assigning num_vfs	Jakub Kicinski
	Legacy VF NDOs look at num_vfs and then based on that index into vfconfig. If we don't rtnl_lock() num_vfs may get set to 0 and vfconfig freed/replaced while the NDO is running. We don't need to protect replacing vfconfig since it's only done when num_vfs is 0. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01	net: mana: Support hibernation and kexec	Dexuan Cui
	Implement the suspend/resume/shutdown callbacks for hibernation/kexec. Add mana_gd_setup() and mana_gd_cleanup() for some common code, and use them in the mand_gd_* callbacks. Reuse mana_probe/remove() for the hibernation path. Signed-off-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01	net: mana: Improve the HWC error handling	Dexuan Cui
	Currently when the HWC creation fails, the error handling is flawed, e.g. if mana_hwc_create_channel() -> mana_hwc_establish_channel() fails, the resources acquired in mana_hwc_init_queues() is not released. Enhance mana_hwc_destroy_channel() to do the proper cleanup work and call it accordingly. Signed-off-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01	net: mana: Report OS info to the PF driver	Dexuan Cui
	The PF driver might use the OS info for statistical purposes. Signed-off-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01	net: mana: Fix the netdev_err()'s vPort argument in mana_init_port()	Dexuan Cui
	Use the correct port index rather than 0. Signed-off-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01	ibmvnic: delay complete()	Sukadev Bhattiprolu
	If we get CRQ_INIT, we set errno to -EIO and first call complete() to notify the waiter. Then we try to schedule a FAILOVER reset. If this occurs while adapter is in PROBING state, ibmvnic_reset() changes the error code to EAGAIN and returns without scheduling the FAILOVER. The purpose of setting error code to EAGAIN is to ask the waiter to retry. But due to the earlier complete() call, the waiter may already have seen the -EIO response and decided not to retry. This can cause intermittent failures when bringing up ibmvnic adapters during boot, specially in in kexec/kdump kernels. Defer the complete() call until after scheduling the reset. Also streamline the error code to EAGAIN. Don't see why we need EIO sometimes. All 3 callers of ibmvnic_reset_init() can handle EAGAIN. Fixes: 17c8705838a5 ("ibmvnic: Return error code if init interrupted by transport event") Reported-by: Vaishnavi Bhat <vaish123@in.ibm.com> Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com> Reviewed-by: Dany Madden <drt@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01	ibmvnic: Process crqs after enabling interrupts	Sukadev Bhattiprolu
	Soon after registering a CRQ it is possible that we get a fail over or maybe a CRQ_INIT from the VIOS while interrupts were disabled. Look for any such CRQs after enabling interrupts. Otherwise we can intermittently fail to bring up ibmvnic adapters during boot, specially in kexec/kdump kernels. Fixes: 032c5e82847a ("Driver for IBM System i/p VNIC protocol") Reported-by: Vaishnavi Bhat <vaish123@in.ibm.com> Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com> Reviewed-by: Dany Madden <drt@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01	ibmvnic: don't stop queue in xmit	Sukadev Bhattiprolu
	If adapter's resetting bit is on, discard the packet but don't stop the transmit queue - instead leave that to the reset code. With this change, it is possible that we may get several calls to ibmvnic_xmit() that simply discard packets and return. But if we stop the queue here, we might end up doing so just after __ibmvnic_open() started the queues (during a hard/soft reset) and before the ->resetting bit was cleared. If that happens, there will be no one to restart queue and transmissions will be blocked indefinitely. This can cause a TIMEOUT reset and with auto priority failover enabled, an unnecessary FAILOVER reset to less favored backing device and then a FAILOVER back to the most favored backing device. If we hit the window repeatedly, we can get stuck in a loop of TIMEOUT, FAILOVER, FAILOVER resets leaving the adapter unusable for extended periods of time. Fixes: 7f5b030830fe ("ibmvnic: Free skb's in cases of failure in transmit") Reported-by: Abdul Haleem <abdhalee@in.ibm.com> Reported-by: Vaishnavi Bhat <vaish123@in.ibm.com> Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com> Reviewed-by: Dany Madden <drt@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01	nfp: flower: Allow ipv6gretap interface for offloading	Yu Xiao
	The tunnel_type check only allows for "netif_is_gretap", but for OVS the port is actually "netif_is_ip6gretap" when setting up GRE for ipv6, which means offloading request was rejected before. Therefore, adding "netif_is_ip6gretap" allow ipv6gretap interface for offloading. Signed-off-by: Yu Xiao <yu.xiao@corigine.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01	Merge branch '100GbE' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== 100GbE Intel Wired LAN Driver Updates 2021-10-29 This series contains updates to ice and iavf drivers and virtchnl header file. Brett removes vlan_promisc argument from a function call for ice driver. In the virtchnl header file he removes an unused, reserved define and converts raw value defines to instead use the BIT macro. Marcin adds syncing of MAC addresses when creating switchdev VFs to remove error messages on link up and stops showing buffer information for port representors to remove duplicated entries being displayed for ice driver. Karen introduces a helper to go from pci_dev to iavf_adapter in the iavf driver. Przemyslaw fixes an issue where iavf was attempting to free IRQs before calling disable. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01	Merge tag 'mlx5-updates-2021-10-29' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2021-10-29 1) Minor trivial refactoring and improvements 2) Check for unsupported parameters fields in SW steering 3) Support TC offload for OVS internal port, from Ariel, see below. Ariel Levkovich says: ===================== Support HW offload of TC rules involving OVS internal port device type as the filter device or the destination device. The support is for flows which explicitly use the internal port as source or destination device as well as indirect offload for flows performing tunnel set or unset via a tunnel device and the internal port is the tunnel overlay device. Since flows with internal port as source port are added as egress rules while redirecting to internal port is done as an ingress redirect, the series introduces the necessary changes in mlx5_core driver to support the new types of flows and actions. ===================== ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-31	Merge tag 'kvmarm-5.16' of ↵	Paolo Bonzini
	git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for Linux 5.16 - More progress on the protected VM front, now with the full fixed feature set as well as the limitation of some hypercalls after initialisation. - Cleanup of the RAZ/WI sysreg handling, which was pointlessly complicated - Fixes for the vgic placement in the IPA space, together with a bunch of selftests - More memcg accounting of the memory allocated on behalf of a guest - Timer and vgic selftests - Workarounds for the Apple M1 broken vgic implementation - KConfig cleanups - New kvmarm.mode=none option, for those who really dislike us
2021-10-30	Merge tag 'scsi-fixes' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Three small fixes, all in drivers, and one sizeable update to the UFS driver to remove the HPB 2.0 feature that has been objected to by Jens and Christoph. Although the UFS patch is large and last minute, it's essentially the least intrusive way of resolving the objections in time for the 5.15 release" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: ufs: ufshpb: Remove HPB2.0 flows scsi: mpt3sas: Fix reference tag handling for WRITE_INSERT scsi: ufs: ufs-exynos: Correct timeout value setting registers scsi: ibmvfc: Fix up duplicate response detection
2021-10-30	Merge tag 'clk-fixes-for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull clk fix from Stephen Boyd: "One fix for the composite clk that broke when we changed this clk type to use the determine_rate instead of round_rate clk op by default. This caused lots of problems on Rockchip SoCs because they heavily use the composite clk code to model the clk tree" * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: clk: composite: Also consider .determine_rate for rate + mux composites
2021-10-30	scsi: ufs: ufshpb: Remove HPB2.0 flows	Avri Altman
	The Host Performance Buffer feature allows UFS read commands to carry the physical media addresses along with the LBAs, thus allowing less internal L2P-table switches in the device. HPB1.0 allowed a single LBA, while HPB2.0 increases this capacity up to 255 blocks. Carrying more than a single record, the read operation is no longer purely of type "read" but a "hybrid" command: Writing the physical address to the device in one operation and reading back the required payload in another. The JEDEC HPB spec defines two commands for this operation: HPB-WRITE-BUFFER (0x2) to write the physical addresses to device, and HPB-READ to read the payload. With the current HPB design the UFS driver has no alternative but to divide the READ request into 2 separate commands: HPB-WRITE-BUFFER and HPB-READ. This causes a great deal of aggravation to the block layer guys who demanded that we completely revert the entire HPB driver regardless of the huge amount of corporate effort already invested in it. As a compromise, remove only the pieces that implement the 2.0 specification. This is done as a matter of urgency for the final 5.15 release. Link: https://lore.kernel.org/r/20211030062301.248-1-avri.altman@wdc.com Tested-by: Avri Altman <avri.altman@wdc.com> Tested-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Bean Huo <beanhuo@micron.com> Co-developed-by: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Avri Altman <avri.altman@wdc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-10-29	Merge branch '1GbE' of ↵	Jakub Kicinski
	git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== 1GbE Intel Wired LAN Driver Updates 2021-10-29 This series contains updates to igc driver only. Sasha removes an unnecessary media type check, adds a new device ID, and changes a device reset to a port reset command. ==================== Link: https://lore.kernel.org/r/20211029174101.2970935-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-29	bnxt_en: Remove not used other ULP define	Leon Romanovsky
	There is only one bnxt ULP in the upstream kernel and definition for other ULP can be safely removed. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/3a8ea720b28ec4574648012d2a00208f1144eff5.1635527693.git.leonro@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-29	Merge branch '40GbE' of ↵	Jakub Kicinski
	git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== 40GbE Intel Wired LAN Driver Updates 2021-10-29 This series contains updates to i40e, ice, igb, and ixgbevf drivers. Yang Li simplifies return statements of bool values for i40e and ice. Jan Kundrát corrects problems with I2C bit-banging for igb. Colin Ian King removes unneeded variable initialization for ixgbevf. ==================== Link: https://lore.kernel.org/r/20211029164641.2714265-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-29	netdevsim: remove max_vfs dentry	Jakub Kicinski
	Commit d395381909a3 ("netdevsim: Add max_vfs to bus_dev") added this file and saved the dentry for no apparent reason. Link: https://lore.kernel.org/r/20211028211753.22612-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-29	mailbox: imx: support i.MX8ULP S4 MU	Peng Fan
	Like i.MX8 SCU, i.MX8ULP S4 also has vendor specific protocol. - bind SCU/S4 MU part to share one tx/rx/init API to make code simple. - S4 msg max size is very large, so alloc the space at driver probe, not use local on stack variable. - S4 MU has 8 TR and 4 RR which is different with i.MX8 MU, so adapt code to reflect this. Tested on i.MX8MP, i.MX8ULP Signed-off-by: Peng Fan <peng.fan@nxp.com> Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
2021-10-29	mailbox: pcc: Move bulk of PCCT parsing into pcc_mbox_probe	Sudeep Holla
	Move the PCCT subspace parsing and allocation into pcc_mbox_probe so that we can get rid of global PCC channel and mailbox controller data. It also helps to make use of devm_* APIs for all the allocations. Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
2021-10-29	mailbox: pcc: Add support for PCCT extended PCC subspaces(type 3/4)	Sudeep Holla
	With all the plumbing in place to avoid accessing PCCT type and other fields directly from the PCCT table all the time, let us now add the support for extended PCC subspaces(type 3 and 4). Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
2021-10-29	mailbox: pcc: Drop handling invalid bit-width in {read,write}_register	Sudeep Holla
	pcc_chan_reg_init now checks if the register bit width is within the list [8, 16, 32, 64] and flags error if that is not the case. Therefore there is no need to handling invalid bit-width in both read_register and write_register. We can drop that along with the return values for these 2 functions. Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
2021-10-29	mailbox: pcc: Avoid accessing PCCT table in pcc_send_data and pcc_mbox_irq	Sudeep Holla
	Now that the con_priv is availvale solely for PCC mailbox controller driver, let us use the same to save the channel specific information in it so that we can it whenever required instead of parsing the PCCT table entries every time in both pcc_send_data and pcc_mbox_irq. We can now use the newly introduces PCC register bundle to simplify both saving of channel specific information and accessing them without repeated checks for the subspace type. Reviewed-by: Cristian Marussi <cristian.marussi@arm.com> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
2021-10-29	mailbox: pcc: Add PCC register bundle and associated accessor functions	Sudeep Holla
	Extended PCC subspaces introduces more registers into the PCCT. In order to consolidate access to these registers and to keep all the details contained in one place, let us introduce PCC register bundle that holds the ACPI Generic Address Structure as well as the virtual address for the same if it is mapped in the OS. It also contains the various masks used to access the register and the associated read, write and read-modify-write accessors. We can also clean up the initialisations by having a helper function for the same. Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
2021-10-29	mailbox: pcc: Rename doorbell ack to platform interrupt ack register	Sudeep Holla
	The specification refers this register and associated bitmask as platform interrupt acknowledge register. Let us rename it so that it is easier to map and understand. Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
2021-10-29	mailbox: pcc: Use PCC mailbox channel pointer instead of standard	Sudeep Holla
	Now that we have all the shared memory region information populated in the pcc_mbox_chan, let us propagate the pointer to the same as the return value to pcc_mbox_request channel. This eliminates the need for the individual users of PCC mailbox to parse the PCCT subspace entries and fetch the shmem information. This also eliminates the need for PCC mailbox controller to set con_priv to PCCT subspace entries. This is required as con_priv is private to the controller driver to attach private data associated with the channel and not meant to be used by the mailbox client/users. Let us convert all the users of pcc_mbox_{request,free}_channel to use new interface. Cc: Jean Delvare <jdelvare@suse.com> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Acked-by: Wolfram Sang <wsa@kernel.org> Acked-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
2021-10-29	mailbox: pcc: Add pcc_mbox_chan structure to hold shared memory region info	Sudeep Holla
	Currently PCC mailbox controller sets con_priv in each channel to hold the pointer to pcct subspace entry it corresponds to. The mailbox user will then fetch this pointer from the channel descriptor they get when they request for the channel. Using that pointer they then parse the pcct entry again to fetch all the information about shared memory region. In order to remove individual users of PCC mailbox parsing the PCCT subspace entries to fetch same information, let us consolidate the same in pcc mailbox controller by parsing all the shared memory region information into a structure that can also hold the mbox_chan pointer it represent. This can then be used as main PCC mailbox channel pointer that we can return as part of pcc_mbox_request_channel instead of standard mailbox channel pointer. Reviewed-by: Cristian Marussi <cristian.marussi@arm.com> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
2021-10-29	mailbox: pcc: Consolidate subspace doorbell register parsing	Sudeep Holla
	Extended PCC subspaces(Type 3 and 4) differ from generic(Type 0) and HW-Reduced Communication(Type 1 and 2) subspace structures. However some fields share same offsets and same type of structure can be use to extract the fields. In order to simplify that, let us move all the doorbell register parsing into pcc_parse_subspace_db_reg and consolidate there. It will be easier to extend it if required within the same. Reviewed-by: Cristian Marussi <cristian.marussi@arm.com> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
2021-10-29	mailbox: pcc: Consolidate subspace interrupt information parsing	Sudeep Holla
	Extended PCC subspaces(Type 3 and 4) differ from generic(Type 0) and HW-Reduced Communication(Type 1 and 2) subspace structures. However some fields share same offsets and same type of structure can be use to extract the fields. In order to simplify that, let us move all the IRQ related information parsing into pcc_parse_subspace_irq and consolidate there. It will be easier to extend it if required within the same. Reviewed-by: Cristian Marussi <cristian.marussi@arm.com> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
2021-10-29	mailbox: pcc: Refactor all PCC channel information into a structure	Sudeep Holla
	Currently all the PCC channel specific information are stored/maintained in global individual arrays for each of those information. It is not scalable and not clean if we have to stash more channel specific information. Couple of reasons to stash more information are to extend the support to Type 3/4 PCCT subspace and also to avoid accessing the PCCT table entries themselves each time we need the information. This patch moves all those PCC channel specific information into a separate structure pcc_chan_info. Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
2021-10-29	mailbox: pcc: Fix kernel doc warnings	Sudeep Holla
	Kernel doc validation script is unhappy and complains with the below set of warnings. \| drivers/mailbox/pcc.c:179: warning: Function parameter or member 'irq' \| not described in 'pcc_mbox_irq' \| drivers/mailbox/pcc.c:179: warning: Function parameter or member 'p' \| not described in 'pcc_mbox_irq' \| drivers/mailbox/pcc.c:378: warning: expecting prototype for \| parse_pcc_subspaces(). Prototype was for parse_pcc_subspace() instead Fix it. Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
2021-10-29	mailbox: apple: Add driver for Apple mailboxes	Sven Peter
	Apple SoCs such as the M1 come with various co-processors. Mailboxes are used to communicate with those. This driver adds support for two variants of those mailboxes. Signed-off-by: Sven Peter <sven@svenpeter.dev> Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
2021-10-29	Merge tag 'gpio-fixes-for-v5.15' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux Pull gpio fixes from Bartosz Golaszewski: - fix the return value check when parsing the ngpios property in gpio-xgs-iproc - check the return value of bgpio_init() in gpio-mlxbf2 * tag 'gpio-fixes-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: gpio: mlxbf2.c: Add check for bgpio_init failure gpio: xgs-iproc: fix parsing of ngpios property
2021-10-29	net/mlx5: Support internal port as decap route device	Ariel Levkovich
	When performing route device lookup for decap action, support the case of ovs internal port as the lookup result. In such case, an internal port struct is mapped and attached to the flow attributes so that the source port matching of the rule will match on the internal port's metadata value. Signed-off-by: Ariel Levkovich <lariel@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-29	net/mlx5e: Term table handling of internal port rules	Ariel Levkovich
	Adjust termination table logic to handle rules which involve internal port as filter or forwarding device. For cases where the rule forwards from internal port to uplink, always choose to go via termination table. This is because it is not known from where the packet originally arrived to the internal port and it is possible that it came from the uplink itself, in which case a term table is required to perform hairpin. If the packet arrived from a vport, going via term table has no effect. For cases where the rule forwards to an internal port from uplink the rep pointer will point to the uplink rep, avoid going via termination table as it is not required. Signed-off-by: Ariel Levkovich <lariel@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-29	net/mlx5e: Add indirect tc offload of ovs internal port	Ariel Levkovich
	Register callbacks for tc blocks of ovs internal port devices. This allows an indirect offloading rules that apply on such devices as the filter device. In case a rule is added to a tc block of an internal port, the mlx5 driver will implicitly add a matching on the internal port's unique vport metadata value to the rule's matching list. Therefore, only packets that previously hit a rule that redirects to an internal port and got the vport metadata overwritten to the internal port's unique metadata, can match on such indirect rule. Offloading of both ingress and egress tc blocks of internal ports is supported as opposed to other devices where only ingress block offloading is supported. Signed-off-by: Ariel Levkovich <lariel@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-29	net/mlx5e: Offload internal port as encap route device	Ariel Levkovich
	When pefroming encap action, a route lookup is performed to find the routing device the packet should be forwarded to after the encapsulation. This is the device that has the local tunnel ip address. This change adds support to offload an encap rule where the route device ends up being an ovs internal port. In such case, the driver will add a HW rule that will encapsulate the packet with the tunnel header and will overwrite the vport metadata in reg_c0 to the internal port metadata value. Finally, the packet will be forwarded to the root table to be processed again with the indication that it came from an internal port. Signed-off-by: Ariel Levkovich <lariel@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-29	net/mlx5e: Offload tc rules that redirect to ovs internal port	Ariel Levkovich
	Allow offloading rules that redirect to ovs internal port ingress and egress. To support redirect to ingress device, offloading of REDIRECT_INGRESS action is added. When a tc rule redirects to ovs internal port, the hw rule will overwrite the input vport value in reg_c0 with a new vport metadata value that is mapped for this internal port using the internal port mapping api that is introduce in previous patches. After that the hw rule will redirect the packet to the root table to continue processing with the new vport metadata value. The new vport metadata value indicates that this packet is now arriving through an internal port and therefore should be processed using rules that apply on the same internal port as the filter device. Therefore, following rules that apply on this internal port will have to match on the same vport metadata value as part of their matching keys to make sure the packet belongs to the internal port. Signed-off-by: Ariel Levkovich <lariel@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-29	net/mlx5e: Accept action skbedit in the tc actions list	Ariel Levkovich
	Setting the skb packet type field to host is usually done when performing forwarding to ingress device. This is required since the receive handling that is used by the redirect to ingress action checks whether the packet doesn't belong to this host and drops the packet in such case. In order to be able to offload action redirect ingress, tc offload code needs to accept the skbedit ptype action as well. There's no special handling in HW for such action since it will be followed by a redirect action and therefore, this code only allows us to accept such action in the actions list but not performing anything specific in HW for it. Signed-off-by: Ariel Levkovich <lariel@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-29	net/mlx5: E-Switch, Add ovs internal port mapping to metadata support	Ariel Levkovich
	Adding infrastructure to map ovs internal port device to vport match metadata to support offload of rules with internal port as the filter device or as the destination device. The infrastructure allows adding and removing internal port device to an eswitch database and getting a unique vport metadata value to be placed and match on in reg_c0 when offloading rules that are coming from or going to an internal port. The new int port metadata can be written to the source port register in HW to indicate that current source port of the packet is the internal port and not one of the actual HW vports (uplink or VF). Using this method, it is possible to offload TC rules with an OVS internal port as their destination port (overwriting the src vport register) or as the filter port (matching on the value of the src vport register and making sure it matches to the internal port's value). There is also a need to handle a miss case where the packet's src port value was changed in HW to an internal port but a following rule which matches on this new src port value wasn't found in HW. In such case, the packet will be forwarded to the driver with metadata which allows driver to restore the info of the internal port's netdevice. Once this info is restored, the uplink driver can forward the packet to the relevant netdevice in SW. Signed-off-by: Ariel Levkovich <lariel@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-29	net/mlx5e: Use generic name for the forwarding dev pointer	Ariel Levkovich
	Rename tun_dev to fwd_dev within mlx5e_tc_update_priv struct since future implementation may introduce other device types which the handler is forwarding to. Signed-off-by: Ariel Levkovich <lariel@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-29	net/mlx5e: Refactor rx handler of represetor device	Ariel Levkovich
	Move the ownership of skb forwarding to network stack to the tc update_skb handler as different cases will require different handling of the skb. While the tc handler will take care of the various cases and properly handle the handover of the skb to the network stack and freeing the skb, the main rx handler will be kept clean from branches and usage of flags. Signed-off-by: Ariel Levkovich <lariel@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-29	net/mlx5: DR, Add check for unsupported fields in match param	Muhammad Sammar
	When a matcher is being built, we "consume" (clear) mask fields one by one, and to verify that we do support all the required fields we check if the whole mask was consumed, else the matching request includes unsupported fields. Signed-off-by: Muhammad Sammar <muhammads@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
2021-10-29	net/mlx5: Allow skipping counter refresh on creation	Paul Blakey
	CT creates a counter for each CT rule, and for each such counter, fs_counters tries to queue mlx5_fc_stats_work() work again via mod_delayed_work(0) call to refresh all counters. This call has a large performance impact when reaching high insertion rate and accounts for ~8% of the insertion time when using software steering. Allow skipping the refresh of all counters during counter creation. Change CT to use this refresh skipping for it's counters. Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-29	net/mlx5e: IPsec: Refactor checksum code in tx data path	Raed Salem
	Part of code that is related solely to IPsec is always compiled in the driver code regardless if the IPsec functionality is enabled or disabled in the driver code, this will add unnecessary branch in case IPsec is disabled at Tx data path. Move IPsec related code to IPsec related file such that in case of IPsec is disabled and because of unlikely macro the compiler should be able to optimize and omit the checksum IPsec code all together from Tx data path Signed-off-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Emeel Hakim <ehakim@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-29	net/mlx5: CT: Remove warning of ignore_flow_level support for VFs	Paul Blakey
	ignore_flow_level isn't supported for VFs, and so it causes post_act and ct to warn about it. Instead of disabling CT for VFs, and a driver update will be need to enable CT again once firmware support this, remove this warning specifically for VFs. This way, it could be automatically enabled on future firmwares where VFs support ignore_flow_level capability. Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-29	net/mlx5: Add esw assignment back in mlx5e_tc_sample_unoffload()	Nathan Chancellor
	Clang warns: drivers/net/ethernet/mellanox/mlx5/core/en/tc/sample.c:635:34: error: variable 'esw' is uninitialized when used here [-Werror,-Wuninitialized] mlx5_eswitch_del_offloaded_rule(esw, sample_flow->pre_rule, sample_flow->pre_attr); ^~~ drivers/net/ethernet/mellanox/mlx5/core/en/tc/sample.c:626:26: note: initialize the variable 'esw' to silence this warning struct mlx5_eswitch *esw; ^ = NULL 1 error generated. It appears that the assignment should have been shuffled instead of removed outright like in mlx5e_tc_sample_offload(). Add it back so there is no use of esw uninitialized. Fixes: a64c5edbd20e ("net/mlx5: Remove unnecessary checks for slow path flag") Link: https://github.com/ClangBuiltLinux/linux/issues/1494 Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-29	iavf: Fix kernel BUG in free_msi_irqs	Przemyslaw Patynowski
	Fix driver not freeing VF's traffic irqs, prior to calling pci_disable_msix in iavf_remove. There were possible 2 erroneous states in which, iavf_close would not be called. One erroneous state is fixed by allowing netdev to register, when state is already running. It was possible for VF adapter to enter state loop from running to resetting, where iavf_open would subsequently fail. If user would then unload driver/remove VF pci, iavf_close would not be called, as the netdev was not registered, leaving traffic pcis still allocated. Fixed this by breaking loop, allowing netdev to open device when adapter state is __IAVF_RUNNING and it is not explicitily downed. Other possiblity is entering to iavf_remove from __IAVF_RESETTING state, where iavf_close would not free irqs, but just return 0. Fixed this by checking for last adapter state and then removing irqs. Kernel panic: [ 2773.628585] kernel BUG at drivers/pci/msi.c:375! ... [ 2773.631567] RIP: 0010:free_msi_irqs+0x180/0x1b0 ... [ 2773.640939] Call Trace: [ 2773.641572] pci_disable_msix+0xf7/0x120 [ 2773.642224] iavf_reset_interrupt_capability.part.41+0x15/0x30 [iavf] [ 2773.642897] iavf_remove+0x12e/0x500 [iavf] [ 2773.643578] pci_device_remove+0x3b/0xc0 [ 2773.644266] device_release_driver_internal+0x103/0x1f0 [ 2773.644948] pci_stop_bus_device+0x69/0x90 [ 2773.645576] pci_stop_and_remove_bus_device+0xe/0x20 [ 2773.646215] pci_iov_remove_virtfn+0xba/0x120 [ 2773.646862] sriov_disable+0x2f/0xe0 [ 2773.647531] ice_free_vfs+0x2f8/0x350 [ice] [ 2773.648207] ice_sriov_configure+0x94/0x960 [ice] [ 2773.648883] ? _kstrtoull+0x3b/0x90 [ 2773.649560] sriov_numvfs_store+0x10a/0x190 [ 2773.650249] kernfs_fop_write+0x116/0x190 [ 2773.650948] vfs_write+0xa5/0x1a0 [ 2773.651651] ksys_write+0x4f/0xb0 [ 2773.652358] do_syscall_64+0x5b/0x1a0 [ 2773.653075] entry_SYSCALL_64_after_hwframe+0x65/0xca Fixes: 22ead37f8af8 ("i40evf: Add longer wait after remove module") Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-29	iavf: Add helper function to go from pci_dev to adapter	Karen Sornek
	Add helper function to go from pci_dev to adapter to make work simple - to go from a pci_dev to the adapter structure and make netdev assignment instead of having to go to the net_device then the adapter. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Karen Sornek <karen.sornek@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>