summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/mellanox/mlxsw
AgeCommit message (Collapse)Author
2021-12-20mlxsw: reg: Extend MFDE register with new events and parametersDanielle Ratson
Extend the Monitoring Firmware Debug (MFDE) register with new events and their related parameters. These events will be utilized by devlink-health in the next patch. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-20mlxsw: core: Convert a series of if statements to switch caseDanielle Ratson
Convert a series of if statements that handle different events to a switch case statement. Encapsulate the per-event code in different functions to simplify the code. This is a preparation for subsequent patches that will add more events that need to be handled. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-20mlxsw: Fix naming convention of MFDE fieldsDanielle Ratson
Currently, the MFDE register field names are using the convention: reg_mfde_<NAME_OF_FIELD>, and do not consider the name of the MFDE event. Fix the field names so they fit the more accurate convention: reg_mfde_<NAME_OF_EVENT>_<NAME_OF_FIELD>. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-19flow_offload: add index to flow_action_entry structureBaowen Zheng
Add index to flow_action_entry structure and delete index from police and gate child structure. We make this change to offload tc action for driver to identify a tc action. Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-15mlxsw: Add support for VxLAN with IPv6 underlayAmit Cohen
Currently, mlxsw driver supports VxLAN with IPv4 underlay only. Add support for IPv6 underlay. The main differences are: * Learning is not supported for IPv6 FDB entries, use static entries and do not allow 'learning' flag for IPv6 VxLAN. * IPv6 addresses for FDB entries should be saved as part of KVDL. Use the new API to allocate and release entries for IPv6 addresses. * Spectrum ASICs do not fill UDP checksum, while in software IPv6 UDP packets with checksum zero are dropped. Force the relevant flags which allow the VxLAN device to generate UDP packets with zero checksum and also receive them. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-15mlxsw: spectrum_nve: Keep track of IPv6 addresses used by FDB entriesAmit Cohen
FDB entries that perform VxLAN encapsulation with an IPv6 underlay hold a reference on a resource. Namely, the KVDL entry where the IPv6 underlay destination IP is stored. When such an FDB entry is deleted, it needs to drop the reference from the corresponding KVDL entry. To that end, maintain a hash table that maps an FDB entry (i.e., {MAC, FID}) to the IPv6 address used by it. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-15mlxsw: reg: Add a function to fill IPv6 unicast FDB entriesAmit Cohen
Add a function to fill IPv6 unicast FDB entries. Use the common function for common fields. Unlike IPv4 entries, the underlay IP address is not filled in the register payload, but instead a pointer to KVDL is used. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-15mlxsw: Split handling of FDB tunnel entries between address familiesAmit Cohen
Currently, the function which adds/removes unicast tunnel FDB entries is shared between IPv4 and IPv6, while for IPv6 it warns because there is no support for it. The code for IPv6 will be more complicated because it needs to allocate/release a KVDL pointer for the underlay IPv6 address. As a preparation for IPv6 underlay support, split the code according to address family. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-15mlxsw: spectrum_nve_vxlan: Make VxLAN flags check per address familyAmit Cohen
As part of 'can_offload' checks, there is a check of VxLAN flags. The supported flags for IPv6 VxLAN will be different from the existing flags because of some limitations. As preparation for IPv6 underlay support, make this check per address family. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-15mlxsw: spectrum_ipip: Use common hash table for IPv6 address mappingAmit Cohen
Use the common hash table introduced by the previous patch instead of the IP-in-IP specific implementation. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-15mlxsw: spectrum: Add hash table for IPv6 address mappingAmit Cohen
The device supports forwarding entries such as routes and FDBs that perform tunnel (e.g., VXLAN, IP-in-IP) encapsulation or decapsulation. When the underlay is IPv6, these entries do not encode the 128 bit IPv6 address used for encapsulation / decapsulation. Instead, these entries encode a 24 bit pointer to an array called KVDL where the IPv6 address is stored. Currently, only IP-in-IP with IPv6 underlay is supported, but subsequent patches will add support for VxLAN with IPv6 underlay. To avoid duplicating the logic required to store and retrieve these IPv6 addresses, introduce a hash table that will store the mapping between IPv6 addresses and their KVDL index. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-14mlxsw: spectrum_router: Consolidate MAC profiles when possibleDanielle Ratson
Currently, when setting a router interface (RIF) MAC address while the MAC profile is not shared with other RIFs, the profile is edited so that the new MAC address is assigned to it. This does not take into account a situation in which the new MAC address already matches an existing MAC profile. In that situation, two MAC profiles will be occupied even though they hold MAC addresses from the same profile. In order to prevent that, add a check to ensure that editing a MAC profile takes place only when the new MAC address does not match an existing profile. Fixes: 605d25cd782a6 ("mlxsw: spectrum_router: Add RIF MAC profiles support") Reported-by: Maksym Yaremchuk <maksymy@nvidia.com> Tested-by: Maksym Yaremchuk <maksymy@nvidia.com> Signed-off-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-01mlxsw: Use Switch Multicast ID Register Version 2Amit Cohen
The SMID-V2 register maps Multicast ID (MID) into a list of local ports. It is a new version of SMID in order to support 1024 bits of local_port. Add SMID-V2 register and use it instead of SMID. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-01mlxsw: Use Switch Flooding Table Register Version 2Amit Cohen
The SFTR-V2 register is used for flooding packet replication. It is a new version of SFTR in order to support 1024 bits of local_port. Add SFTR-V2 register and use it instead of SFTR. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-01mlxsw: Add support for more than 256 ports in SBSR registerAmit Cohen
Add 'port_page' field in SBSR to be able to query occupancy of more than 256 ports. The field determines the range of the ports specified in the 'ingress_port_mask' and 'egress_port_mask' bit masks: >From '256 * port_page' to '256 * port_page + 255'. For each local port, the appropriate port page is used. A query is never performed for a port range that spans multiple port pages. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-01mlxsw: Use u16 for local_port field instead of u8Amit Cohen
Currently, local_port field is saved as u8, which means that maximum 256 ports can be used. As preparation for Spectrum-4, which will support more than 256 ports, local_port field should be extended. Save local_port as u16 to allow use of additional ports. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-01mlxsw: reg: Adjust PPCNT register to support local port 255Amit Cohen
Local port 255 has a special meaning in PPCNT register, it is used to refer to all local ports. This wild card ability is not currently used by the driver. Special casing local port 255 in Spectrum-4 systems where it is a valid port is going to be a problem. Work around this issue by adding and always setting the 'lp_gl' bit which instructs the device's firmware to treat this local port like an ordinary port. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-01mlxsw: reg: Increase 'port_num' field in PMTDB registerAmit Cohen
'port_num' field is used to indicate the local port value which can be assigned to a module. Increase the field from 8 bits to 10 bits in order to support more than 255 ports. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-01mlxsw: reg: Align existing registers to use extended local_port fieldAmit Cohen
Add support for 10-bit local ports in device registers by making use of the MLXSW_ITEM32_LP() macro that was added in the previous patch. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-01mlxsw: item: Add support for local_port field in a split formAmit Cohen
Currently, local_port field uses 8 bits, which means that maximum 256 ports can be used. As preparation for the next ASIC, which will support more than 256 ports, local_port field should be extended to 10 bits. It is not possible to use 10 consecutive bits in all registers, and therefore, the field is split into 2 fields: 1. local_port - the existing 8 bits, represent LSB of the extended field. 2. lp_msb - extra 2 bits, represent MSB of the extended field. To avoid complex programming when reading/writing local_port, add a dedicated macro which creates get and set functions which handle both parts of local_port. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-01mlxsw: reg: Remove unused functionsAmit Cohen
The functions mlxsw_reg_sfd_uc_unpack() and mlxsw_reg_sfd_uc_lag_unpack() are not used. Remove them. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-01mlxsw: spectrum: Bump minimum FW version to xx.2010.1006Amit Cohen
Add latest verified version of Nvidia Spectrum-family switch firmware, for Spectrum (13.2010.1006), Spectrum-2 (29.2010.1006) and Spectrum-3 (30.2010.1006). Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-30devlink: Simplify devlink resources unregister callLeon Romanovsky
The devlink_resources_unregister() used second parameter as an entry point for the recursive removal of devlink resources. None of the callers outside of devlink core needed to use this field, so let's remove it. As part of this removal, the "struct devlink_resource" was moved from .h to .c file as it is not possible to use in any place in the code except devlink.c. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
drivers/net/ipa/ipa_main.c 8afc7e471ad3 ("net: ipa: separate disabling setup from modem stop") 76b5fbcd6b47 ("net: ipa: kill ipa_modem_init()") Duplicated include, drop one. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-23mlxsw: spectrum: Protect driver from buggy firmwareAmit Cohen
When processing port up/down events generated by the device's firmware, the driver protects itself from events reported for non-existent local ports, but not the CPU port (local port 0), which exists, but lacks a netdev. This can result in a NULL pointer dereference when calling netif_carrier_{on,off}(). Fix this by bailing early when processing an event reported for the CPU port. Problem was only observed when running on top of a buggy emulator. Fixes: 28b1987ef506 ("mlxsw: spectrum: Register CPU port with devlink") Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-23mlxsw: spectrum: Allow driver to load with old firmware versionsDanielle Ratson
The driver fails to load with old firmware versions that cannot report the maximum number of RIF MAC profiles [1]. Fix this by defaulting to a maximum of a single profile in such situations, as multiple profiles are not supported by old firmware versions. [1] mlxsw_spectrum 0000:03:00.0: cannot register bus device mlxsw_spectrum: probe of 0000:03:00.0 failed with error -5 Fixes: 1c375ffb2efab ("mlxsw: spectrum_router: Expose RIF MAC profiles to devlink resource") Signed-off-by: Danielle Ratson <danieller@nvidia.com> Reported-by: Vadim Pasternak <vadimp@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-23mlxsw: pci: Add shutdown method in PCI driverDanielle Ratson
On an arm64 platform with the Spectrum ASIC, after loading and executing a new kernel via kexec, the following trace [1] is observed. This seems to be caused by the fact that the device is not properly shutdown before executing the new kernel. Fix this by implementing a shutdown method which mirrors the remove method, as recommended by the kexec maintainer [2][3]. [1] BUG: Bad page state in process devlink pfn:22f73d page:fffffe00089dcf40 refcount:-1 mapcount:0 mapping:0000000000000000 index:0x0 flags: 0x2ffff00000000000() raw: 2ffff00000000000 0000000000000000 ffffffff089d0201 0000000000000000 raw: 0000000000000000 0000000000000000 ffffffffffffffff 0000000000000000 page dumped because: nonzero _refcount Modules linked in: CPU: 1 PID: 16346 Comm: devlink Tainted: G B 5.8.0-rc6-custom-273020-gac6b365b1bf5 #44 Hardware name: Marvell Armada 7040 TX4810M (DT) Call trace: dump_backtrace+0x0/0x1d0 show_stack+0x1c/0x28 dump_stack+0xbc/0x118 bad_page+0xcc/0xf8 check_free_page_bad+0x80/0x88 __free_pages_ok+0x3f8/0x418 __free_pages+0x38/0x60 kmem_freepages+0x200/0x2a8 slab_destroy+0x28/0x68 slabs_destroy+0x60/0x90 ___cache_free+0x1b4/0x358 kfree+0xc0/0x1d0 skb_free_head+0x2c/0x38 skb_release_data+0x110/0x1a0 skb_release_all+0x2c/0x38 consume_skb+0x38/0x130 __dev_kfree_skb_any+0x44/0x50 mlxsw_pci_rdq_fini+0x8c/0xb0 mlxsw_pci_queue_fini.isra.0+0x28/0x58 mlxsw_pci_queue_group_fini+0x58/0x88 mlxsw_pci_aqs_fini+0x2c/0x60 mlxsw_pci_fini+0x34/0x50 mlxsw_core_bus_device_unregister+0x104/0x1d0 mlxsw_devlink_core_bus_device_reload_down+0x2c/0x48 devlink_reload+0x44/0x158 devlink_nl_cmd_reload+0x270/0x290 genl_rcv_msg+0x188/0x2f0 netlink_rcv_skb+0x5c/0x118 genl_rcv+0x3c/0x50 netlink_unicast+0x1bc/0x278 netlink_sendmsg+0x194/0x390 __sys_sendto+0xe0/0x158 __arm64_sys_sendto+0x2c/0x38 el0_svc_common.constprop.0+0x70/0x168 do_el0_svc+0x28/0x88 el0_sync_handler+0x88/0x190 el0_sync+0x140/0x180 [2] https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1195432.html [3] https://patchwork.kernel.org/project/linux-scsi/patch/20170212214920.28866-1-anton@ozlabs.org/#20116693 Cc: Eric Biederman <ebiederm@xmission.com> Signed-off-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-23mlxsw: spectrum_router: Remove deadcode in mlxsw_sp_rif_mac_profile_findDanielle Ratson
The function idr_for_each_entry() already checks that the next entry in the IDR is not NULL. Therefore, checking that again in every iteration leads to deadcode. Remove the unnecessary check in order to avoid that. Addresses-Coverity: ("Logically dead code") Signed-off-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-19mlxsw: constify address in mlxsw_sp_port_dev_addr_setJakub Kicinski
Argument comes from netdev->dev_addr directly, it needs a const. Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-06Merge tag 'pci-v5.16-changes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull pci updates from Bjorn Helgaas: "Enumeration: - Conserve IRQs by setting up portdrv IRQs only when there are users (Jan Kiszka) - Rework and simplify _OSC negotiation for control of PCIe features (Joerg Roedel) - Remove struct pci_dev.driver pointer since it's redundant with the struct device.driver pointer (Uwe Kleine-König) Resource management: - Coalesce contiguous host bridge apertures from _CRS to accommodate BARs that cover more than one aperture (Kai-Heng Feng) Sysfs: - Check CAP_SYS_ADMIN before parsing user input (Krzysztof Wilczyński) - Return -EINVAL consistently from "store" functions (Krzysztof Wilczyński) - Use sysfs_emit() in endpoint "show" functions to avoid buffer overruns (Kunihiko Hayashi) PCIe native device hotplug: - Ignore Link Down/Up caused by resets during error recovery so endpoint drivers can remain bound to the device (Lukas Wunner) Virtualization: - Avoid bus resets on Atheros QCA6174, where they hang the device (Ingmar Klein) - Work around Pericom PI7C9X2G switch packet drop erratum by using store and forward mode instead of cut-through (Nathan Rossi) - Avoid trying to enable AtomicOps on VFs; the PF setting applies to all VFs (Selvin Xavier) MSI: - Document that /sys/bus/pci/devices/.../irq contains the legacy INTx interrupt or the IRQ of the first MSI (not MSI-X) vector (Barry Song) VPD: - Add pci_read_vpd_any() and pci_write_vpd_any() to access anywhere in the possible VPD space; use these to simplify the cxgb3 driver (Heiner Kallweit) Peer-to-peer DMA: - Add (not subtract) the bus offset when calculating DMA address (Wang Lu) ASPM: - Re-enable LTR at Downstream Ports so they don't report Unsupported Requests when reset or hot-added devices send LTR messages (Mingchuang Qiao) Apple PCIe controller driver: - Add driver for Apple M1 PCIe controller (Alyssa Rosenzweig, Marc Zyngier) Cadence PCIe controller driver: - Return success when probe succeeds instead of falling into error path (Li Chen) HiSilicon Kirin PCIe controller driver: - Reorganize PHY logic and add support for external PHY drivers (Mauro Carvalho Chehab) - Support PERST# GPIOs for HiKey970 external PEX 8606 bridge (Mauro Carvalho Chehab) - Add Kirin 970 support (Mauro Carvalho Chehab) - Make driver removable (Mauro Carvalho Chehab) Intel VMD host bridge driver: - If IOMMU supports interrupt remapping, leave VMD MSI-X remapping enabled (Adrian Huang) - Number each controller so we can tell them apart in /proc/interrupts (Chunguang Xu) - Avoid building on UML because VMD depends on x86 bare metal APIs (Johannes Berg) Marvell Aardvark PCIe controller driver: - Define macros for PCI_EXP_DEVCTL_PAYLOAD_* (Pali Rohár) - Set Max Payload Size to 512 bytes per Marvell spec (Pali Rohár) - Downgrade PIO Response Status messages to debug level (Marek Behún) - Preserve CRS SV (Config Request Retry Software Visibility) bit in emulated Root Control register (Pali Rohár) - Fix issue in configuring reference clock (Pali Rohár) - Don't clear status bits for masked interrupts (Pali Rohár) - Don't mask unused interrupts (Pali Rohár) - Avoid code repetition in advk_pcie_rd_conf() (Marek Behún) - Retry config accesses on CRS response (Pali Rohár) - Simplify emulated Root Capabilities initialization (Pali Rohár) - Fix several link training issues (Pali Rohár) - Fix link-up checking via LTSSM (Pali Rohár) - Fix reporting of Data Link Layer Link Active (Pali Rohár) - Fix emulation of W1C bits (Marek Behún) - Fix MSI domain .alloc() method to return zero on success (Marek Behún) - Read entire 16-bit MSI vector in MSI handler, not just low 8 bits (Marek Behún) - Clear Root Port I/O Space, Memory Space, and Bus Master Enable bits at startup; PCI core will set those as necessary (Pali Rohár) - When operating as a Root Port, set class code to "PCI Bridge" instead of the default "Mass Storage Controller" (Pali Rohár) - Add emulation for PCI_BRIDGE_CTL_BUS_RESET since aardvark doesn't implement this per spec (Pali Rohár) - Add emulation of option ROM BAR since aardvark doesn't implement this per spec (Pali Rohár) MediaTek MT7621 PCIe controller driver: - Add MediaTek MT7621 PCIe host controller driver and DT binding (Sergio Paracuellos) Qualcomm PCIe controller driver: - Add SC8180x compatible string (Bjorn Andersson) - Add endpoint controller driver and DT binding (Manivannan Sadhasivam) - Restructure to use of_device_get_match_data() (Prasad Malisetty) - Add SC7280-specific pcie_1_pipe_clk_src handling (Prasad Malisetty) Renesas R-Car PCIe controller driver: - Remove unnecessary includes (Geert Uytterhoeven) Rockchip DesignWare PCIe controller driver: - Add DT binding (Simon Xue) Socionext UniPhier Pro5 controller driver: - Serialize INTx masking/unmasking (Kunihiko Hayashi) Synopsys DesignWare PCIe controller driver: - Run dwc .host_init() method before registering MSI interrupt handler so we can deal with pending interrupts left by bootloader (Bjorn Andersson) - Clean up Kconfig dependencies (Andy Shevchenko) - Export symbols to allow more modular drivers (Luca Ceresoli) TI DRA7xx PCIe controller driver: - Allow host and endpoint drivers to be modules (Luca Ceresoli) - Enable external clock if present (Luca Ceresoli) TI J721E PCIe driver: - Disable PHY when probe fails after initializing it (Christophe JAILLET) MicroSemi Switchtec management driver: - Return error to application when command execution fails because an out-of-band reset has cleared the device BARs, Memory Space Enable, etc (Kelvin Cao) - Fix MRPC error status handling issue (Kelvin Cao) - Mask out other bits when reading of management VEP instance ID (Kelvin Cao) - Return EOPNOTSUPP instead of ENOTSUPP from sysfs show functions (Kelvin Cao) - Add check of event support (Logan Gunthorpe) Miscellaneous: - Remove unused pci_pool wrappers, which have been replaced by dma_pool (Cai Huoqing) - Use 'unsigned int' instead of bare 'unsigned' (Krzysztof Wilczyński) - Use kstrtobool() directly, sans strtobool() wrapper (Krzysztof Wilczyński) - Fix some sscanf(), sprintf() format mismatches (Krzysztof Wilczyński) - Update PCI subsystem information in MAINTAINERS (Krzysztof Wilczyński) - Correct some misspellings (Krzysztof Wilczyński)" * tag 'pci-v5.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (137 commits) PCI: Add ACS quirk for Pericom PI7C9X2G switches PCI: apple: Configure RID to SID mapper on device addition iommu/dart: Exclude MSI doorbell from PCIe device IOVA range PCI: apple: Implement MSI support PCI: apple: Add INTx and per-port interrupt support PCI: kirin: Allow removing the driver PCI: kirin: De-init the dwc driver PCI: kirin: Disable clkreq during poweroff sequence PCI: kirin: Move the power-off code to a common routine PCI: kirin: Add power_off support for Kirin 960 PHY PCI: kirin: Allow building it as a module PCI: kirin: Add MODULE_* macros PCI: kirin: Add Kirin 970 compatible PCI: kirin: Support PERST# GPIOs for HiKey970 external PEX 8606 bridge PCI: apple: Set up reference clocks when probing PCI: apple: Add initial hardware bring-up PCI: of: Allow matching of an interrupt-map local to a PCI device of/irq: Allow matching of an interrupt-map local to an interrupt controller irqdomain: Make of_phandle_args_to_fwspec() generally available PCI: Do not enable AtomicOps on VFs ...
2021-10-28mlxsw: spectrum_qdisc: Offload root TBF as port shaperPetr Machata
The Spectrum ASIC allows configuration of maximum shaper on all levels of the scheduling hierarchy: TCs, subgroups, groups and also ports. Currently, TBF always configures a subgroup. But a user could reasonably express the intent to configure port shaper by putting TBF to a root position, around ETS / PRIO. Accept this usage and offload appropriately. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
include/net/sock.h 7b50ecfcc6cd ("net: Rename ->stream_memory_read to ->sock_is_readable") 4c1e34c0dbff ("vsock: Enable y2038 safe timeval for timeout") drivers/net/ethernet/marvell/octeontx2/af/rvu_debugfs.c 0daa55d033b0 ("octeontx2-af: cn10k: debugfs for dumping LMTST map table") e77bcdd1f639 ("octeontx2-af: Display all enabled PF VF rsrc_alloc entries.") Adjacent code addition in both cases, keep both. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-26mlxsw: spectrum_router: Expose RIF MAC profiles to devlink resourceDanielle Ratson
Expose via devlink-resource the maximum number of RIF MAC profiles and their current occupancy, so it can be used for debug and writing generic tests, like in the next patch. Example for Spectrum-2 output: $ devlink resource show pci/0000:06:00.0 ... name rif_mac_profiles size 4 occ 0 unit entry dpipe_tables none Signed-off-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26mlxsw: spectrum_router: Add RIF MAC profiles supportDanielle Ratson
Currently, mlxsw enforces that all the router interfaces (RIFs) have the same MAC prefix. Relax this limitation by using RIF MAC profiles. Each profile is associated with a particular MAC prefix and multiple RIFs can use the same profile. Therefore, the number of possible MAC prefixes is no longer one, but the number of profiles supported by the device. Store the profiles in an IDR and reference count them according to the number of RIFs using them. Associate a RIF with a profile when the RIF is created and remove the association when the RIF is deleted. Change the association following 'NETDEV_CHANGEADDR' events, except when only one RIF is using the profile. In which case, change the MAC prefix of the profile itself instead of associating the RIF with a new profile. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26mlxsw: spectrum_router: Propagate extack furtherDanielle Ratson
The next patch will set the MAC profile of a router interface (RIF) as part of its configure() callback. The operation can fail in case the maximum number of profiles was exceeded. Add extack to mlxsw_sp_rif_ops::configure() in order to communicate such failures to user space. In addition, the MAC profile of a RIF can change following a 'NETDEV_CHANGEADDR' notification. Propagate extack to mlxsw_sp_router_port_change_event() so that failures could be communicated in this path as well. No functional changes intended. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26mlxsw: resources: Add resource identifier for RIF MAC profilesDanielle Ratson
Add a resource identifier for maximum RIF MAC profiles so that it could be later used to query the information from firmware. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26mlxsw: reg: Add MAC profile ID field to RITR registerDanielle Ratson
Add MAC profile ID field to RITR register so that it could be used for associating a RIF with a MAC profile ID by a later patch. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-25mlxsw: pci: Recycle received packet upon allocation failureIdo Schimmel
When the driver fails to allocate a new Rx buffer, it passes an empty Rx descriptor (contains zero address and size) to the device and marks it as invalid by setting the skb pointer in the descriptor's metadata to NULL. After processing enough Rx descriptors, the driver will try to process the invalid descriptor, but will return immediately seeing that the skb pointer is NULL. Since the driver no longer passes new Rx descriptors to the device, the Rx queue will eventually become full and the device will start to drop packets. Fix this by recycling the received packet if allocation of the new packet failed. This means that allocation is no longer performed at the end of the Rx routine, but at the start, before tearing down the DMA mapping of the received packet. Remove the comment about the descriptor being zeroed as it is no longer correct. This is OK because we either use the descriptor as-is (when recycling) or overwrite its address and size fields with that of the newly allocated Rx buffer. The issue was discovered when a process ("perf") consumed too much memory and put the system under memory pressure. It can be reproduced by injecting slab allocation failures [1]. After the fix, the Rx queue no longer comes to a halt. [1] # echo 10 > /sys/kernel/debug/failslab/times # echo 1000 > /sys/kernel/debug/failslab/interval # echo 100 > /sys/kernel/debug/failslab/probability FAULT_INJECTION: forcing a failure. name failslab, interval 1000, probability 100, space 0, times 8 [...] Call Trace: <IRQ> dump_stack_lvl+0x34/0x44 should_fail.cold+0x32/0x37 should_failslab+0x5/0x10 kmem_cache_alloc_node+0x23/0x190 __alloc_skb+0x1f9/0x280 __netdev_alloc_skb+0x3a/0x150 mlxsw_pci_rdq_skb_alloc+0x24/0x90 mlxsw_pci_cq_tasklet+0x3dc/0x1200 tasklet_action_common.constprop.0+0x9f/0x100 __do_softirq+0xb5/0x252 irq_exit_rcu+0x7a/0xa0 common_interrupt+0x83/0xa0 </IRQ> asm_common_interrupt+0x1e/0x40 RIP: 0010:cpuidle_enter_state+0xc8/0x340 [...] mlxsw_spectrum2 0000:06:00.0: Failed to alloc skb for RDQ Fixes: eda6500a987a ("mlxsw: Add PCI bus implementation") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Link: https://lore.kernel.org/r/20211024064014.1060919-1-idosch@idosch.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-25mlxsw: spectrum: Use 'bitmap_zalloc()' when applicableChristophe JAILLET
Use 'bitmap_zalloc()' to simplify code, improve the semantic and avoid some open-coded arithmetic in allocator arguments. Also change the corresponding 'kfree()' into 'bitmap_free()' to keep consistency. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-19ethernet: mlxsw: use eth_hw_addr_gen()Jakub Kicinski
Commit 406f42fa0d3c ("net-next: When a bond have a massive amount of VLANs...") introduced a rbtree for faster Ethernet address look up. To maintain netdev->dev_addr in this tree we need to make all the writes to it got through appropriate helpers. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-19mlxsw: spectrum_qdisc: Make RED, TBF offloads classfulPetr Machata
Permit offloading qdiscs below RED and TBF. In order to avoid having to implement trivial propagating callbacks for get_prio_bitmap and get_tclass_num, extend mlxsw_sp_qdisc_get_prio_bitmap() and ..._get_tclass_num() to handle the lack of the callback as a cue to forward the request to the parent. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-19mlxsw: spectrum_qdisc: Validate qdisc topologyPetr Machata
A following patch will enable offloading qdiscs that are deeper than directly under root qdisc. Currently the topology validation consists of demanding a root qdisc position for ETS and PRIO. Since RED and TBF are considered classless, this is enough. In order to prevent some nonsensical combinations when RED and TBF become classful, introduce a more general topology validator. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-19mlxsw: spectrum_qdisc: Clean stats recursively when priomap changesPetr Machata
On Spectrum, there are no per-TC TX counters. Instead, mlxsw uses per-prio counters and aggregates them according to the priomap. Therefore when priomap changes, the counter base values need to be reset to reflect the change. Previously, this was only done for the sole child qdisc, but a following patch makes RED and TBF classful. Thus apply the request to the whole sub-tree. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-19mlxsw: spectrum_qdisc: Unify graft validationPetr Machata
Qdisc graft operations have so far been reported at PRIO, ETS and RED, with RED events ignored, because RED was not considered a classful qdisc. A following patch will make mlxsw recognize RED and TBF as classful qdiscs, and thus it is necessary to validate grafting at these qdiscs as well. Rename the existing graft validator to make it clear that it is a generic function, and invoke for RED and TBF graft events as well. Drop the unnecessary PRIO helper and invoke the graft validator directly for PRIO as well. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-19mlxsw: spectrum_qdisc: Destroy children in mlxsw_sp_qdisc_destroy()Petr Machata
Currently ETS and PRIO are the only offloaded classful qdiscs. Since they are both similar, their destroy handler is the same, and it handles children destruction itself. But now it is possible to do it generically for any classful qdisc. Therefore promote the recursive destruction from the ETS handler to mlxsw_sp_qdisc_destroy(), so that RED and TBF pick it up in follow-up patches. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-19mlxsw: spectrum_qdisc: Extract two helpers for handling future FIFOsPetr Machata
Extract from __mlxsw_sp_qdisc_ets_replace() two helpers for handling of one future FIFO resp. reinitializing the array of future FIFOs. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-19mlxsw: spectrum_qdisc: Query tclass / priomap instead of caching itPetr Machata
Currently when keeping track of qdiscs, mlxsw notes the TC and priomap corresponding to each qdisc. That is fine currently, as there only ever is one level of qdiscs to update: the direct children of ETS / PRIO. However as deeper structures are made offloadable, ETS would need to update these values for the complete subtree, and interim qdiscs would need to remember to propagate the value. Instead, reverse the responsibility: child qdiscs can ask their parent what their TC and priomap are. ETS / PRIO know the answer right away, or there are defaults for when the root qdisc does not assign them (e.g. when RED is used as root qdisc). When RED and TBF become classful, they will simply forward the request up to their parent. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
tools/testing/selftests/net/ioam6.sh 7b1700e009cc ("selftests: net: modify IOAM tests for undef bits") bf77b1400a56 ("selftests: net: Test for the IOAM encapsulation with IPv6") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-14mlxsw: thermal: Fix out-of-bounds memory accessesIdo Schimmel
Currently, mlxsw allows cooling states to be set above the maximum cooling state supported by the driver: # cat /sys/class/thermal/thermal_zone2/cdev0/type mlxsw_fan # cat /sys/class/thermal/thermal_zone2/cdev0/max_state 10 # echo 18 > /sys/class/thermal/thermal_zone2/cdev0/cur_state # echo $? 0 This results in out-of-bounds memory accesses when thermal state transition statistics are enabled (CONFIG_THERMAL_STATISTICS=y), as the transition table is accessed with a too large index (state) [1]. According to the thermal maintainer, it is the responsibility of the driver to reject such operations [2]. Therefore, return an error when the state to be set exceeds the maximum cooling state supported by the driver. To avoid dead code, as suggested by the thermal maintainer [3], partially revert commit a421ce088ac8 ("mlxsw: core: Extend cooling device with cooling levels") that tried to interpret these invalid cooling states (above the maximum) in a special way. The cooling levels array is not removed in order to prevent the fans going below 20% PWM, which would cause them to get stuck at 0% PWM. [1] BUG: KASAN: slab-out-of-bounds in thermal_cooling_device_stats_update+0x271/0x290 Read of size 4 at addr ffff8881052f7bf8 by task kworker/0:0/5 CPU: 0 PID: 5 Comm: kworker/0:0 Not tainted 5.15.0-rc3-custom-45935-gce1adf704b14 #122 Hardware name: Mellanox Technologies Ltd. "MSN2410-CB2FO"/"SA000874", BIOS 4.6.5 03/08/2016 Workqueue: events_freezable_power_ thermal_zone_device_check Call Trace: dump_stack_lvl+0x8b/0xb3 print_address_description.constprop.0+0x1f/0x140 kasan_report.cold+0x7f/0x11b thermal_cooling_device_stats_update+0x271/0x290 __thermal_cdev_update+0x15e/0x4e0 thermal_cdev_update+0x9f/0xe0 step_wise_throttle+0x770/0xee0 thermal_zone_device_update+0x3f6/0xdf0 process_one_work+0xa42/0x1770 worker_thread+0x62f/0x13e0 kthread+0x3ee/0x4e0 ret_from_fork+0x1f/0x30 Allocated by task 1: kasan_save_stack+0x1b/0x40 __kasan_kmalloc+0x7c/0x90 thermal_cooling_device_setup_sysfs+0x153/0x2c0 __thermal_cooling_device_register.part.0+0x25b/0x9c0 thermal_cooling_device_register+0xb3/0x100 mlxsw_thermal_init+0x5c5/0x7e0 __mlxsw_core_bus_device_register+0xcb3/0x19c0 mlxsw_core_bus_device_register+0x56/0xb0 mlxsw_pci_probe+0x54f/0x710 local_pci_probe+0xc6/0x170 pci_device_probe+0x2b2/0x4d0 really_probe+0x293/0xd10 __driver_probe_device+0x2af/0x440 driver_probe_device+0x51/0x1e0 __driver_attach+0x21b/0x530 bus_for_each_dev+0x14c/0x1d0 bus_add_driver+0x3ac/0x650 driver_register+0x241/0x3d0 mlxsw_sp_module_init+0xa2/0x174 do_one_initcall+0xee/0x5f0 kernel_init_freeable+0x45a/0x4de kernel_init+0x1f/0x210 ret_from_fork+0x1f/0x30 The buggy address belongs to the object at ffff8881052f7800 which belongs to the cache kmalloc-1k of size 1024 The buggy address is located 1016 bytes inside of 1024-byte region [ffff8881052f7800, ffff8881052f7c00) The buggy address belongs to the page: page:0000000052355272 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1052f0 head:0000000052355272 order:3 compound_mapcount:0 compound_pincount:0 flags: 0x200000000010200(slab|head|node=0|zone=2) raw: 0200000000010200 ffffea0005034800 0000000300000003 ffff888100041dc0 raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8881052f7a80: 00 00 00 00 00 00 04 fc fc fc fc fc fc fc fc fc ffff8881052f7b00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >ffff8881052f7b80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ^ ffff8881052f7c00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff8881052f7c80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [2] https://lore.kernel.org/linux-pm/9aca37cb-1629-5c67-1895-1fdc45c0244e@linaro.org/ [3] https://lore.kernel.org/linux-pm/af9857f2-578e-de3a-e62b-6baff7e69fd4@linaro.org/ CC: Daniel Lezcano <daniel.lezcano@linaro.org> Fixes: a50c1e35650b ("mlxsw: core: Implement thermal zone") Fixes: a421ce088ac8 ("mlxsw: core: Extend cooling device with cooling levels") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Vadim Pasternak <vadimp@nvidia.com> Link: https://lore.kernel.org/r/20211012174955.472928-1-idosch@idosch.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>