linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2021-06-24	sctp: send the next probe immediately once the last one is acked	Xin Long
	These is no need to wait for 'interval' period for the next probe if the last probe is already acked in search state. The 'interval' period waiting should be only for probe failure timeout and the current pmtu check when it's in search complete state. This change will shorten the probe time a lot in search state, and also fix the document accordingly. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	sctp: do black hole detection in search complete state	Xin Long
	Currently the PLPMUTD probe will stop for a long period (interval * 30) after it enters search complete state. If there's a pmtu change on the route path, it takes a long time to be aware if the ICMP TooBig packet is lost or filtered. As it says in rfc8899#section-4.3: "A DPLPMTUD method MUST NOT rely solely on this method." (ICMP PTB message). This patch is to enable the other method for search complete state: "A PL can use the DPLPMTUD probing mechanism to periodically generate probe packets of the size of the current PLPMTU." With this patch, the probe will continue with the current pmtu every 'interval' until the PMTU_RAISE_TIMER 'timeout', which we implement by adding raise_count to raise the probe size when it counts to 30 and removing the SCTP_PL_COMPLETE check for PMTU_RAISE_TIMER. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	Merge branch 'sja1110-doc'	David S. Miller
	Vladimir Oltean says: ==================== Document the NXP SJA1110 switch as supported Now that most of the basic work for SJA1110 support has been done in the sja1105 DSA driver, let's add the missing documentation bits to make it clear that the driver can be used. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	net: dsa: sja1105: document the SJA1110 in the Kconfig	Vladimir Oltean
	Mention support for the SJA1110 in menuconfig. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	Documentation: net: dsa: add details about SJA1110	Vladimir Oltean
	Denote that the new switch generation is supported, detail its pin strapping options (with differences compared to SJA1105) and explain how MDIO access to the internal 100base-T1 and 100base-TX PHYs is performed. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	Merge branch '40GbE' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2021-06-24 This series contains updates to i40e driver only. Dinghao Liu corrects error handling for failed i40e_vsi_request_irq() call. Mateusz allows for disabling of autonegotiation for all BaseT media. Jesse corrects the multiplier being used on 5Gb speeds for PTP. Jan adds locking in paths calling i40e_setup_pf_switch() that were missing it. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	i2c: dev: Add __user annotation	Andreas Hecht
	Fix Sparse warnings: drivers/i2c/i2c-dev.c:546:19: warning: incorrect type in assignment (different address spaces) drivers/i2c/i2c-dev.c:549:53: warning: incorrect type in argument 2 (different address spaces) compat_ptr() returns a pointer tagged __user which gets assigned to a pointer missing the __user annotation. The same pointer is passed to copy_from_user() as an argument where it is expected to have the __user annotation. Fix both by adding the __user annotation to the pointer. Fixes: 7d5cb45655f2 ("i2c compat ioctls: move to ->compat_ioctl()") Signed-off-by: Andreas Hecht <andreas.e.hecht@gmail.com> Signed-off-by: Wolfram Sang <wsa@kernel.org>
2021-06-24	Merge branch 'gve-dqo'	David S. Miller
	Bailey Forrest says: ==================== gve: Introduce DQO descriptor format DQO is the descriptor format for our next generation virtual NIC. The existing descriptor format will be referred to as "GQI" in the patch set. One major change with DQO is it uses dual descriptor rings for both TX and RX queues. The TX path uses a TX queue to send descriptors to HW, and receives packet completion events on a TX completion queue. The RX path posts buffers to HW using an RX buffer queue and receives incoming packets on an RX queue. One important note is that DQO descriptors and doorbells are little endian. We continue to use the existing big endian control plane infrastructure. The general format of the patch series is: - Refactor existing code/data structures to be shared by DQO - Expand admin queues to support DQO device setup - Expand data structures and device setup to support DQO - Add logic to setup DQO queues - Implement datapath ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	gve: DQO: Add RX path	Bailey Forrest
	The RX queue has an array of `gve_rx_buf_state_dqo` objects. All allocated pages have an associated buf_state object. When a buffer is posted on the RX buffer queue, the buffer ID will be the buf_state's index into the RX queue's array. On packet reception, the RX queue will have one descriptor for each buffer associated with a received packet. Each RX descriptor will have a buffer_id that was posted on the buffer queue. Notable mentions: - We use a default buffer size of 2048 bytes. Based on page size, we may post separate sections of a single page as separate buffers. - The driver holds an extra reference on pages passed up the receive path with an skb and keeps these pages on a list. When posting new buffers to the NIC, we check if any of these pages has only our reference, or another buffer sized segment of the page has no references. If so, it is free to reuse. This page recycling approach is a common netdev optimization that reduces page alloc/free calls. - Pages in the free list have a page_count bias in order to avoid an atomic increment of pagecount every time we attempt to reuse a page. # references = page_count() - bias - In order to track when a page is safe to reuse, we keep track of the last offset which had a single SKB reference. When this occurs, it implies that every single other offset is reusable. Otherwise, we don't know if offsets can be safely reused. - We maintain two free lists of pages. List #1 (recycled_buf_states) contains pages we know can be reused right away. List #2 (used_buf_states) contains pages which cannot be used right away. We only attempt to get pages from list #2 when list #1 is empty. We only attempt to use a small fixed number pages from list #2 before giving up and allocating a new page. Both lists are FIFOs in hope that by the time we attempt to reuse a page, the references were dropped. Signed-off-by: Bailey Forrest <bcf@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Catherine Sullivan <csully@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	gve: DQO: Add TX path	Bailey Forrest
	TX SKBs will have their buffers DMA mapped with the device. Each buffer will have at least one TX descriptor associated. Each SKB will also have a metadata descriptor. Each TX queue maintains an array of `gve_tx_pending_packet_dqo` objects. Every TX SKB will have an associated pending_packet object. A TX SKB's descriptors will use its pending_packet's index as the completion tag, which will be returned on the TX completion queue. The device implements a "flow-miss model". Most packets will simply receive a packet completion. The flow-miss system may choose to process a packet based on its contents. A TX packet which experiences a flow miss would receive a miss completion followed by a later reinjection completion. The miss-completion is received when the packet starts to be processed by the flow-miss system and the reinjection completion is received when the flow-miss system completes processing the packet and sends it on the wire. Notable mentions: - Buffers may be freed after receiving the miss-completion, but in order to avoid packet reordering, we do not complete the SKB until receiving the reinjection completion. - The driver must robustly handle the unlikely scenario where a miss completion does not have an associated reinjection completion. This is accomplished by maintaining a list of packets which have a pending reinjection completion. After a short timeout (5 seconds), the SKB and buffers are released and the pending_packet is moved to a second list which has a longer timeout (60 seconds), where the pending_packet will not be reused. When the longer timeout elapses, the driver may assume the reinjection completion would never be received and the pending_packet may be reused. - Completion handling is triggered by an interrupt and is done in the NAPI poll function. Because the TX path and completion exist in different threading contexts they maintain their own lists for free pending_packet objects. The TX path uses a lock-free approach to steal the list from the completion path. - Both the TSO context and general context descriptors have metadata bytes. The device requires that if multiple descriptors contain the same field, each descriptor must have the same value set for that field. Signed-off-by: Bailey Forrest <bcf@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Catherine Sullivan <csully@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	gve: DQO: Configure interrupts on device up	Bailey Forrest
	When interrupts are first enabled, we also set the ratelimits, which will be static for the entire usage of the device. Signed-off-by: Bailey Forrest <bcf@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Catherine Sullivan <csully@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	gve: DQO: Add ring allocation and initialization	Bailey Forrest
	Allocate the buffer and completion ring structures. Do not populate the rings yet. That will happen in the respective rx and tx datapath follow-on patches Signed-off-by: Bailey Forrest <bcf@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Catherine Sullivan <csully@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	gve: DQO: Add core netdev features	Bailey Forrest
	Add napi netdev device registration, interrupt handling and initial tx and rx polling stubs. The stubs will be filled in follow-on patches. Also: - LRO feature advertisement and handling - Also update ethtool logic Signed-off-by: Bailey Forrest <bcf@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Catherine Sullivan <csully@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	gve: Update adminq commands to support DQO queues	Bailey Forrest
	DQO queue creation requires additional parameters: - TX completion/RX buffer queue size - TX completion/RX buffer queue address - TX/RX queue size - RX buffer size Signed-off-by: Bailey Forrest <bcf@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Catherine Sullivan <csully@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	gve: Add DQO fields for core data structures	Bailey Forrest
	- Add new DQO datapath structures: - `gve_rx_buf_queue_dqo` - `gve_rx_compl_queue_dqo` - `gve_rx_buf_state_dqo` - `gve_tx_desc_dqo` - `gve_tx_pending_packet_dqo` - Incorporate these into the existing ring data structures: - `gve_rx_ring` - `gve_tx_ring` Noteworthy mentions: - `gve_rx_buf_state` represents an RX buffer which was posted to HW. Each RX queue has an array of these objects and the index into the array is used as the buffer_id when posted to HW. - `gve_tx_pending_packet_dqo` is treated similarly for TX queues. The completion_tag is the index into the array. - These two structures have links for linked lists which are represented by 16b indexes into a contiguous array of these structures. This reduces memory footprint compared to 64b pointers. - We use unions for the writeable datapath structures to reduce cache footprint. GQI specific members will renamed like DQO members in a future patch. Signed-off-by: Bailey Forrest <bcf@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Catherine Sullivan <csully@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	gve: Add dqo descriptors	Bailey Forrest
	General description of rings and descriptors: TX ring is used for sending TX packet buffers to the NIC. It has the following descriptors: - `gve_tx_pkt_desc_dqo` - Data buffer descriptor - `gve_tx_tso_context_desc_dqo` - TSO context descriptor - `gve_tx_general_context_desc_dqo` - Generic metadata descriptor Metadata is a collection of 12 bytes. We define `gve_tx_metadata_dqo` which represents the logical interpetation of the metadata bytes. It's helpful to define this structure because the metadata bytes exist in multiple descriptor types (including `gve_tx_tso_context_desc_dqo`), and the device requires same field has the same value in all descriptors. The TX completion ring is used to receive completions from the NIC. Having a separate ring allows for completions to be out of order. The completion descriptor `gve_tx_compl_desc` has several different types, most important are packet and descriptor completions. Descriptor completions are used to notify the driver when descriptors sent on the TX ring are done being consumed. The descriptor completion is only used to signal that space is cleared in the TX ring. A packet completion will be received when a packet transmitted on the TX queue is done being transmitted. In addition there are "miss" and "reinjection" completions. The device implements a "flow-miss model". Most packets will simply receive a packet completion. The flow-miss system may choose to process a packet based on its contents. A TX packet which experiences a flow miss would receive a miss completion followed by a later reinjection completion. The miss-completion is received when the packet starts to be processed by the flow-miss system and the reinjection completion is received when the flow-miss system completes processing the packet and sends it on the wire. The RX buffer ring is used to send buffers to HW via the `gve_rx_desc_dqo` descriptor. Received packets are put into the RX queue by the device, which populates the `gve_rx_compl_desc_dqo` descriptor. The RX descriptors refer to buffers posted by the buffer queue. Received buffers may be returned out of order, such as when HW LRO is enabled. Important concepts: - "TX" and "RX buffer" queues, which send descriptors to the device, use MMIO doorbells to notify the device of new descriptors. - "RX" and "TX completion" queues, which receive descriptors from the device, use a "generation bit" to know when a descriptor was populated by the device. The driver initializes all bits with the "current generation". The device will populate received descriptors with the "next generation" which is inverted from the current generation. When the ring wraps, the current/next generation are swapped. - It's the driver's responsibility to ensure that the RX and TX completion queues are not overrun. This can be accomplished by limiting the number of descriptors posted to HW. - TX packets have a 16 bit completion_tag and RX buffers have a 16 bit buffer_id. These will be returned on the TX completion and RX queues respectively to let the driver know which packet/buffer was completed. Bitfields are used to describe descriptor fields. This notation is more concise and readable than shift-and-mask. It is possible because the driver is restricted to little endian platforms. Signed-off-by: Bailey Forrest <bcf@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Catherine Sullivan <csully@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	gve: Add support for DQO RX PTYPE map	Bailey Forrest
	Unlike GQI, DQO RX descriptors do not contain the L3 and L4 type of the packet. L3 and L4 types are necessary in order to set the hash and csum on RX SKBs correctly. DQO RX descriptors instead contain a 10 bit PTYPE index. The PTYPE map enables the device to tell the driver how to map from PTYPE index to L3/L4 type. The device doesn't provide any guarantees about the range of possible PTYPEs, so we just use a 1024 entry array to implement a fast mapping structure. Signed-off-by: Bailey Forrest <bcf@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Catherine Sullivan <csully@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	gve: adminq: DQO specific device descriptor logic	Bailey Forrest
	- In addition to TX and RX queues, DQO has TX completion and RX buffer queues. - TX completions are received when the device has completed sending a packet on the wire. - RX buffers are posted on a separate queue form the RX completions. - DQO descriptor rings are allowed to be smaller than PAGE_SIZE. Signed-off-by: Bailey Forrest <bcf@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Catherine Sullivan <csully@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	gve: Introduce per netdev `enum gve_queue_format`	Bailey Forrest
	The currently supported queue formats are: - GQI_RDA - GQI with raw DMA addressing - GQI_QPL - GQI with queue page list - DQO_RDA - DQO with raw DMA addressing The old `gve_priv.raw_addressing` value is only used for GQI_RDA, so we remove it in favor of just checking against GQI_RDA Signed-off-by: Bailey Forrest <bcf@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Catherine Sullivan <csully@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	gve: Introduce a new model for device options	Bailey Forrest
	The current model uses an integer ID and a fixed size struct for the parameters of each device option. The new model allows the device option structs to grow in size over time. A driver may assume that changes to device option structs will always be appended. New device options will also generally have a `supported_features_mask` so that the driver knows which fields within a particular device option are enabled. `gve_device_option.feat_mask` is changed to `required_features_mask`, and it is a bitmask which must match the value expected by the driver. This gives the device the ability to break backwards compatibility with old drivers for certain features by blocking the old drivers from trying to use the feature. We maintain ABI compatibility with the old model for GVE_DEV_OPT_ID_RAW_ADDRESSING in case a driver is using a device which does not support the new model. This patch introduces some new terminology: RDA - Raw DMA Addressing - Buffers associated with SKBs are directly DMA mapped and read/updated by the device. QPL - Queue Page Lists - Driver uses bounce buffers which are DMA mapped with the device for read/write and data is copied from/to SKBs. Signed-off-by: Bailey Forrest <bcf@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Catherine Sullivan <csully@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	gve: Make gve_rx_slot_page_info.page_offset an absolute offset	Bailey Forrest
	Using `page_offset` like a boolean means a page may only be split into two sections. With page sizes larger than 4k, this can be very wasteful. Future commits in this patchset use `struct gve_rx_slot_page_info` in a way which supports a fixed buffer size and a variable page size. Signed-off-by: Bailey Forrest <bcf@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Catherine Sullivan <csully@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	gve: gve_rx_copy: Move padding to an argument	Bailey Forrest
	Future use cases will have a different padding value. Signed-off-by: Bailey Forrest <bcf@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Catherine Sullivan <csully@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	gve: Move some static functions to a common file	Bailey Forrest
	These functions will be shared by the GQI and DQO variants of the GVNIC driver as of follow-up patches in this series. Signed-off-by: Bailey Forrest <bcf@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Catherine Sullivan <csully@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	gve: Update GVE documentation to describe DQO	Bailey Forrest
	DQO is a new descriptor format for our next generation virtual NIC. Signed-off-by: Bailey Forrest <bcf@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Catherine Sullivan <csully@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	ipv6: fix out-of-bound access in ip6_parse_tlv()	Eric Dumazet
	First problem is that optlen is fetched without checking there is more than one byte to parse. Fix this by taking care of IPV6_TLV_PAD1 before fetching optlen (under appropriate sanity checks against len) Second problem is that IPV6_TLV_PADN checks of zero padding are performed before the check of remaining length. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Fixes: c1412fce7ecc ("net/ipv6/exthdrs.c: Strict PadN option checking") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	Merge branch 'macsec-key-length'	David S. Miller
	Antoine Tenart says: ==================== net: macsec: fix key length when offloading The key length used to copy the key to offloading drivers and to store it is wrong and was working by chance as it matched the default key length. But using a different key length fails. Fix it by using instead the max length accepted in uAPI to store the key and the actual key length when copying it. This was tested on the MSCC PHY driver but not on the Atlantic MAC (looking at the code it looks ok, but testing would be appreciated). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	net: atlantic: fix the macsec key length	Antoine Tenart
	The key length used to store the macsec key was set to MACSEC_KEYID_LEN (16), which is an issue as: - This was never meant to be the key length. - The key length can be > 16. Fix this by using MACSEC_MAX_KEY_LEN instead (the max length accepted in uAPI). Fixes: 27736563ce32 ("net: atlantic: MACSec egress offload implementation") Fixes: 9ff40a751a6f ("net: atlantic: MACSec ingress offload implementation") Reported-by: Lior Nahmanson <liorna@nvidia.com> Signed-off-by: Antoine Tenart <atenart@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	net: phy: mscc: fix macsec key length	Antoine Tenart
	The key length used to store the macsec key was set to MACSEC_KEYID_LEN (16), which is an issue as: - This was never meant to be the key length. - The key length can be > 16. Fix this by using MACSEC_MAX_KEY_LEN instead (the max length accepted in uAPI). Fixes: 28c5107aa904 ("net: phy: mscc: macsec support") Reported-by: Lior Nahmanson <liorna@nvidia.com> Signed-off-by: Antoine Tenart <atenart@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	net: macsec: fix the length used to copy the key for offloading	Antoine Tenart
	The key length used when offloading macsec to Ethernet or PHY drivers was set to MACSEC_KEYID_LEN (16), which is an issue as: - This was never meant to be the key length. - The key length can be > 16. Fix this by using MACSEC_MAX_KEY_LEN to store the key (the max length accepted in uAPI) and secy->key_len to copy it. Fixes: 3cf3227a21d1 ("net: macsec: hardware offloading infrastructure") Reported-by: Lior Nahmanson <liorna@nvidia.com> Signed-off-by: Antoine Tenart <atenart@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	usbnet: add usbnet_event_names[] for kevent	Yajun Deng
	Modify the netdev_dbg content from int to char * in usbnet_defer_kevent(), this looks more readable. Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	libceph: set global_id as soon as we get an auth ticket	Ilya Dryomov
	Commit 61ca49a9105f ("libceph: don't set global_id until we get an auth ticket") delayed the setting of global_id too much. It is set only after all tickets are received, but in pre-nautilus clusters an auth ticket and the service tickets are obtained in separate steps (for a total of three MAuth replies). When the service tickets are requested, global_id is used to build an authorizer; if global_id is still 0 we never get them and fail to establish the session. Moving the setting of global_id into protocol implementations. This way global_id can be set exactly when an auth ticket is received, not sooner nor later. Fixes: 61ca49a9105f ("libceph: don't set global_id until we get an auth ticket") Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>
2021-06-24	libceph: don't pass result into ac->ops->handle_reply()	Ilya Dryomov
	There is no result to pass in msgr2 case because authentication failures are reported through auth_bad_method frame and in MAuth case an error is returned immediately. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>
2021-06-24	spi: Fix self assignment issue with ancillary->mode	Colin Ian King
	There is an assignment of ancillary->mode to itself which looks dubious since the proceeding comment states that the speed and mode is taken over from the SPI main device, indicating that ancillary->mode should assigned using the value spi->mode. Fix this. Addresses-Coverity: ("Self assignment") Fixes: 0c79378c0199 ("spi: add ancillary device support") Signed-off-by: Colin Ian King <colin.king@canonical.com> Link: https://lore.kernel.org/r/20210623172300.161484-1-colin.king@canonical.com Signed-off-by: Mark Brown <broonie@kernel.org>
2021-06-24	Merge branch 'add-sparx5i-driver'	David S. Miller
	Steen Hegelund says: ==================== Adding the Sparx5i Switch Driver This series provides the Microchip Sparx5i Switch Driver The SparX-5 Enterprise Ethernet switch family provides a rich set of Enterprise switching features such as advanced TCAM-based VLAN and QoS processing enabling delivery of differentiated services, and security through TCAMbased frame processing using versatile content aware processor (VCAP). IPv4/IPv6 Layer 3 (L3) unicast and multicast routing is supported with up to 18K IPv4/9K IPv6 unicast LPM entries and up to 9K IPv4/3K IPv6 (S,G) multicast groups. L3 security features include source guard and reverse path forwarding (uRPF) tasks. Additional L3 features include VRF-Lite and IP tunnels (IP over GRE/IP). The SparX-5 switch family features a highly flexible set of Ethernet ports with support for 10G and 25G aggregation links, QSGMII, USGMII, and USXGMII. The device integrates a powerful 1 GHz dual-core ARM® Cortex®-A53 CPU enabling full management of the switch and advanced Enterprise applications. The SparX-5 switch family targets managed Layer 2 and Layer 3 equipment in SMB, SME, and Enterprise where high port count 1G/2.5G/5G/10G switching with 10G/25G aggregation links is required. The SparX-5 switch family consists of following SKUs: VSC7546 SparX-5-64 supports up to 64 Gbps of bandwidth with the following primary port configurations. - 6 ×10G - 16 × 2.5G + 2 × 10G - 24 × 1G + 4 × 10G VSC7549 SparX-5-90 supports up to 90 Gbps of bandwidth with the following primary port configurations. - 9 × 10G - 16 × 2.5G + 4 × 10G - 48 × 1G + 4 × 10G VSC7552 SparX-5-128 supports up to 128 Gbps of bandwidth with the following primary port configurations. - 12 × 10G - 6 x 10G + 2 x 25G - 16 × 2.5G + 8 × 10G - 48 × 1G + 8 × 10G VSC7556 SparX-5-160 supports up to 160 Gbps of bandwidth with the following primary port configurations. - 16 × 10G - 10 × 10G + 2 × 25G - 16 × 2.5G + 10 × 10G - 48 × 1G + 10 × 10G VSC7558 SparX-5-200 supports up to 200 Gbps of bandwidth with the following primary port configurations. - 20 × 10G - 8 × 25G In addition, the device supports one 10/100/1000/2500/5000 Mbps SGMII/SerDes node processor interface (NPI) Ethernet port. Time sensitive networking (TSN) is supported through a comprehensive set of features including frame preemption, cut-through, frame replication and elimination for reliability, enhanced scheduling: credit-based shaping, time-aware shaping, cyclic queuing, and forwarding, and per-stream policing and filtering. Together with IEEE 1588 and IEEE 802.1AS support, this guarantees low-latency deterministic networking for Industrial Ethernet. The Sparx5i support is developed on the PCB134 and PCB135 evaluation boards. - PCB134 main networking features: - 12x SFP+ front 10G module slots (connected to Sparx5i through SFI). - 8x SFP28 front 25G module slots (connected to Sparx5i through SFI high speed). - Optional, one additional 10/100/1000BASE-T (RJ45) Ethernet port (on-board VSC8211 PHY connected to Sparx5i through SGMII). - PCB135 main networking features: - 48x1G (10/100/1000M) RJ45 front ports using 12xVSC8514 QuadPHY’s each connected to VSC7558 through QSGMII. - 4x10G (1G/2.5G/5G/10G) RJ45 front ports using the AQR407 10G QuadPHY each port connects to VSC7558 through SFI. - 4x SFP28 25G module slots on back connected to VSC7558 through SFI high speed. - Optional, one additional 1G (10/100/1000M) RJ45 port using an on-board VSC8211 PHY, which can be connected to VSC7558 NPI port through SGMII using a loopback add-on PCB) This series provides support for: - SFPs and DAC cables via PHYLINK with a number of 5G, 10G and 25G devices and media types. - Port module configuration for 10M to 25G speeds with SGMII, QSGMII, 1000BASEX, 2500BASEX and 10GBASER as appropriate for these modes. - SerDes configuration via the Sparx5i SerDes driver (see below). - Host mode providing register based injection and extraction. - Switch mode providing MAC/VLAN table learning and Layer2 switching offloaded to the Sparx5i switch. - STP state, VLAN support, host/bridge port mode, Forwarding DB, and configuration and statistics via ethtool. More support will be added at a later stage. The Sparx5i Chip Register Model can be browsed at this location: https://github.com/microchip-ung/sparx-5_reginfo and the datasheet is available here: https://ww1.microchip.com/downloads/en/DeviceDoc/SparX-5_Family_L2L3_Enterprise_10G_Ethernet_Switches_Datasheet_00003822B.pdf The series depends on the following series currently on their way into the kernel: - 25G Base-R phy mode Link: https://lore.kernel.org/r/20210611125453.313308-1-steen.hegelund@microchip.com/ - Sparx5 Reset Driver Link: https://lore.kernel.org/r/20210416084054.2922327-1-steen.hegelund@microchip.com/ ChangeLog: v5: - cover letter - updated the description to match the latest data sheets - basic driver - added error message in case of reset controller error - port struct: replacing has_sfp with inband, adding pause_adv - host mode - port cleanup: unregisters netdevs and then removes phylink etc - checking for pause_adv when comparing port config changes - getting duplex and pause state in the link_up callback. - getting inband, autoneg and pause_adv config in the pcs_config callback. - port - use only the pause_adv bits when getting aneg status - use the inband state when updating the PCS and port config v4: - basic driver: Using devm_reset_control_get_optional_shared to get the reset control, and let the reset framework check if it is valid. - host mode (phylink): Use the PCS operations to get state and update configuration. Removed the setting of interface modes. Let phylink control this. Using the new 5gbase-r and 25gbase-r modes. Using a helper function to check if one of the 3 base-r modes has been selected. Currently it will not be possible to change the interface mode by changing the speed (e.g via ethtool). This will be added later. v3: - basic driver: - removed unneeded braces - release reference to ports node after use - use dev_err_probe to handle DEFER - update error value when bailing out (a few cases) - updated formatting of port struct and grouping of bool values - simplified the spx5_rmw and spx5_inst_rmw inline functions - host mode (netdev): - removed lockless flag - added port timer init - host mode (packet - manual injection): - updated error counters in error situations - implemented timer handling of watermark threshold: stop and restart netif queues. - fixed error message handling (rate limited) - fixed comment style error - used DIV_ROUND_UP macro - removed a debug message for open ports v2: - Updated bindings: - drop minItems for the reg property - Statistics implementation: - Reorganized statistics into ethtool groups: eth-phy, eth-mac, eth-ctrl, rmon as defined by the IEEE 802.3 categories and RFC 2819. - The remaining statistics are provided by the classic ethtool statistics command. - Hostmode support: - Removed netdev renaming - Validate ethernet address in sparx5_set_mac_address() ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	arm64: dts: sparx5: Add the Sparx5 switch node	Steen Hegelund
	This provides the configuration for the currently available evaluation boards PCB134 and PCB135. The series depends on the following series currently on its way into the kernel: - Sparx5 Reset Driver Link: https://lore.kernel.org/r/20210416084054.2922327-1-steen.hegelund@microchip.com/ Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com> Signed-off-by: Lars Povlsen <lars.povlsen@microchip.com> Signed-off-by: Bjarni Jonasson <bjarni.jonasson@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	net: sparx5: add ethtool configuration and statistics support	Steen Hegelund
	This adds statistic counters for the network interfaces provided by the driver. It also adds CPU port counters (which are not exposed by ethtool). This also adds support for configuring the network interface parameters via ethtool: speed, duplex, aneg etc. Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com> Signed-off-by: Bjarni Jonasson <bjarni.jonasson@microchip.com> Signed-off-by: Lars Povlsen <lars.povlsen@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	net: sparx5: add calendar bandwidth allocation support	Steen Hegelund
	This configures the Sparx5 calendars according to the bandwidth requested in the Device Tree nodes. It also checks if the total requested bandwidth is within the specs of the detected Sparx5 models limits. Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com> Signed-off-by: Bjarni Jonasson <bjarni.jonasson@microchip.com> Signed-off-by: Lars Povlsen <lars.povlsen@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	net: sparx5: add switching support	Steen Hegelund
	This adds SwitchDev support by hardware offloading the software bridge. Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com> Signed-off-by: Bjarni Jonasson <bjarni.jonasson@microchip.com> Signed-off-by: Lars Povlsen <lars.povlsen@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	net: sparx5: add vlan support	Steen Hegelund
	This adds Sparx5 VLAN support. Sparx5 has more VLAN features than provided here, but these will be added in later series. For now we only add the basic L2 features. Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com> Signed-off-by: Bjarni Jonasson <bjarni.jonasson@microchip.com> Signed-off-by: Lars Povlsen <lars.povlsen@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	net: sparx5: add mactable support	Steen Hegelund
	This adds the Sparx5 MAC tables: listening for MAC table updates and updating on request. Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com> Signed-off-by: Bjarni Jonasson <bjarni.jonasson@microchip.com> Signed-off-by: Lars Povlsen <lars.povlsen@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	net: sparx5: add port module support	Steen Hegelund
	This add configuration of the Sparx5 port module instances. Sparx5 has in total 65 logical ports (denoted D0 to D64) and 33 physical SerDes connections (S0 to S32). The 65th port (D64) is fixed allocated to SerDes0 (S0). The remaining 64 ports can in various multiplexing scenarios be connected to the remaining 32 SerDes using QSGMII, or USGMII or USXGMII extenders. 32 of the ports can have a 1:1 mapping to the 32 SerDes. Some additional ports (D65 to D69) are internal to the device and do not connect to port modules or SerDes macros. For example, internal ports are used for frame injection and extraction to the CPU queues. The 65 logical ports are split up into the following blocks. - 13 x 5G ports (D0-D11, D64) - 32 x 2G5 ports (D16-D47) - 12 x 10G ports (D12-D15, D48-D55) - 8 x 25G ports (D56-D63) Each logical port supports different line speeds, and depending on the speeds supported, different port modules (MAC+PCS) are needed. A port supporting 5 Gbps, 10 Gbps, or 25 Gbps as maximum line speed, will have a DEV5G, DEV10G, or DEV25G module to support the 5 Gbps, 10 Gbps (incl 5 Gbps), or 25 Gbps (including 10 Gbps and 5 Gbps) speeds. As well as, it will have a shadow DEV2G5 port module to support the lower speeds (10/100/1000/2500Mbps). When a port needs to operate at lower speed and the shadow DEV2G5 needs to be connected to its corresponding SerDes Not all interface modes are supported in this series, but will be added at a later stage. Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com> Signed-off-by: Bjarni Jonasson <bjarni.jonasson@microchip.com> Signed-off-by: Lars Povlsen <lars.povlsen@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	net: sparx5: add hostmode with phylink support	Steen Hegelund
	This patch adds netdevs and phylink support for the ports in the switch. It also adds register based injection and extraction for these ports. Frame DMA support for injection and extraction will be added in a later series. Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com> Signed-off-by: Bjarni Jonasson <bjarni.jonasson@microchip.com> Signed-off-by: Lars Povlsen <lars.povlsen@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	net: sparx5: add the basic sparx5 driver	Steen Hegelund
	This adds the Sparx5 basic SwitchDev driver framework with IO range mapping, switch device detection and core clock configuration. Support for ports, phylink, netdev, mactable etc. are in the following patches. Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com> Signed-off-by: Bjarni Jonasson <bjarni.jonasson@microchip.com> Signed-off-by: Lars Povlsen <lars.povlsen@microchip.com> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	dt-bindings: net: sparx5: Add sparx5-switch bindings	Steen Hegelund
	Document the Sparx5 switch device driver bindings Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com> Signed-off-by: Lars Povlsen <lars.povlsen@microchip.com> Signed-off-by: Bjarni Jonasson <bjarni.jonasson@microchip.com> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	Merge tag 'linux-can-fixes-for-5.13-20210624' of git://git.kernel.org/	David S. Miller
	pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2021-06-24 this is a pull request of 2 patches for net/master. The first patch is by Norbert Slusarek and prevent allocation of filter for optlen == 0 in the j1939 CAN protocol. The last patch is by Stephane Grosjean and fixes a potential starvation in the TX path of the peak_pciefd driver. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	Merge branch 'ibmvnic-fixes'	David S. Miller
	Sukadev Bhattiprolu says: ==================== ibmvnic: Assorted bug fixes Assorted bug fixes that we tested over the last several weeks. Thanks to Brian King, Cris Forno, Dany Madden and Rick Lindsley for reviews and help with testing. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	ibmvnic: parenthesize a check	Sukadev Bhattiprolu
	Parenthesize a check to be more explicit and to fix a sparse warning seen on some distros. Fixes: 91dc5d2553fbf ("ibmvnic: fix miscellaneous checks") Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	ibmvnic: free tx_pool if tso_pool alloc fails	Sukadev Bhattiprolu
	Free tx_pool and clear it, if allocation of tso_pool fails. release_tx_pools() assumes we have both tx and tso_pools if ->tx_pool is non-NULL. If allocation of tso_pool fails in init_tx_pools(), the assumption will not be true and we would end up dereferencing ->tx_buff, ->free_map fields from a NULL pointer. Fixes: 3205306c6b8d ("ibmvnic: Update TX pool initialization routine") Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	ibmvnic: set ltb->buff to NULL after freeing	Sukadev Bhattiprolu
	free_long_term_buff() checks ltb->buff to decide whether we have a long term buffer to free. So set ltb->buff to NULL afer freeing. While here, also clear ->map_id, fix up some coding style and log an error. Fixes: 9c4eaabd1bb39 ("Check CRQ command return codes") Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	ibmvnic: account for bufs already saved in indir_buf	Sukadev Bhattiprolu
	This fixes a crash in replenish_rx_pool() when called from ibmvnic_poll() after a previous call to replenish_rx_pool() encountered an error when allocating a socket buffer. Thanks to Rick Lindsley and Dany Madden for helping debug the crash. Fixes: 4f0b6812e9b9 ("ibmvnic: Introduce batched RX buffer descriptor transmission") Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>