When devm_ioremap_resource() fails, __devm_ioremap_resource() prints an
error message including the device name, failure cause, and possibly
resource information.
Remove the error message from imx6_pcie_probe() since it's redundant.
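For illustration, a minimal sketch of the pattern being removed (the
variable names are hypothetical; the actual probe code differs):

    /* Before: the dev_err() duplicates what __devm_ioremap_resource()
     * already prints on failure.
     */
    base = devm_ioremap_resource(dev, res);
    if (IS_ERR(base)) {
        dev_err(dev, "unable to map registers\n"); /* redundant */
        return PTR_ERR(base);
    }

    /* After: just propagate the error. */
    base = devm_ioremap_resource(dev, res);
    if (IS_ERR(base))
        return PTR_ERR(base);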
Link: https://lore.kernel.org/r/20210511114547.5601-1-thunder.leizhen@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Krzysztof Wilczyński <kw@linux.com>
Acked-by: Richard Zhu <hongxing.zhu@nxp.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2021-06-24
This series contains updates to i40e driver only.
Dinghao Liu corrects the error handling for a failed
i40e_vsi_request_irq() call.
Mateusz allows for disabling of autonegotiation for all BaseT media.
Jesse corrects the multiplier being used on 5Gb speeds for PTP.
Jan adds locking in paths calling i40e_setup_pf_switch() that were
missing it.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fix Sparse warnings:
drivers/i2c/i2c-dev.c:546:19: warning: incorrect type in assignment (different address spaces)
drivers/i2c/i2c-dev.c:549:53: warning: incorrect type in argument 2 (different address spaces)
compat_ptr() returns a pointer tagged __user which gets assigned to a
pointer missing the __user annotation. The same pointer is passed to
copy_from_user() as an argument where it is expected to have the __user
annotation. Fix both by adding the __user annotation to the pointer.
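A sketch of the annotation fix (variable names are hypothetical; the
real change is in the compat I2C_RDWR path):

    u8 __user *buf;                     /* was: u8 *buf; */

    buf = compat_ptr(umsg.buf);         /* compat_ptr() returns a __user pointer */
    if (copy_from_user(kbuf, buf, len)) /* argument 2 expects __user */
        return -EFAULT;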
Fixes: 7d5cb45655f2 ("i2c compat ioctls: move to ->compat_ioctl()")
Signed-off-by: Andreas Hecht <andreas.e.hecht@gmail.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
|
|
Bailey Forrest says:
====================
gve: Introduce DQO descriptor format
DQO is the descriptor format for our next generation virtual NIC. The existing
descriptor format will be referred to as "GQI" in the patch set.
One major change with DQO is that it uses dual descriptor rings for
both TX and RX queues.
The TX path uses a TX queue to send descriptors to HW, and receives packet
completion events on a TX completion queue.
The RX path posts buffers to HW using an RX buffer queue and receives incoming
packets on an RX queue.
One important note is that DQO descriptors and doorbells are little endian. We
continue to use the existing big endian control plane infrastructure.
The general format of the patch series is:
- Refactor existing code/data structures to be shared by DQO
- Expand admin queues to support DQO device setup
- Expand data structures and device setup to support DQO
- Add logic to setup DQO queues
- Implement datapath
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The RX queue has an array of `gve_rx_buf_state_dqo` objects. All
allocated pages have an associated buf_state object. When a buffer is
posted on the RX buffer queue, the buffer ID will be the buf_state's
index into the RX queue's array.
On packet reception, the RX queue will have one descriptor for each
buffer associated with a received packet. Each RX descriptor will have
a buffer_id that was posted on the buffer queue.
Notable mentions:
- We use a default buffer size of 2048 bytes. Based on page size, we
may post separate sections of a single page as separate buffers.
- The driver holds an extra reference on pages passed up the receive
path with an skb and keeps these pages on a list. When posting new
buffers to the NIC, we check if any of these pages has only our
reference, or another buffer sized segment of the page has no
references. If so, it is free to reuse. This page recycling approach
is a common netdev optimization that reduces page alloc/free calls.
- Pages in the free list have a page_count bias in order to avoid an
atomic increment of the page refcount every time we attempt to reuse a
page.
# references = page_count() - bias
- In order to track when a page is safe to reuse, we keep track of the
last offset which had a single SKB reference. When this occurs, it
implies that every single other offset is reusable. Otherwise, we
don't know if offsets can be safely reused.
- We maintain two free lists of pages. List #1 (recycled_buf_states)
contains pages we know can be reused right away. List #2
(used_buf_states) contains pages which cannot be used right away. We
only attempt to get pages from list #2 when list #1 is empty. We only
attempt to use a small fixed number of pages from list #2 before
giving up and allocating a new page. Both lists are FIFOs, in the hope
that by the time we attempt to reuse a page its references have been
dropped (see the sketch below).
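A sketch of the reuse check under this refcount-bias scheme (a
hypothetical helper, not the driver's exact code):

    /* # references = page_count() - bias.  A page segment is free to
     * reuse once only our reference remains.
     */
    static bool gve_page_reusable(struct page *page, int pagecnt_bias)
    {
        return page_count(page) - pagecnt_bias == 1;
    }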
Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
TX SKBs will have their buffers DMA mapped with the device. Each buffer
will have at least one TX descriptor associated with it. Each SKB will
also have a metadata descriptor.
Each TX queue maintains an array of `gve_tx_pending_packet_dqo` objects.
Every TX SKB will have an associated pending_packet object. A TX SKB's
descriptors will use its pending_packet's index as the completion tag,
which will be returned on the TX completion queue (see the sketch after
the notes below).
The device implements a "flow-miss model". Most packets will simply
receive a packet completion. The flow-miss system may choose to process
a packet based on its contents. A TX packet which experiences a flow
miss would receive a miss completion followed by a later reinjection
completion. The miss completion is received when the packet starts to be
processed by the flow-miss system and the reinjection completion is
received when the flow-miss system completes processing the packet and
sends it on the wire.
Notable mentions:
- Buffers may be freed after receiving the miss completion, but in order
to avoid packet reordering, we do not complete the SKB until receiving
the reinjection completion.
- The driver must robustly handle the unlikely scenario where a miss
completion does not have an associated reinjection completion. This is
accomplished by maintaining a list of packets which have a pending
reinjection completion. After a short timeout (5 seconds), the
SKB and buffers are released and the pending_packet is moved to a
second list which has a longer timeout (60 seconds), where the
pending_packet will not be reused. When the longer timeout elapses,
the driver may assume the reinjection completion will never be
received and the pending_packet may be reused.
- Completion handling is triggered by an interrupt and is done in the
NAPI poll function. Because the TX path and completion handling run in
different threading contexts, they maintain their own lists of free
pending_packet objects. The TX path uses a lock-free approach to steal
the list from the completion path.
- Both the TSO context and general context descriptors have metadata
bytes. The device requires that if multiple descriptors contain the
same field, each descriptor must have the same value set for that
field.
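A sketch of the completion-tag indexing described above (field names
are hypothetical):

    /* The completion tag is the pending_packet's array index, so the
     * completion handler finds the packet in O(1).
     */
    static struct gve_tx_pending_packet_dqo *
    gve_lookup_pending(struct gve_tx_ring *tx, u16 completion_tag)
    {
        return &tx->pending_packets[completion_tag];
    }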
Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When interrupts are first enabled, we also set the ratelimits, which
remain static for the lifetime of the device.
Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Allocate the buffer and completion ring structures. Do not populate
the rings yet; that will happen in the respective RX and TX datapath
follow-on patches.
Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add NAPI and netdev device registration, interrupt handling, and
initial TX and RX polling stubs. The stubs will be filled in by
follow-on patches.
Also:
- LRO feature advertisement and handling
- Updates to the ethtool logic
Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
DQO queue creation requires additional parameters:
- TX completion/RX buffer queue size
- TX completion/RX buffer queue address
- TX/RX queue size
- RX buffer size
Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
- Add new DQO datapath structures:
- `gve_rx_buf_queue_dqo`
- `gve_rx_compl_queue_dqo`
- `gve_rx_buf_state_dqo`
- `gve_tx_desc_dqo`
- `gve_tx_pending_packet_dqo`
- Incorporate these into the existing ring data structures:
- `gve_rx_ring`
- `gve_tx_ring`
Noteworthy mentions:
- `gve_rx_buf_state` represents an RX buffer which was posted to HW.
Each RX queue has an array of these objects and the index into the
array is used as the buffer_id when posted to HW.
- `gve_tx_pending_packet_dqo` is treated similarly for TX queues. The
completion_tag is the index into the array.
- These two structures have links for linked lists which are
represented by 16-bit indexes into a contiguous array of these
structures. This reduces memory footprint compared to 64-bit pointers
(see the sketch below).
- We use unions for the writeable datapath structures to reduce cache
footprint. GQI-specific members will be renamed like DQO members in a
future patch.
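A sketch of the 16-bit-index list representation (names and sentinel
value are hypothetical; the driver's actual layout may differ):

    #define GVE_INDEX_LIST_END 0xFFFF   /* sentinel instead of NULL */

    struct gve_index_list {
        u16 head;
        u16 tail;
    };

    /* Elements link to each other by array index instead of by
     * pointer, shrinking each link from 64 bits to 16 bits.
     */
    struct gve_rx_buf_state_dqo {
        /* ... buffer tracking fields ... */
        u16 next;   /* index into the RX queue's buf_state array */
    };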
Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
General description of rings and descriptors:
TX ring is used for sending TX packet buffers to the NIC. It has the
following descriptors:
- `gve_tx_pkt_desc_dqo` - Data buffer descriptor
- `gve_tx_tso_context_desc_dqo` - TSO context descriptor
- `gve_tx_general_context_desc_dqo` - Generic metadata descriptor
Metadata is a collection of 12 bytes. We define `gve_tx_metadata_dqo`
which represents the logical interpretation of the metadata bytes. It's
helpful to define this structure because the metadata bytes exist in
multiple descriptor types (including `gve_tx_tso_context_desc_dqo`),
and the device requires that the same field have the same value in all
descriptors.
The TX completion ring is used to receive completions from the NIC.
Having a separate ring allows for completions to be out of order. The
completion descriptor `gve_tx_compl_desc` has several different types,
most important are packet and descriptor completions. Descriptor
completions are used to notify the driver when descriptors sent on the
TX ring are done being consumed. The descriptor completion is only used
to signal that space has been cleared in the TX ring. A packet
completion will be received when a packet sent on the TX queue has
been fully transmitted.
In addition there are "miss" and "reinjection" completions. The device
implements a "flow-miss model". Most packets will simply receive a
packet completion. The flow-miss system may choose to process a packet
based on its contents. A TX packet which experiences a flow miss would
receive a miss completion followed by a later reinjection completion.
The miss completion is received when the packet starts to be processed
by the flow-miss system and the reinjection completion is received when
the flow-miss system completes processing the packet and sends it on the
wire.
The RX buffer ring is used to send buffers to HW via the
`gve_rx_desc_dqo` descriptor.
Received packets are put into the RX queue by the device, which
populates the `gve_rx_compl_desc_dqo` descriptor. The RX descriptors
refer to buffers posted by the buffer queue. Received buffers may be
returned out of order, such as when HW LRO is enabled.
Important concepts:
- "TX" and "RX buffer" queues, which send descriptors to the device, use
MMIO doorbells to notify the device of new descriptors.
- "RX" and "TX completion" queues, which receive descriptors from the
device, use a "generation bit" to know when a descriptor was populated
by the device. The driver initializes all bits with the "current
generation". The device will populate received descriptors with the
"next generation" which is inverted from the current generation. When
the ring wraps, the current/next generation are swapped.
- It's the driver's responsibility to ensure that the RX and TX
completion queues are not overrun. This can be accomplished by
limiting the number of descriptors posted to HW.
- TX packets have a 16-bit completion_tag and RX buffers have a 16-bit
buffer_id. These will be returned on the TX completion and RX queues
respectively to let the driver know which packet/buffer was completed.
Bitfields are used to describe descriptor fields. This notation is
more concise and readable than shift-and-mask, and it is possible
because the driver is restricted to little-endian platforms. The
sketch below illustrates the bitfield notation and the generation-bit
handling.
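A combined sketch (field layout and names are approximate; endianness
conversion is omitted because DQO descriptors are little endian and
the driver is restricted to little-endian platforms):

    struct gve_tx_compl_desc {
        u16 id: 11;            /* bitfields instead of shift-and-mask */
        u16 type: 3;
        u16 reserved0: 1;
        u16 generation: 1;     /* flipped by the device on each pass */
        __le16 completion_tag;
        /* ... */
    };

    /* A descriptor is ready once the device has flipped its
     * generation bit away from the driver's current generation.
     */
    static bool gve_compl_desc_ready(const struct gve_tx_compl_desc *desc,
                                     u8 cur_gen)
    {
        return desc->generation != cur_gen;
    }

    /* On ring wrap, the current/next generation are swapped: */
    if (++head == ring_size) {
        head = 0;
        cur_gen ^= 1;
    }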
Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Unlike GQI, DQO RX descriptors do not contain the L3 and L4 type of the
packet. L3 and L4 types are necessary in order to set the hash and csum
on RX SKBs correctly.
DQO RX descriptors instead contain a 10 bit PTYPE index. The PTYPE map
enables the device to tell the driver how to map from PTYPE index to
L3/L4 type.
The device doesn't provide any guarantees about the range of possible
PTYPEs, so we just use a 1024-entry array to implement a fast mapping
structure.
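A sketch of the flat mapping structure (types approximate):

    #define GVE_NUM_PTYPES 1024   /* covers the 10-bit PTYPE index */

    struct gve_ptype {
        u8 l3_type;   /* e.g. IPv4, IPv6, unknown */
        u8 l4_type;   /* e.g. TCP, UDP, unknown */
    };

    struct gve_ptype_lut {
        struct gve_ptype ptypes[GVE_NUM_PTYPES];
    };

    /* O(1) lookup on the RX hot path: */
    struct gve_ptype ptype = lut->ptypes[desc->ptype];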
Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
- In addition to TX and RX queues, DQO has TX completion and RX buffer
queues.
- TX completions are received when the device has completed sending a
packet on the wire.
- RX buffers are posted on a separate queue from the RX completions.
- DQO descriptor rings are allowed to be smaller than PAGE_SIZE.
Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The currently supported queue formats are:
- GQI_RDA - GQI with raw DMA addressing
- GQI_QPL - GQI with queue page list
- DQO_RDA - DQO with raw DMA addressing
The old `gve_priv.raw_addressing` value is only used for GQI_RDA, so
we remove it in favor of just checking against GQI_RDA.
Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The current model uses an integer ID and a fixed size struct for the
parameters of each device option.
The new model allows the device option structs to grow in size over
time. A driver may assume that additions to a device option struct
will always be appended at the end.
New device options will also generally have a
`supported_features_mask` so that the driver knows which fields within a
particular device option are enabled.
`gve_device_option.feat_mask` is changed to `required_features_mask`,
and it is a bitmask which must match the value expected by the driver.
This gives the device the ability to break backwards compatibility with
old drivers for certain features by blocking the old drivers from trying
to use the feature.
We maintain ABI compatibility with the old model for
GVE_DEV_OPT_ID_RAW_ADDRESSING in case a driver is using a device which
does not support the new model.
This patch introduces some new terminology:
RDA - Raw DMA Addressing - Buffers associated with SKBs are directly DMA
mapped and read/updated by the device.
QPL - Queue Page Lists - Driver uses bounce buffers which are DMA mapped
with the device for read/write and data is copied from/to SKBs.
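A sketch of the option layout this implies (field names approximate):

    struct gve_device_option {
        __be16 option_id;
        __be16 option_length;           /* lets the struct grow over time */
        __be32 required_features_mask;  /* must match what the driver expects */
    };

    /* Newer options then carry their own feature mask: */
    struct gve_device_option_dqo_rda {
        __be32 supported_features_mask; /* which fields are enabled */
        /* ... option-specific fields, only ever appended ... */
    };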
Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Using `page_offset` like a boolean means a page may only be split into
two sections. With page sizes larger than 4k, this can be very wasteful.
Future commits in this patchset use `struct gve_rx_slot_page_info` in a
way which supports a fixed buffer size and a variable page size.
Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Future use cases will have a different padding value.
Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
These functions will be shared by the GQI and DQO variants of the
GVNIC driver, starting with follow-up patches in this series.
Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
DQO is a new descriptor format for our next generation virtual NIC.
Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Convert the ARM VIC binding document to DT schema format using
json-schema.
Cc: Rob Herring <robh@kernel.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Link: https://lore.kernel.org/r/20210617205317.3060163-1-sudeep.holla@arm.com
Signed-off-by: Rob Herring <robh@kernel.org>
|
|
Since commit 86588296acbf ("fdt: Properly handle "no-map" field in the
memory region"), nomap memory is handled by memblock_mark_nomap()
instead of memblock_remove(). But that change only covered reserved
memory with a fixed address and size in
early_init_dt_reserve_memory_arch(), not memory dynamically allocated
by size in early_init_dt_alloc_reserved_memory_arch().
Cc: Rob Herring <robh+dt@kernel.org>
Cc: devicetree@vger.kernel.org
Signed-off-by: Dong Aisheng <aisheng.dong@nxp.com>
Reviewed-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20210611131153.3731147-2-aisheng.dong@nxp.com
Signed-off-by: Rob Herring <robh@kernel.org>
|
|
For the nomap case, the memory block will be removed by
memblock_remove() in early_init_dt_alloc_reserved_memory_arch(), so it
is meaningless to call memblock_free() on the error path.
Cc: Rob Herring <robh+dt@kernel.org>
Cc: devicetree@vger.kernel.org
Signed-off-by: Dong Aisheng <aisheng.dong@nxp.com>
Link: https://lore.kernel.org/r/20210611131153.3731147-1-aisheng.dong@nxp.com
Signed-off-by: Rob Herring <robh@kernel.org>
|
|
The legacy PCI interrupt lines need to be enabled using PCIE_APP_IRNEN
bits 13 (INTA), 14 (INTB), 15 (INTC) and 16 (INTD). The old code,
however, used (for example) the raw value 13 instead of BIT(13).
Define the legacy PCI interrupt bits using the BIT() macro and then
use these in PCIE_APP_IRN_INT.
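The shape of the fix (macro names approximate; previously the raw
values 13..16 were ORed into the mask instead of the corresponding
bits):

    #define PCIE_APP_IRN_INTA    BIT(13)
    #define PCIE_APP_IRN_INTB    BIT(14)
    #define PCIE_APP_IRN_INTC    BIT(15)
    #define PCIE_APP_IRN_INTD    BIT(16)

    /* ...plus the other interrupt sources: */
    #define PCIE_APP_IRN_INT     (PCIE_APP_IRN_INTA | PCIE_APP_IRN_INTB | \
                                  PCIE_APP_IRN_INTC | PCIE_APP_IRN_INTD)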
Link: https://lore.kernel.org/r/20210106135540.48420-1-martin.blumenstingl@googlemail.com
Fixes: ed22aaaede44 ("PCI: dwc: intel: PCIe RC controller driver")
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rahul Tanwar <rtanwar@maxlinear.com>
|
|
The first problem is that optlen is fetched without checking that
there is more than one byte to parse. Fix this by taking care of
IPV6_TLV_PAD1 before fetching optlen (under appropriate sanity checks
against len).
The second problem is that the IPV6_TLV_PADN checks of zero padding
are performed before the check of the remaining length.
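A simplified sketch of the corrected parse order (not the exact
exthdrs.c code):

    while (len > 0) {
        int optlen;

        /* Handle PAD1 first: it is a single byte, so fetching optlen
         * from nh[off + 1] at this point could read past the buffer.
         */
        if (nh[off] == IPV6_TLV_PAD1) {
            off++;
            len--;
            continue;
        }
        if (len < 2)          /* need at least type + length */
            goto bad;
        optlen = nh[off + 1] + 2;
        if (optlen > len)     /* length check before any PADN checks */
            goto bad;
        /* ... IPV6_TLV_PADN zero-padding checks only after this ... */
        off += optlen;
        len -= optlen;
    }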
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Fixes: c1412fce7ecc ("net/ipv6/exthdrs.c: Strict PadN option checking")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Antoine Tenart says:
====================
net: macsec: fix key length when offloading
The key length used to copy the key to offloading drivers and to store
it is wrong and was working by chance as it matched the default key
length. But using a different key length fails. Fix it by instead
using the maximum length accepted in the uAPI to store the key, and
the actual key length when copying it.
This was tested on the MSCC PHY driver but not on the Atlantic MAC
(looking at the code it looks ok, but testing would be appreciated).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The key length used to store the macsec key was set to MACSEC_KEYID_LEN
(16), which is an issue as:
- This was never meant to be the key length.
- The key length can be > 16.
Fix this by using MACSEC_MAX_KEY_LEN instead (the max length accepted in
uAPI).
Fixes: 27736563ce32 ("net: atlantic: MACSec egress offload implementation")
Fixes: 9ff40a751a6f ("net: atlantic: MACSec ingress offload implementation")
Reported-by: Lior Nahmanson <liorna@nvidia.com>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The key length used to store the macsec key was set to MACSEC_KEYID_LEN
(16), which is an issue as:
- This was never meant to be the key length.
- The key length can be > 16.
Fix this by using MACSEC_MAX_KEY_LEN instead (the max length accepted in
uAPI).
Fixes: 28c5107aa904 ("net: phy: mscc: macsec support")
Reported-by: Lior Nahmanson <liorna@nvidia.com>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The key length used when offloading macsec to Ethernet or PHY drivers
was set to MACSEC_KEYID_LEN (16), which is an issue as:
- This was never meant to be the key length.
- The key length can be > 16.
Fix this by using MACSEC_MAX_KEY_LEN to store the key (the max length
accepted in uAPI) and secy->key_len to copy it.
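The pattern of the fix, sketched (structure and source variable
simplified):

    struct macsec_key {
        u8 id[MACSEC_KEYID_LEN];      /* 16-byte key identifier */
        u8 key[MACSEC_MAX_KEY_LEN];   /* was: u8 key[MACSEC_KEYID_LEN] */
    };

    /* Copy the actual key length, not the key-ID length: */
    memcpy(ctx.sa.key, key, secy->key_len);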
Fixes: 3cf3227a21d1 ("net: macsec: hardware offloading infrastructure")
Reported-by: Lior Nahmanson <liorna@nvidia.com>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When in the round-robin mode, if the tracer detects a change in the
hwlatd thread affinity by an external tool, e.g., taskset, the
round-robin logic is disabled. The disable_migrate variable currently
tracks this.
With the addition of the "mode" config and the mode "none," the
disable_migrate logic is equivalent to switch to the "none" mode.
Hence, instead of using a hidden variable to track this behavior,
switch the mode to none, informing the user about this change.
Link: https://lkml.kernel.org/r/a679af672458d6b1f62252605905c5214030f247.1624372313.git.bristot@redhat.com
Cc: Phil Auld <pauld@redhat.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Kate Carcia <kcarcia@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexandre Chartre <alexandre.chartre@oracle.com>
Cc: Clark Willaims <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Provides the "mode" config to the hardware latency detector. hwlatd has
two different operation modes. The default mode is the "round-robin" one,
in which a single hwlatd thread runs, migrating among the allowed CPUs in a
"round-robin" fashion. This is the current behavior.
The "none" sets the allowed cpumask for a single hwlatd thread at the
startup, but skips the round-robin, letting the scheduler handle the
migration.
In preparation to the per-cpu mode.
Link: https://lkml.kernel.org/r/f3b1271262aa030c680e26615c1b9b2d71e55e92.1624372313.git.bristot@redhat.com
Cc: Phil Auld <pauld@redhat.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Kate Carcia <kcarcia@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexandre Chartre <alexandre.chartre@oracle.com>
Cc: Clark Willaims <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Clark's email is williams@redhat.com.
No functional change.
Link: https://lkml.kernel.org/r/6fa4b49e17ab8a1ff19c335ab7cde38d8afb0e29.1624372313.git.bristot@redhat.com
Cc: Phil Auld <pauld@redhat.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Kate Carcia <kcarcia@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexandre Chartre <alexandre.chartre@oracle.com>
Cc: Clark Willaims <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Modify the netdev_dbg() content from an int to a char * in
usbnet_defer_kevent(); this is more readable.
Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
bootconfig is a new feature that appends boot configuration onto the
initrd, which the kernel then processes as an extended kernel command
line.
Tests are needed to verify that this happened. To test bootconfig
properly, the initrd needs to be updated and the kernel rebooted; ktest
is the perfect solution to perform these tests.
Add an example bootconfig.conf in the tools/testing/ktest/examples/include
and example bootconfig scripts in tools/testing/ktest/examples/bootconfig
and also include verifier scripts that ktest will install on the target
and run to make sure that the bootconfig options in the scripts took place
after the target rebooted with the new initrd update.
Link: https://lkml.kernel.org/r/20210618112647.6a81dec5@oasis.local.home
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Use the new pci_dev_trylock() helper to simplify our locking.
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20210623022824.308041-3-mcgrof@kernel.org
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
|
Other places in the kernel use this form, and so just
provide a common path for it.
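The helper, sketched; it consolidates the open-coded try-lock pattern:

    int pci_dev_trylock(struct pci_dev *dev)
    {
        /* Take the config-space lock, then the device lock, backing
         * off if either is contended.
         */
        if (pci_cfg_access_trylock(dev)) {
            if (device_trylock(&dev->dev))
                return 1;
            pci_cfg_access_unlock(dev);
        }
        return 0;
    }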
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Link: https://lore.kernel.org/r/20210623022824.308041-2-mcgrof@kernel.org
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
|
In the case where the call to vfio_register_group_dev fails the error
return path kfree's mdev_state but not mdev_state->vconfig. Fix this
by kfree'ing mdev_state->vconfig before returning.
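The shape of the fix (error path simplified):

    ret = vfio_register_group_dev(&mdev_state->vdev);
    if (ret) {
        kfree(mdev_state->vconfig);   /* previously leaked */
        kfree(mdev_state);
        return ret;
    }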
Addresses-Coverity: ("Resource leak")
Fixes: 437e41368c01 ("vfio/mdpy: Convert to use vfio_register_group_dev()")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20210622183710.28954-1-colin.king@canonical.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
|
Commit 61ca49a9105f ("libceph: don't set global_id until we get an
auth ticket") delayed the setting of global_id too much. It is set
only after all tickets are received, but in pre-nautilus clusters an
auth ticket and the service tickets are obtained in separate steps
(for a total of three MAuth replies). When the service tickets are
requested, global_id is used to build an authorizer; if global_id is
still 0 we never get them and fail to establish the session.
Move the setting of global_id into the protocol implementations. This
way global_id can be set exactly when an auth ticket is received,
neither sooner nor later.
Fixes: 61ca49a9105f ("libceph: don't set global_id until we get an auth ticket")
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
|
|
There is no result to pass in the msgr2 case because authentication
failures are reported through the auth_bad_method frame, and in the
MAuth case an error is returned immediately.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
|
|
Added a cgroup option (-G) to support cgroups for 'perf top'.
Added a condition to make sure --cgroup and --all-cgroups aren't both
enabled.
Example:
$perf top -e cycles -G system.slice/docker-6b95a5eb649c0d671eba3835f0d93973d05a088f3ae8602246bde37affb1ba3e.scope -a --stdio
PerfTop: 3330 irqs/sec kernel:68.2% exact: 0.0% lost: 0/0 drop: 0/11075 [4000Hz cpu-clock], (all, 4 CPUs)
-------------------------------------------------------------------------------------------------------------------------------------------------------
27.32% [unknown] [.] 0x00007f8ab7b69352
11.44% [kernel] [k] 0xffffffff968cd657
3.12% [kernel] [k] 0xffffffff96160e96
2.63% [kernel] [k] 0xffffffff96160eb0
1.96% [kernel] [k] 0xffffffff9615fcf6
1.42% [kernel] [k] 0xffffffff964ddfc7
1.09% [kernel] [k] 0xffffffff96160e90
0.81% [kernel] [k] 0xffffffff96160eb3
0.67% [kernel] [k] 0xffffffff9615fec1
0.57% [kernel] [k] 0xffffffff961ee1d0
0.53% [unknown] [.] 0x00007f8ab7b6666c
0.53% [kernel] [k] 0xffffffff96160e64
0.52% [kernel] [k] 0xffffffff9616c303
0.51% [kernel] [k] 0xffffffffc08e7d50
...
Signed-off-by: Joshua Martinez <joshuamart@google.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: joshua martinez <joshuamart@google.com>
Link: http://lore.kernel.org/lkml/20210616231829.3735671-1-joshuamart@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
There is an assignment of ancillary->mode to itself which looks
dubious, since the preceding comment states that the speed and mode
are taken over from the SPI main device, indicating that
ancillary->mode should be assigned the value spi->mode. Fix this.
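The fix, sketched:

    /* Take over the speed and mode from the SPI main device. */
    ancillary->max_speed_hz = spi->max_speed_hz;
    ancillary->mode = spi->mode;   /* was: ancillary->mode = ancillary->mode; */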
Addresses-Coverity: ("Self assignment")
Fixes: 0c79378c0199 ("spi: add ancillary device support")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20210623172300.161484-1-colin.king@canonical.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
The struct rdma_id_private contains three bit-fields, tos_set,
timeout_set, and min_rnr_timer_set. These are set by accessor functions
without any synchronization. If two or all accessor functions are
invoked in close proximity in time, there will be read-modify-write
races from several contexts on the same variable, and the result will
be intermittent.
Fix this by protecting the bit-fields with the qp_mutex in the
accessor functions.
The consumer of timeout_set and min_rnr_timer_set is in
rdma_init_qp_attr(), which is called with qp_mutex held for connected
QPs. Explicit locking is added for the consumers of tos and tos_set.
This commit depends on ("RDMA/cma: Remove unnecessary INIT->INIT
transition"), since the call to rdma_init_qp_attr() from
cma_init_conn_qp() does not hold the qp_mutex.
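The accessor pattern after the fix, sketched (simplified from the
tos/tos_set case):

    void rdma_set_service_type(struct rdma_cm_id *id, int tos)
    {
        struct rdma_id_private *id_priv =
            container_of(id, struct rdma_id_private, id);

        /* Serialize the read-modify-write of the bit-fields. */
        mutex_lock(&id_priv->qp_mutex);
        id_priv->tos = (u8)tos;
        id_priv->tos_set = true;
        mutex_unlock(&id_priv->qp_mutex);
    }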
Fixes: 2c1619edef61 ("IB/cma: Define option to set ack timeout and pack tos_set")
Fixes: 3aeffc46afde ("IB/cma: Introduce rdma_set_min_rnr_timer()")
Link: https://lore.kernel.org/r/1624369197-24578-3-git-send-email-haakon.bugge@oracle.com
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Currently the IRQ_CLEAR register is marked as write-only; however,
using regmap_update_bits() on this register has side effects, so mark
the IRQ_CLEAR register appropriately as readable and volatile.
Fixes: da0363f7bfd3 ("ASoC: qcom: Fix for DMA interrupt clear reg overwriting")
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Srinivasa Rao Mandadapu <srivasam@codeaurora.org>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20210624092153.5771-1-srinivas.kandagatla@linaro.org
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
In rdma_create_qp(), a connected QP will be transitioned to the INIT
state.
Afterwards, the QP will be transitioned to the RTR state by the
cma_modify_qp_rtr() function. But this function starts by performing an
ib_modify_qp() to the INIT state again, before another ib_modify_qp() is
performed to transition the QP to the RTR state.
Hence, there is no need to transition the QP to the INIT state in
rdma_create_qp().
Link: https://lore.kernel.org/r/1624369197-24578-2-git-send-email-haakon.bugge@oracle.com
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Steen Hegelund says:
====================
Adding the Sparx5i Switch Driver
This series provides the Microchip Sparx5i Switch Driver
The SparX-5 Enterprise Ethernet switch family provides a rich set of
Enterprise switching features such as advanced TCAM-based VLAN and QoS
processing enabling delivery of differentiated services, and security
through TCAMbased frame processing using versatile content aware processor
(VCAP). IPv4/IPv6 Layer 3 (L3) unicast and multicast routing is supported
with up to 18K IPv4/9K IPv6 unicast LPM entries and up to 9K IPv4/3K IPv6
(S,G) multicast groups. L3 security features include source guard and
reverse path forwarding (uRPF) tasks. Additional L3 features include
VRF-Lite and IP tunnels (IP over GRE/IP).
The SparX-5 switch family features a highly flexible set of Ethernet ports
with support for 10G and 25G aggregation links, QSGMII, USGMII, and
USXGMII. The device integrates a powerful 1 GHz dual-core ARM® Cortex®-A53
CPU enabling full management of the switch and advanced Enterprise
applications.
The SparX-5 switch family targets managed Layer 2 and Layer 3 equipment in
SMB, SME, and Enterprise where high port count 1G/2.5G/5G/10G switching
with 10G/25G aggregation links is required.
The SparX-5 switch family consists of the following SKUs:
VSC7546 SparX-5-64 supports up to 64 Gbps of bandwidth with the following
primary port configurations.
- 6 × 10G
- 16 × 2.5G + 2 × 10G
- 24 × 1G + 4 × 10G
VSC7549 SparX-5-90 supports up to 90 Gbps of bandwidth with the following
primary port configurations.
- 9 × 10G
- 16 × 2.5G + 4 × 10G
- 48 × 1G + 4 × 10G
VSC7552 SparX-5-128 supports up to 128 Gbps of bandwidth with the
following primary port configurations.
- 12 × 10G
- 6 × 10G + 2 × 25G
- 16 × 2.5G + 8 × 10G
- 48 × 1G + 8 × 10G
VSC7556 SparX-5-160 supports up to 160 Gbps of bandwidth with the
following primary port configurations.
- 16 × 10G
- 10 × 10G + 2 × 25G
- 16 × 2.5G + 10 × 10G
- 48 × 1G + 10 × 10G
VSC7558 SparX-5-200 supports up to 200 Gbps of bandwidth with the
following primary port configurations.
- 20 × 10G
- 8 × 25G
In addition, the device supports one 10/100/1000/2500/5000 Mbps
SGMII/SerDes node processor interface (NPI) Ethernet port.
Time sensitive networking (TSN) is supported through a comprehensive set of
features including frame preemption, cut-through, frame replication and
elimination for reliability, enhanced scheduling: credit-based shaping,
time-aware shaping, cyclic queuing, and forwarding, and per-stream policing
and filtering.
Together with IEEE 1588 and IEEE 802.1AS support, this guarantees
low-latency deterministic networking for Industrial Ethernet.
The Sparx5i support is developed on the PCB134 and PCB135 evaluation boards.
- PCB134 main networking features:
- 12x SFP+ front 10G module slots (connected to Sparx5i through SFI).
- 8x SFP28 front 25G module slots (connected to Sparx5i through SFI high
speed).
- Optional, one additional 10/100/1000BASE-T (RJ45) Ethernet port
(on-board VSC8211 PHY connected to Sparx5i through SGMII).
- PCB135 main networking features:
- 48x1G (10/100/1000M) RJ45 front ports using 12x VSC8514 QuadPHYs,
each connected to VSC7558 through QSGMII.
- 4x10G (1G/2.5G/5G/10G) RJ45 front ports using the AQR407 10G QuadPHY
each port connects to VSC7558 through SFI.
- 4x SFP28 25G module slots on back connected to VSC7558 through SFI high
speed.
- Optional, one additional 1G (10/100/1000M) RJ45 port using an
on-board VSC8211 PHY, which can be connected to the VSC7558 NPI port
through SGMII (using a loopback add-on PCB).
This series provides support for:
- SFPs and DAC cables via PHYLINK with a number of 5G, 10G and 25G
devices and media types.
- Port module configuration for 10M to 25G speeds with SGMII, QSGMII,
1000BASEX, 2500BASEX and 10GBASER as appropriate for these modes.
- SerDes configuration via the Sparx5i SerDes driver (see below).
- Host mode providing register based injection and extraction.
- Switch mode providing MAC/VLAN table learning and Layer2 switching
offloaded to the Sparx5i switch.
- STP state, VLAN support, host/bridge port mode, Forwarding DB, and
configuration and statistics via ethtool.
More support will be added at a later stage.
The Sparx5i Chip Register Model can be browsed at this location:
https://github.com/microchip-ung/sparx-5_reginfo
and the datasheet is available here:
https://ww1.microchip.com/downloads/en/DeviceDoc/SparX-5_Family_L2L3_Enterprise_10G_Ethernet_Switches_Datasheet_00003822B.pdf
The series depends on the following series currently on their way
into the kernel:
- 25G Base-R phy mode
Link: https://lore.kernel.org/r/20210611125453.313308-1-steen.hegelund@microchip.com/
- Sparx5 Reset Driver
Link: https://lore.kernel.org/r/20210416084054.2922327-1-steen.hegelund@microchip.com/
ChangeLog:
v5:
- cover letter
- updated the description to match the latest data sheets
- basic driver
- added error message in case of reset controller error
- port struct: replacing has_sfp with inband, adding pause_adv
- host mode
- port cleanup: unregisters netdevs and then removes phylink etc
- checking for pause_adv when comparing port config changes
- getting duplex and pause state in the link_up callback.
- getting inband, autoneg and pause_adv config in the pcs_config
callback.
- port
- use only the pause_adv bits when getting aneg status
- use the inband state when updating the PCS and port config
v4:
- basic driver:
Using devm_reset_control_get_optional_shared to get the reset
control, and let the reset framework check if it is valid.
- host mode (phylink):
Use the PCS operations to get state and update configuration.
Removed the setting of interface modes. Let phylink control this.
Using the new 5gbase-r and 25gbase-r modes.
Using a helper function to check if one of the 3 base-r modes has
been selected.
Currently it will not be possible to change the interface mode by
changing the speed (e.g via ethtool). This will be added later.
v3:
- basic driver:
- removed unneeded braces
- release reference to ports node after use
- use dev_err_probe to handle DEFER
- update error value when bailing out (a few cases)
- updated formatting of port struct and grouping of bool values
- simplified the spx5_rmw and spx5_inst_rmw inline functions
- host mode (netdev):
- removed lockless flag
- added port timer init
- host mode (packet - manual injection):
- updated error counters in error situations
- implemented timer handling of watermark threshold: stop and
restart netif queues.
- fixed error message handling (rate limited)
- fixed comment style error
- used DIV_ROUND_UP macro
- removed a debug message for open ports
v2:
- Updated bindings:
- drop minItems for the reg property
- Statistics implementation:
- Reorganized statistics into ethtool groups:
eth-phy, eth-mac, eth-ctrl, rmon
as defined by the IEEE 802.3 categories and RFC 2819.
- The remaining statistics are provided by the classic ethtool
statistics command.
- Hostmode support:
- Removed netdev renaming
- Validate ethernet address in sparx5_set_mac_address()
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This provides the configuration for the currently available evaluation
boards PCB134 and PCB135.
The series depends on the following series currently on its way
into the kernel:
- Sparx5 Reset Driver
Link: https://lore.kernel.org/r/20210416084054.2922327-1-steen.hegelund@microchip.com/
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: Lars Povlsen <lars.povlsen@microchip.com>
Signed-off-by: Bjarni Jonasson <bjarni.jonasson@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This adds statistic counters for the network interfaces provided
by the driver. It also adds CPU port counters (which are not
exposed by ethtool).
This also adds support for configuring the network interface
parameters via ethtool: speed, duplex, aneg etc.
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: Bjarni Jonasson <bjarni.jonasson@microchip.com>
Signed-off-by: Lars Povlsen <lars.povlsen@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This configures the Sparx5 calendars according to the bandwidth
requested in the Device Tree nodes.
It also checks that the total requested bandwidth is within the
limits of the detected Sparx5 model.
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: Bjarni Jonasson <bjarni.jonasson@microchip.com>
Signed-off-by: Lars Povlsen <lars.povlsen@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This adds SwitchDev support by offloading the software bridge to
hardware.
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: Bjarni Jonasson <bjarni.jonasson@microchip.com>
Signed-off-by: Lars Povlsen <lars.povlsen@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This adds Sparx5 VLAN support.
Sparx5 has more VLAN features than provided here, but these will be added
in later series. For now we only add the basic L2 features.
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: Bjarni Jonasson <bjarni.jonasson@microchip.com>
Signed-off-by: Lars Povlsen <lars.povlsen@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|