summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2021-10-13ipv6: constify dev_addr passingJakub Kicinski
In preparation for netdev->dev_addr being constant make all relevant arguments in ndisc constant. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-13llc/snap: constify dev_addr passingJakub Kicinski
In preparation for netdev->dev_addr being constant make all relevant arguments in LLC and SNAP constant. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-13rose: constify dev_addr passingJakub Kicinski
In preparation for netdev->dev_addr being constant make all relevant arguments in rose constant. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-13ax25: constify dev_addr passingJakub Kicinski
In preparation for netdev->dev_addr being constant make all relevant arguments in AX25 constant. Modify callers as well (netrom, rose). Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-13SUNRPC: Change return value type of .pc_encodeChuck Lever
Returning an undecorated integer is an age-old trope, but it's not clear (even to previous experts in this code) that the only valid return values are 1 and 0. These functions do not return a negative errno, rpc_stat value, or a positive length. Document there are only two valid return values by having .pc_encode return only true or false. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2021-10-13SUNRPC: Replace the "__be32 *p" parameter to .pc_encodeChuck Lever
The passed-in value of the "__be32 *p" parameter is now unused in every server-side XDR encoder, and can be removed. Note also that there is a line in each encoder that sets up a local pointer to a struct xdr_stream. Passing that pointer from the dispatcher instead saves one line per encoder function. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2021-10-13SUNRPC: Change return value type of .pc_decodeChuck Lever
Returning an undecorated integer is an age-old trope, but it's not clear (even to previous experts in this code) that the only valid return values are 1 and 0. These functions do not return a negative errno, rpc_stat value, or a positive length. Document there are only two valid return values by having .pc_decode return only true or false. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2021-10-13SUNRPC: Replace the "__be32 *p" parameter to .pc_decodeChuck Lever
The passed-in value of the "__be32 *p" parameter is now unused in every server-side XDR decoder, and can be removed. Note also that there is a line in each decoder that sets up a local pointer to a struct xdr_stream. Passing that pointer from the dispatcher instead saves one line per decoder function. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2021-10-13nvmem: core: add nvmem cell post processing callbackSrinivas Kandagatla
Some NVMEM providers have certain nvmem cells encoded, which requires post processing before actually using it. For example mac-address is stored in either in ascii or delimited or reverse-order. Having a post-process callback hook to provider drivers would enable them to do this vendor specific post processing before nvmem consumers see it. Tested-by: Joakim Zhang <qiangqing.zhang@nxp.com> Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Link: https://lore.kernel.org/r/20211013131957.30271-3-srinivas.kandagatla@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-12scsi: libsas: Export sas_phy_enable()Luo Jiaxing
Export sas_phy_enable() so LLDDs can directly use it to control remote phys. We already do this for companion function sas_phy_reset(). Link: https://lore.kernel.org/r/1634041588-74824-4-git-send-email-john.garry@huawei.com Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-10-12net: dsa: tag_ocelot_8021q: break circular dependency with ocelot switch libVladimir Oltean
Michael reported that when using the "ocelot-8021q" tagging protocol, the switch driver module must be manually loaded before the tagging protocol can be loaded/is available. This appears to be the same problem described here: https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/ where due to the fact that DSA tagging protocols make use of symbols exported by the switch drivers, circular dependencies appear and this breaks module autoloading. The ocelot_8021q driver needs the ocelot_can_inject() and ocelot_port_inject_frame() functions from the switch library. Previously the wrong approach was taken to solve that dependency: shims were provided for the case where the ocelot switch library was compiled out, but that turns out to be insufficient, because the dependency when the switch lib _is_ compiled is problematic too. We cannot declare ocelot_can_inject() and ocelot_port_inject_frame() as static inline functions, because these access I/O functions like __ocelot_write_ix() which is called by ocelot_write_rix(). Making those static inline basically means exposing the whole guts of the ocelot switch library, not ideal... We already have one tagging protocol driver which calls into the switch driver during xmit but not using any exported symbol: sja1105_defer_xmit. We can do the same thing here: create a kthread worker and one work item per skb, and let the switch driver itself do the register accesses to send the skb, and then consume it. Fixes: 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping") Reported-by: Michael Walle <michael@walle.cc> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12net: dsa: tag_ocelot: break circular dependency with ocelot switch lib driverVladimir Oltean
As explained here: https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/ DSA tagging protocol drivers cannot depend on symbols exported by switch drivers, because this creates a circular dependency that breaks module autoloading. The tag_ocelot.c file depends on the ocelot_ptp_rew_op() function exported by the common ocelot switch lib. This function looks at OCELOT_SKB_CB(skb) and computes how to populate the REW_OP field of the DSA tag, for PTP timestamping (the command: one-step/two-step, and the TX timestamp identifier). None of that requires deep insight into the driver, it is quite stateless, as it only depends upon the skb->cb. So let's make it a static inline function and put it in include/linux/dsa/ocelot.h, a file that despite its name is used by the ocelot switch driver for populating the injection header too - since commit 40d3f295b5fe ("net: mscc: ocelot: use common tag parsing code with DSA"). With that function declared as static inline, its body is expanded inside each call site, so the dependency is broken and the DSA tagger can be built without the switch library, upon which the felix driver depends. Fixes: 39e5308b3250 ("net: mscc: ocelot: support PTP Sync one-step timestamping") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12net: mscc: ocelot: cross-check the sequence id from the timestamp FIFO with ↵Vladimir Oltean
the skb PTP header The sad reality is that when a PTP frame with a TX timestamping request is transmitted, it isn't guaranteed that it will make it all the way to the wire (due to congestion inside the switch), and that a timestamp will be taken by the hardware and placed in the timestamp FIFO where an IRQ will be raised for it. The implication is that if enough PTP frames are silently dropped by the hardware such that the timestamp ID has rolled over, it is possible to match a timestamp to an old skb. Furthermore, nobody will match on the real skb corresponding to this timestamp, since we stupidly matched on a previous one that was stale in the queue, and stopped there. So PTP timestamping will be broken and there will be no way to recover. It looks like the hardware parses the sequenceID from the PTP header, and also provides that metadata for each timestamp. The driver currently ignores this, but it shouldn't. As an extra resiliency measure, do the following: - check whether the PTP sequenceID also matches between the skb and the timestamp, treat the skb as stale otherwise and free it - if we see a stale skb, don't stop there and try to match an skb one more time, chances are there's one more skb in the queue with the same timestamp ID, otherwise we wouldn't have ever found the stale one (it is by timestamp ID that we matched it). While this does not prevent PTP packet drops, it at least prevents the catastrophic consequences of incorrect timestamp matching. Since we already call ptp_classify_raw in the TX path, save the result in the skb->cb of the clone, and just use that result in the interrupt code path. Fixes: 4e3b0468e6d7 ("net: mscc: PTP Hardware Clock (PHC) support") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12net: mscc: ocelot: avoid overflowing the PTP timestamp FIFOVladimir Oltean
PTP packets with 2-step TX timestamp requests are matched to packets based on the egress port number and a 6-bit timestamp identifier. All PTP timestamps are held in a common FIFO that is 128 entry deep. This patch ensures that back-to-back timestamping requests cannot exceed the hardware FIFO capacity. If that happens, simply send the packets without requesting a TX timestamp to be taken (in the case of felix, since the DSA API has a void return code in ds->ops->port_txtstamp) or drop them (in the case of ocelot). I've moved the ts_id_lock from a per-port basis to a per-switch basis, because we need separate accounting for both numbers of PTP frames in flight. And since we need locking to inc/dec the per-switch counter, that also offers protection for the per-port counter and hence there is no reason to have a per-port counter anymore. Fixes: 4e3b0468e6d7 ("net: mscc: PTP Hardware Clock (PHC) support") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12net: mscc: ocelot: make use of all 63 PTP timestamp identifiersVladimir Oltean
At present, there is a problem when user space bombards a port with PTP event frames which have TX timestamping requests (or when a tc-taprio offload is installed on a port, which delays the TX timestamps by a significant amount of time). The driver will happily roll over the 2-bit timestamp ID and this will cause incorrect matches between an skb and the TX timestamp collected from the FIFO. The Ocelot switches have a 6-bit PTP timestamp identifier, and the value 63 is reserved, so that leaves identifiers 0-62 to be used. The timestamp identifiers are selected by the REW_OP packet field, and are actually shared between CPU-injected frames and frames which match a VCAP IS2 rule that modifies the REW_OP. The hardware supports partitioning between the two uses of the REW_OP field through the PTP_ID_LOW and PTP_ID_HIGH registers, and by default reserves the PTP IDs 0-3 for CPU-injected traffic and the rest for VCAP IS2. The driver does not use VCAP IS2 to set REW_OP for 2-step timestamping, and it also writes 0xffffffff to both PTP_ID_HIGH and PTP_ID_LOW in ocelot_init_timestamp() which makes all timestamp identifiers available to CPU injection. Therefore, we can make use of all 63 timestamp identifiers, which should allow more timestampable packets to be in flight on each port. This is only part of the solution, more issues will be addressed in future changes. Fixes: 4e3b0468e6d7 ("net: mscc: PTP Hardware Clock (PHC) support") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12net: dsa: sja1105: break dependency between dsa_port_is_sja1105 and switch ↵Vladimir Oltean
driver It's nice to be able to test a tagging protocol with dsa_loop, but not at the cost of losing the ability of building the tagging protocol and switch driver as modules, because as things stand, there is a circular dependency between the two. Tagging protocol drivers cannot depend on switch drivers, that is a hard fact. The reasoning behind the blamed patch was that accessing dp->priv should first make sure that the structure behind that pointer is what we really think it is. Currently the "sja1105" and "sja1110" tagging protocols only operate with the sja1105 switch driver, just like any other tagging protocol and switch combination. The only way to mix and match them is by modifying the code, and this applies to dsa_loop as well (by default that uses DSA_TAG_PROTO_NONE). So while in principle there is an issue, in practice there isn't one. Until we extend dsa_loop to allow user space configuration, treat the problem as a non-issue and just say that DSA ports found by tag_sja1105 are always sja1105 ports, which is in fact true. But keep the dsa_port_is_sja1105 function so that it's easy to patch it during testing, and rely on dead code elimination. Fixes: 994d2cbb08ca ("net: dsa: tag_sja1105: be dsa_loop-safe") Link: https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/ Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12net: dsa: move sja1110_process_meta_tstamp inside the tagging protocol driverVladimir Oltean
The problem is that DSA tagging protocols really must not depend on the switch driver, because this creates a circular dependency at insmod time, and the switch driver will effectively not load when the tagging protocol driver is missing. The code was structured in the way it was for a reason, though. The DSA driver-facing API for PTP timestamping relies on the assumption that two-step TX timestamps are provided by the hardware in an out-of-band manner, typically by raising an interrupt and making that timestamp available inside some sort of FIFO which is to be accessed over SPI/MDIO/etc. So the API puts .port_txtstamp into dsa_switch_ops, because it is expected that the switch driver needs to save some state (like put the skb into a queue until its TX timestamp arrives). On SJA1110, TX timestamps are provided by the switch as Ethernet packets, so this makes them be received and processed by the tagging protocol driver. This in itself is great, because the timestamps are full 64-bit and do not require reconstruction, and since Ethernet is the fastest I/O method available to/from the switch, PTP timestamps arrive very quickly, no matter how bottlenecked the SPI connection is, because SPI interaction is not needed at all. DSA's code structure and strict isolation between the tagging protocol driver and the switch driver break the natural code organization. When the tagging protocol driver receives a packet which is classified as a metadata packet containing timestamps, it passes those timestamps one by one to the switch driver, which then proceeds to compare them based on the recorded timestamp ID that was generated in .port_txtstamp. The communication between the tagging protocol and the switch driver is done through a method exported by the switch driver, sja1110_process_meta_tstamp. To satisfy build requirements, we force a dependency to build the tagging protocol driver as a module when the switch driver is a module. However, as explained in the first paragraph, that causes the circular dependency. To solve this, move the skb queue from struct sja1105_private :: struct sja1105_ptp_data to struct sja1105_private :: struct sja1105_tagger_data. The latter is a data structure for which hacks have already been put into place to be able to create persistent storage per switch that is accessible from the tagging protocol driver (see sja1105_setup_ports). With the skb queue directly accessible from the tagging protocol driver, we can now move sja1110_process_meta_tstamp into the tagging driver itself, and avoid exporting a symbol. Fixes: 566b18c8b752 ("net: dsa: sja1105: implement TX timestamping for SJA1110") Link: https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/ Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12devlink: Delete reload enable/disable interfaceLeon Romanovsky
Commit a0c76345e3d3 ("devlink: disallow reload operation during device cleanup") added devlink_reload_{enable,disable}() APIs to prevent reload operation from racing with device probe/dismantle. After recent changes to move devlink_register() to the end of device probe and devlink_unregister() to the beginning of device dismantle, these races can no longer happen. Reload operations will be denied if the devlink instance is unregistered and devlink_unregister() will block until all in-flight operations are done. Therefore, remove these devlink_reload_{enable,disable}() APIs. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12devlink: Allow control devlink ops behavior through feature maskLeon Romanovsky
Introduce new devlink call to set feature mask to control devlink behavior during device initialization phase after devlink_alloc() is already called. This allows us to set reload ops based on device property which is not known at the beginning of driver initialization. For the sake of simplicity, this API lacks any type of locking and needs to be called before devlink_register() to make sure that no parallel access to the ops is possible at this stage. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12devlink: Move netdev_to_devlink helpers to devlink.cLeon Romanovsky
Both netdev_to_devlink and netdev_to_devlink_port are used in devlink.c only, so move them in order to reduce their scope. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12devlink: Reduce struct devlink exposureLeon Romanovsky
The declaration of struct devlink in general header provokes the situation where internal fields can be accidentally used by the driver authors. In order to reduce such possible situations, let's reduce the namespace exposure of struct devlink. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12PCI: Return NULL for to_pci_driver(NULL)Bjorn Helgaas
to_pci_driver() takes a pointer to a struct device_driver and uses container_of() to find the struct pci_driver that contains it. If given a NULL pointer to a struct device_driver, return a NULL pci_driver pointer instead of applying container_of() to NULL. This simplifies callers that would otherwise have to check for a NULL pointer first. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2021-10-12net/mlx5e: Mutually exclude RX-FCS and RX-port-timestampAya Levin
Due to current HW arch limitations, RX-FCS (scattering FCS frame field to software) and RX-port-timestamp (improved timestamp accuracy on the receive side) can't work together. RX-port-timestamp is not controlled by the user and it is enabled by default when supported by the HW/FW. This patch sets RX-port-timestamp opposite to RX-FCS configuration. Fixes: 102722fc6832 ("net/mlx5e: Add support for RXFCS feature flag") Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-12RDMA/rxe: Create AH index and return to user spaceBob Pearson
Make changes to rdma_user_rxe.h to allow indexing AH objects, passing the index in UD send WRs to the driver and returning the index to the rxe provider. Modify rxe_create_ah() to add an index to AH when created and if called from a new user provider return it to user space. If called from an old provider mark the AH as not having a useful index. Modify rxe_destroy_ah to drop the index before deleting the object. Link: https://lore.kernel.org/r/20211007204051.10086-4-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-12RDMA/rxe: Move AV from rxe_send_wqe to rxe_send_wrBob Pearson
Move the struct rxe_av av from struct rxe_send_wqe to struct rxe_send_wr placing it in wr.ud at the same offset as it was previously. This has the effect of increasing the size of struct rxe_send_wr while keeping the size of struct rxe_send_wqe the same. This better reflects the use of this field which is only used for UD sends. This change has no effect on ABI compatibility so the modified rxe driver will operate with older versions of rdma-core. Link: https://lore.kernel.org/r/20211007204051.10086-2-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-12Merge branch '5.15/scsi-fixes' into 5.16/scsi-stagingMartin K. Petersen
Merge the 5.15/scsi-fixes branch into the staging tree to resolve UFS conflict reported by sfr. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-10-12RDMA/mlx5: Add modify_op_stat() supportAharon Landau
Add support for ib callback modify_op_stat() to add or remove an optional counter. When adding, a steering flow table is created with a rule that catches and counts all the matching packets. When removing, the table and flow counter are destroyed. Link: https://lore.kernel.org/r/20211008122439.166063-13-markzhang@nvidia.com Signed-off-by: Aharon Landau <aharonl@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-12RDMA/mlx5: Add steering support in optional flow countersAharon Landau
Adding steering infrastructure for adding and removing optional counter. This allows to add and remove the counters dynamically in order not to hurt performance. Link: https://lore.kernel.org/r/20211008122439.166063-12-markzhang@nvidia.com Signed-off-by: Aharon Landau <aharonl@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-12RDMA/nldev: Add support to get status of all countersAharon Landau
This patch adds the ability to get the name, index and status of all counters for each link through RDMA netlink. This can be used for user-space to get the current optional-counter mode. Examples: $ rdma statistic mode link rocep8s0f0/1 optional-counters cc_rx_ce_pkts $ rdma statistic mode supported link rocep8s0f0/1 supported optional-counters cc_rx_ce_pkts,cc_rx_cnp_pkts,cc_tx_cnp_pkts link rocep8s0f1/1 supported optional-counters cc_rx_ce_pkts,cc_rx_cnp_pkts,cc_tx_cnp_pkts Link: https://lore.kernel.org/r/20211008122439.166063-8-markzhang@nvidia.com Signed-off-by: Aharon Landau <aharonl@nvidia.com> Signed-off-by: Neta Ostrovsky <netao@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-12RDMA/counter: Add optional counter supportAharon Landau
An optional counter is a driver-specific counter that may be dynamically enabled/disabled. This enhancement allows drivers to expose counters which are, for example, mutually exclusive and cannot be enabled at the same time, counters that might degrades performance, optional debug counters, etc. Optional counters are marked with IB_STAT_FLAG_OPTIONAL flag. They are not exported in sysfs, and must be at the end of all stats, otherwise the attr->show() in sysfs would get wrong indexes for hwcounters that are behind optional counters. Link: https://lore.kernel.org/r/20211008122439.166063-7-markzhang@nvidia.com Signed-off-by: Aharon Landau <aharonl@nvidia.com> Signed-off-by: Neta Ostrovsky <netao@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-12RDMA/counter: Add an is_disabled field in struct rdma_hw_statsAharon Landau
Add a bitmap in rdma_hw_stat structure, with each bit indicates whether the corresponding counter is currently disabled or not. By default hwcounters are enabled. Link: https://lore.kernel.org/r/20211008122439.166063-6-markzhang@nvidia.com Signed-off-by: Aharon Landau <aharonl@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-12RDMA/core: Add a helper API rdma_free_hw_stats_structMark Zhang
Add a new API rdma_free_hw_stats_struct to pair with rdma_alloc_hw_stats_struct (which is also de-inlined). This will be useful when there are more alloc/free works in following patches. Link: https://lore.kernel.org/r/20211008122439.166063-5-markzhang@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-12RDMA/counter: Add a descriptor in struct rdma_hw_statsAharon Landau
Add a counter statistic descriptor structure in rdma_hw_stats. In addition to the counter name, more meta-information will be added. This code extension is needed for optional-counter support in the following patches. Link: https://lore.kernel.org/r/20211008122439.166063-4-markzhang@nvidia.com Signed-off-by: Aharon Landau <aharonl@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-12Merge branch 'mlx5-next' of ↵Jason Gunthorpe
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux For dependencies in the following patches. * mellanox/mlx5-next: net/mlx5: Add priorities for counters in RDMA namespaces net/mlx5: Add ifc bits to support optional counters Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-12SUNRPC: Simplify the SVC dispatch code pathChuck Lever
Micro-optimization: The last user of the generic SVC dispatch code path has been removed, so svc_process_common() can be simplified. This declutters the hot path so that the by-far most common case (a dispatch function exists) is made the /only/ path. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2021-10-12Merge back ACPI PCI material for v5.16.Rafael J. Wysocki
2021-10-12net, neigh: Add NTF_MANAGED flag for managed neighbor entriesDaniel Borkmann
Allow a user space control plane to insert entries with a new NTF_EXT_MANAGED flag. The flag then indicates to the kernel that the neighbor entry should be periodically probed for keeping the entry in NUD_REACHABLE state iff possible. The use case for this is targeting XDP or tc BPF load-balancers which use the bpf_fib_lookup() BPF helper in order to piggyback on neighbor resolution for their backends. Given they cannot be resolved in fast-path, a control plane inserts the L3 (without L2) entries manually into the neighbor table and lets the kernel do the neighbor resolution either on the gateway or on the backend directly in case the latter resides in the same L2. This avoids to deal with L2 in the control plane and to rebuild what the kernel already does best anyway. NTF_EXT_MANAGED can be combined with NTF_EXT_LEARNED in order to avoid GC eviction. The kernel then adds NTF_MANAGED flagged entries to a per-neighbor table which gets triggered by the system work queue to periodically call neigh_event_send() for performing the resolution. The implementation allows migration from/to NTF_MANAGED neighbor entries, so that already existing entries can be converted by the control plane if needed. Potentially, we could make the interval for periodically calling neigh_event_send() configurable; right now it's set to DELAY_PROBE_TIME which is also in line with mlxsw which has similar driver-internal infrastructure c723c735fa6b ("mlxsw: spectrum_router: Periodically update the kernel's neigh table"). In future, the latter could possibly reuse the NTF_MANAGED neighbors as well. Example: # ./ip/ip n replace 192.168.178.30 dev enp5s0 managed extern_learn # ./ip/ip n 192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a managed extern_learn REACHABLE [...] Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Roopa Prabhu <roopa@nvidia.com> Link: https://linuxplumbersconf.org/event/11/contributions/953/ Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-12net, neigh: Extend neigh->flags to 32 bit to allow for extensionsRoopa Prabhu
Currently, all bits in struct ndmsg's ndm_flags are used up with the most recent addition of 435f2e7cc0b7 ("net: bridge: add support for sticky fdb entries"). This makes it impossible to extend the neighboring subsystem with new NTF_* flags: struct ndmsg { __u8 ndm_family; __u8 ndm_pad1; __u16 ndm_pad2; __s32 ndm_ifindex; __u16 ndm_state; __u8 ndm_flags; __u8 ndm_type; }; There are ndm_pad{1,2} attributes which are not used. However, due to uncareful design, the kernel does not enforce them to be zero upon new neighbor entry addition, and given they've been around forever, it is not possible to reuse them today due to risk of breakage. One option to overcome this limitation is to add a new NDA_FLAGS_EXT attribute for extended flags. In struct neighbour, there is a 3 byte hole between protocol and ha_lock, which allows neigh->flags to be extended from 8 to 32 bits while still being on the same cacheline as before. This also allows for all future NTF_* flags being in neigh->flags rather than yet another flags field. Unknown flags in NDA_FLAGS_EXT will be rejected by the kernel. Co-developed-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Roopa Prabhu <roopa@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-12net, neigh: Enable state migration between NUD_PERMANENT and NTF_USEDaniel Borkmann
Currently, it is not possible to migrate a neighbor entry between NUD_PERMANENT state and NTF_USE flag with a dynamic NUD state from a user space control plane. Similarly, it is not possible to add/remove NTF_EXT_LEARNED flag from an existing neighbor entry in combination with NTF_USE flag. This is due to the latter directly calling into neigh_event_send() without any meta data updates as happening in __neigh_update(). Thus, to enable this use case, extend the latter with a NEIGH_UPDATE_F_USE flag where we break the NUD_PERMANENT state in particular so that a latter neigh_event_send() is able to re-resolve a neighbor entry. Before fix, NUD_PERMANENT -> NUD_* & NTF_USE: # ./ip/ip n replace 192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a # ./ip/ip n 192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a PERMANENT [...] # ./ip/ip n replace 192.168.178.30 dev enp5s0 use extern_learn # ./ip/ip n 192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a PERMANENT [...] As can be seen, despite the admin-triggered replace, the entry remains in the NUD_PERMANENT state. After fix, NUD_PERMANENT -> NUD_* & NTF_USE: # ./ip/ip n replace 192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a # ./ip/ip n 192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a PERMANENT [...] # ./ip/ip n replace 192.168.178.30 dev enp5s0 use extern_learn # ./ip/ip n 192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a extern_learn REACHABLE [...] # ./ip/ip n 192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a extern_learn STALE [...] # ./ip/ip n replace 192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a # ./ip/ip n 192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a PERMANENT [...] After the fix, the admin-triggered replace switches to a dynamic state from the NTF_USE flag which triggered a new neighbor resolution. Likewise, we can transition back from there, if needed, into NUD_PERMANENT. Similar before/after behavior can be observed for below transitions: Before fix, NTF_USE -> NTF_USE | NTF_EXT_LEARNED -> NTF_USE: # ./ip/ip n replace 192.168.178.30 dev enp5s0 use # ./ip/ip n 192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a REACHABLE [...] # ./ip/ip n replace 192.168.178.30 dev enp5s0 use extern_learn # ./ip/ip n 192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a REACHABLE [...] After fix, NTF_USE -> NTF_USE | NTF_EXT_LEARNED -> NTF_USE: # ./ip/ip n replace 192.168.178.30 dev enp5s0 use # ./ip/ip n 192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a REACHABLE [...] # ./ip/ip n replace 192.168.178.30 dev enp5s0 use extern_learn # ./ip/ip n 192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a extern_learn REACHABLE [...] # ./ip/ip n replace 192.168.178.30 dev enp5s0 use # ./ip/ip n 192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a REACHABLE [..] Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Roopa Prabhu <roopa@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-12mmc: core: Add host specific tuning support for eMMC HS400 modeWenbin Mei
This adds a ->execute_hs400_tuning() host callback to enable optional support for host specific tuning for eMMC HS400 mode. Additionally, share mmc_get_ext_csd() through the public host headerfile, to allow it to be used by the host drivers, which is needed to support the HS400 tuning. Signed-off-by: Wenbin Mei <wenbin.mei@mediatek.com> Link: https://lore.kernel.org/r/20210917124803.22871-3-wenbin.mei@mediatek.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2021-10-11Merge tag 'linux-kselftest-kunit-fixes-5.15-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull Kunit fixes from Shuah Khan: - Fixes to address the structleak plugin causing the stack frame size to grow immensely when used with KUnit. Fixes include adding a new makefile to disable structleak and using it from KUnit iio, device property, thunderbolt, and bitfield tests to disable it. - KUnit framework reference count leak in kfree_at_end - KUnit tool fix to resolve conflict between --json and --raw_output and generate correct test output in either case. - kernel-doc warnings due to mismatched arg names * tag 'linux-kselftest-kunit-fixes-5.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: kunit: fix kernel-doc warnings due to mismatched arg names bitfield: build kunit tests without structleak plugin thunderbolt: build kunit tests without structleak plugin device property: build kunit tests without structleak plugin iio/test-format: build kunit tests without structleak plugin gcc-plugins/structleak: add makefile var for disabling structleak kunit: fix reference count leak in kfree_at_end kunit: tool: better handling of quasi-bool args (--json, --raw_output)
2021-10-11Merge branch 'for-5.15-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue fixes from Tejun Heo: "One patch to add a missing __printf annotation and the other to enable deferred printing for debug dumps to avoid deadlocks when triggered from some contexts (e.g. console drivers)" * 'for-5.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: fix state-dump console deadlock workqueue: annotate alloc_workqueue() as printf
2021-10-11Merge branch 'v5.16/vfio/diana-fsl-reset-v2' into v5.16/vfio/nextAlex Williamson
2021-10-11PCI/VPD: Add pci_read/write_vpd_any()Heiner Kallweit
In certain cases we need a variant of pci_read_vpd()/pci_write_vpd() that does not check against dev->vpd.len. Such cases are: - Reading VPD if dev->vpd.len isn't set yet (in pci_vpd_size()) - Devices that map non-VPD information to arbitrary places in VPD address space (example: Chelsio T3 EEPROM write-protect flag) Therefore add pci_read_vpd_any() and pci_write_vpd_any() that check against PCI_VPD_MAX_SIZE only. Link: https://lore.kernel.org/r/93ecce28-a158-f02a-d134-8afcaced8efe@gmail.com Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Jakub Kicinski <kuba@kernel.org>
2021-10-11Merge tag 'omap-for-v5.16/ti-sysc-signed' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into arm/drivers Driver changes for ti-sysc for v5.16 Changes for ti-sysc driver for improved system suspend and resume support as some drivers need to be reinitialized on resume. Also a non-urgent resume warning fix, and dropping of legacy flags for gpio and sham: - Fix timekeeping suspended warning on resume. Probably no need to merge this into fixes as it's gone unnoticed for a while. - Check for context loss for reinit of a module - Add add quirk handling to reinit on context loss, and also fix a build warning it caused - Add quirk handling to reset on reinit - Use context loss quirk for gpmc and otg - Handle otg force-idle quirk even if no driver is loaded - Drop legacy flags for gpio and sham * tag 'omap-for-v5.16/ti-sysc-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap: bus: ti-sysc: Fix variable set but not used warning for reinit_modules bus: ti-sysc: Drop legacy quirk flag for sham bus: ti-sysc: Drop legacy quirk flag for gpio bus: ti-sysc: Handle otg force idle quirk bus: ti-sysc: Use context lost quirk for otg bus: ti-sysc: Use context lost quirks for gpmc bus: ti-sysc: Add quirk handling for reset on re-init bus: ti-sysc: Add quirk handling for reinit on context lost bus: ti-sysc: Check for lost context in sysc_reinit_module() bus: ti-sysc: Fix timekeeping_suspended warning on resume Link: https://lore.kernel.org/r/pull-1633950030-501948@atomide.com-2 Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-10-11Merge tag 'v5.15-next-soc' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/matthias.bgg/linux into arm/drivers - mt8192: add mutex support - mmsys: add more components add routing table for mt8192 add reset controller support * tag 'v5.15-next-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/matthias.bgg/linux: drm/mediatek: mtk_dsi: Reset the dsi0 hardware soc: mediatek: mmsys: Add reset controller support soc: mediatek: add mtk mutex support for MT8192 soc: mediatek: mmsys: Add mt8192 mmsys routing table soc: mediatek: mmsys: add comp OVL_2L2/POSTMASK/RDMA4 Link: https://lore.kernel.org/r/b1d364d0-f2ae-488b-b3f7-c694049c20d3@gmail.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-10-11Merge tag 'memory-controller-drv-5.16' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux-mem-ctrl into arm/drivers Memory controller drivers for v5.16 1. Renesas RPC: fix unaligned bus access and QSPI data transfers in manual modes. 2. Renesas RPC: select RESET_CONTROLLER as it is necessary for operation. 3. FSL IFC: fix error paths. 4. Broadcom: allow building as module. * tag 'memory-controller-drv-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux-mem-ctrl: memory: fsl_ifc: fix leak of irq and nand_irq in fsl_ifc_ctrl_probe memory: renesas-rpc-if: RENESAS_RPCIF should select RESET_CONTROLLER memory: brcmstb_dpfe: Allow building Broadcom STB DPFE as module memory: samsung: describe drivers in KConfig memory: renesas-rpc-if: Avoid unaligned bus access for HyperFlash memory: renesas-rpc-if: Correct QSPI data transfer in Manual mode dt-bindings: rpc: renesas-rpc-if: Add support for the R8A779A0 RPC-IF Link: https://lore.kernel.org/r/20211010175836.13302-1-krzysztof.kozlowski@canonical.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-10-11Merge tag 'tegra-for-5.16-cpuidle' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux into arm/drivers cpuidle: tegra: Changes for v5.16-rc1 This pulls in the for-5.16/clk and for-5.16/soc branches and uses the stubs added in them to enable compile testing of the cpuidle driver. While at it, this also fixes a potential driver probe order race condition between the PMC and the cpuidle driver. * tag 'tegra-for-5.16-cpuidle' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux: cpuidle: tegra: Check whether PMC is ready cpuidle: tegra: Enable compile testing clk: tegra: Add stubs needed for compile testing Link: https://lore.kernel.org/r/20211008201132.1678814-5-thierry.reding@gmail.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-10-11Merge tag 'tegra-for-5.16-soc' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux into arm/drivers soc/tegra: Changes for v5.16-rc1 This set consists of stub additions to enable compile testing for more drivers, exposes the PMC's USB regmap on all SoC generations, removes a state synchronization workaround that is no longer needed and adds an error reporting driver that can help troubleshoot crashes. To top it all off, an error handling path in the powergating code is fixed and the devm_platform_ioremap_resource() function is used to remove some boilerplate code. * tag 'tegra-for-5.16-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux: soc/tegra: pmc: Use devm_platform_ioremap_resource() soc/tegra: Add Tegra186 ARI driver soc/tegra: Fix an error handling path in tegra_powergate_power_up() soc/tegra: pmc: Expose USB regmap to all SoCs soc/tegra: pmc: Disable PMC state syncing soc/tegra: pm: Make stubs usable for compile testing soc/tegra: irq: Add stubs needed for compile testing soc/tegra: fuse: Add stubs needed for compile testing Link: https://lore.kernel.org/r/20211008201132.1678814-4-thierry.reding@gmail.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-10-11Merge tag 'drm-intel-gt-next-2021-10-08' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-intel into drm-next UAPI Changes: - Add uAPI for using PXP protected objects Mesa changes: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8064 - Add PCI IDs and LMEM discovery/placement uAPI for DG1 Mesa changes: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11584 - Disable engine bonding on Gen12+ except TGL, RKL and ADL-S Cross-subsystem Changes: - Merges 'tip/locking/wwmutex' branch (core kernel tip) - "mei: pxp: export pavp client to me client bus" Core Changes: - Update ttm_move_memcpy for async use (Thomas) Driver Changes: - Enable GuC submission by default on DG1 (Matt B) - Add PXP (Protected Xe Path) support for Gen12 integrated (Daniele, Sean, Anshuman) See "drm/i915/pxp: add PXP documentation" for details! - Remove force_probe protection for ADL-S (Raviteja) - Add base support for XeHP/XeHP SDV (Matt R, Stuart, Lucas) - Handle DRI_PRIME=1 on Intel igfx + Intel dgfx hybrid graphics setup (Tvrtko) - Use Transparent Hugepages when IOMMU is enabled (Tvrtko, Chris) - Implement LMEM backup and restore for suspend / resume (Thomas) - Report INSTDONE_GEOM values in error state for DG2 (Matt R) - Add DG2-specific shadow register table (Matt R) - Update Gen11/Gen12/XeHP shadow register tables (Matt R) - Maintain backward-compatible nested batch behavior on TGL+ (Matt R) - Add new LRI reg offsets for DG2 (Akeem) - Initialize unused MOCS entries to device specific values (Ayaz) - Track and use the correct UC MOCS index on Gen12 (Ayaz) - Add separate MOCS table for Gen12 devices other than TGL/RKL (Ayaz) - Simplify the locking and eliminate some RCU usage (Daniel) - Add some flushing for the 64K GTT path (Matt A) - Mark GPU wedging on driver unregister unrecoverable (Janusz) - Major rework in the GuC codebase, simplify locking and add docs (Matt B) - Add DG1 GuC/HuC firmwares (Daniele, Matt B) - Remember to call i915_sw_fence_fini on guc_state.blocked (Matt A) - Use "gt" forcewake domain name for error messages instead of "blitter" (Matt R) - Drop now duplicate LMEM uAPI RFC kerneldoc section (Daniel) - Fix early tracepoints for requests (Matt A) - Use locked access to ctx->engines in set_priority (Daniel) - Convert gen6/gen7/gen8 read operations to fwtable (Matt R) - Drop gen11/gen12 specific mmio write handlers (Matt R) - Drop gen11 specific mmio read handlers (Matt R) - Use designated initializers for init/exit table (Kees) - Fix syncmap memory leak (Matt B) - Add pretty printing for buddy allocator state debug (Matt A) - Fix potential error pointer dereference in pinned_context() (Dan) - Remove IS_ACTIVE macro (Lucas) - Static code checker fixes (Nathan) - Clean up disabled warnings (Nathan) - Increase timeout in i915_gem_contexts selftests 5x for GuC submission (Matt B) - Ensure wa_init_finish() is called for ctx workaround list (Matt R) - Initialize L3CC table in mocs init (Sreedhar, Ayaz, Ram) - Get PM ref before accessing HW register (Vinay) - Move __i915_gem_free_object to ttm_bo_destroy (Maarten) - Deduplicate frequency dump on debugfs (Lucas) - Make wa list per-gt (Venkata) - Do not define dummy vma in stack (Venkata) - Take pinning into account in __i915_gem_object_is_lmem (Matt B, Thomas) - Do not report currently active engine when describing objects (Tvrtko) - Fix pdfdocs build error by removing nested grid from GuC docs (Akira) - Remove false warning from the rps worker (Tejas) - Flush buffer pools on driver remove (Janusz) - Fix runtime pm handling in i915_gem_shrink (Maarten) - Rework TTM object initialization slightly (Thomas) - Use fixed offset for PTEs location (Michal Wa) - Verify result from CTB (de)register action and improve error messages (Michal Wa) - Fix bug in user proto-context creation that leaked contexts (Matt B) - Re-use Gen11 forcewake read functions on Gen12 (Matt R) - Make shadow tables range-based (Matt R) - Ditch the i915_gem_ww_ctx loop member (Thomas, Maarten) - Use NULL instead of 0 where appropriate (Ville) - Rename pci/debugfs functions to respect file prefix (Jani, Lucas) - Drop guc_communication_enabled (Daniele) - Selftest fixes (Thomas, Daniel, Matt A, Maarten) - Clean up inconsistent indenting (Colin) - Use direction definition DMA_BIDIRECTIONAL instead of PCI_DMA_BIDIRECTIONAL (Cai) - Add "intel_" as prefix in set_mocs_index() (Ayaz) From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/YWAO80MB2eyToYoy@jlahtine-mobl.ger.corp.intel.com Signed-off-by: Dave Airlie <airlied@redhat.com>