summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2019-06-28IB/hfi1: Move rvt_cq_wc struct into uapi directoryKamenee Arumugam
The rvt_cq_wc struct elements are shared between rdmavt and the providers but not in uapi directory. As per the comment in https://marc.info/?l=linux-rdma&m=152296522708522&w=2 The hfi1 driver and the rdma core driver are not using shared structures in the uapi directory. In that case, move rvt_cq_wc struct into the rvt-abi.h header file and create a rvt_k_cq_w for the kernel completion queue. Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28Merge tag 'v5.2-rc6' into rdma.git for-nextJason Gunthorpe
For dependencies in next patches. Resolve conflicts: - Use uverbs_get_cleared_udata() with new cq allocation flow - Continue to delete nes despite SPDX conflict - Resolve list appends in mlx5_command_str() - Use u16 for vport_rule stuff - Resolve list appends in struct ib_client Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-26net/mlx5: E-Switch, Enable vport metadata matching if firmware supports itJianbo Liu
As the ingress ACL rules save vhca id and vport number to packet's metadata REG_C_0, and the metadata matching for the rules in both fast path and slow path are all added, enable this feature if supported. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-26RDMA/mlx5: Add vport metadata matching for IB representorsJianbo Liu
If vport metadata matching is enabled in eswitch, the rule created must be changed to match on the metadata, instead of source port. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-26net/mlx5: E-Switch, Add match on vport metadata for rule in slow pathJianbo Liu
In slow path, packet that not matched by any offloaded rule is forwarded to eswitch vport manager for further processing. Add matching on metadata for peer miss rules in FDB, and rules which forward packet to correct representor in esw manager NIC_RX table. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-26net/mlx5: E-Switch, Pass metadata from FDB to eswitch managerJianbo Liu
In order to do matching on metadata in slow path when demuxing traffic to representors, explicitly enable the feature that allows HW to pass metadata REG_C_0 from FDB to eswitch manager NIC_RX table. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-26net/mlx5: E-Switch, Add query and modify esw vport context functionsJianbo Liu
Add esw vport query and modify functions, and exposing them is needed for enabling or disabling registers passed as metatdata to vport NIC_RX table in slow path. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-26net/mlx5: E-Switch, Add match on vport metadata for rule in fast pathJianbo Liu
If FW's capabilities and configurations meet the requirement of vport metadata matching, this feature will be used. As the information about vport number and vhca_id related to packet is already stored to its metadata register, which is used as an indicator for perticular vport, now we can change to match on this metadata for all the offloading rules in fast path. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-26net/mlx5e: Specifying known origin of packets matching the flowJianbo Liu
In vport metadata matching, source port number is replaced by metadata. While FW has no idea about what it is in the metadata, a syndrome will happen. Specify a known origin to avoid the syndrome. However, there is no functional change because ANY_VPORT (0) is filled in flow_source, the same default value as before, as a pre-step towards metadata matching for fast path. There are two other values can be filled in flow_source. When setting 0x1, packet matching this rule is from uplink, while 0x2 is for packet from other local vports. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-26net/mlx5: E-Switch, Tag packet with vport number in VF vports and uplink ↵Jianbo Liu
ingress ACLs When a dual-port VHCA sends a RoCE packet on its non-native port, and the packet arrives to its affiliated vport FDB, a mismatch might occur on the rules that match the packet source vport as it is not represented by single VHCA only in this case. So we change to match on metadata instead of source vport. To do that, a rule is created in all vports and uplink ingress ACLs, to save the source vport number and vhca id in the packet's metadata in order to match on it later. The metadata register used is the first of the 32-bit type C registers. It can be used for matching and header modify operations. The higher 16 bits of this register are for vhca id, and the lower 16 ones is for vport number. This change is not for dual-port RoCE only. If HW and FW allow, the vport metadata matching is enabled by default. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-26net/mlx5: Add flow context for flow tagJianbo Liu
Refactor the flow data structures, add new flow_context and move flow_tag into it, as flow_tag doesn't belong to the rule action. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-26net/mlx5: Introduce a helper API to check VF vportParav Pandit
Introduce a helper API mlx5_eswitch_is_vf_vport() to check if a given vport_num belongs to VF or not. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Jianbo Liu <jianbol@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-26net/mlx5: Support allocating modify header context from ingress ACLJianbo Liu
That modify header action can be then attached to a steering rule in the ingress ACL. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-26net/mlx5: Get vport ACL namespace by vport indexJianbo Liu
The ingress and egress ACL root namespaces are created per vport and stored into arrays. However, the vport number is not the same as the index. Passing the array index, instead of vport number, to get the correct ingress and egress acl namespace. Fixes: 9b93ab981e3b ("net/mlx5: Separate ingress/egress namespaces for each vport") Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Reviewed-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-26net/mlx5: Introduce vport metadata matching bits and enum constantsJianbo Liu
When a dual-port VHCA sends a RoCE packet on its non-native port, and the packet arrives to its affiliated vport FDB, a mismatch might occur on the rules that match the packet source vport. So we replace the match on source port with the match on metadata that was configured in ingress ACL, and that metadata will be passed further also to the NIC RX table of the eswitch manager. Introduce vport metadata matching bits and enum constants as a pre-step towards metadata matching. o metadata type C registers in the misc parameters 2 fields. o esw_uplink_ingress_acl bit in esw cap. If it set, the device supports ingress ACL for the uplink vport. o fdb_to_vport_reg_* bits in flow table cap and esw vport context, to support propagating the metadata to the nic rx through the loopback path. o flow_source in flow context, to indicate the known origin of packets. o enum constants, to support the above bits. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-25RDMA/hns: fix spelling mistake "attatch" -> "attach"Colin Ian King
There is a spelling mistake in an dev_err message. Fix it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-25RDMA/netlink: Audit policy settings for netlink attributesDoug Ledford
For all string attributes for which we don't currently accept the element as input, we only use it as output, set the string length to RDMA_NLDEV_ATTR_EMPTY_STRING which is defined as 1. That way we will only accept a null string for that element. This will prevent someone from writing a new input routine that uses the element without also updating the policy to have a valid value. Also while there, make sure the existing entries that are valid have the correct policy, if not, correct the policy. Remove unnecessary checks for nla_strlcpy() overflow once the policy has been set correctly. Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-25RDMA/hns: Cleanup unnecessary exported symbolsLijun Ou
This patch removes the hns-roce.ko for cleanup all the exported symbols in common part. Signed-off-by: Xi Wang <wangxi11@huawei.com> Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-25docs: infiniband: convert docs to ReST and rename to *.rstMauro Carvalho Chehab
The InfiniBand docs are plain text with no markups. So, all we needed to do were to add the title markups and some markup sequences in order to properly parse tables, lists and literal blocks. At its new index.rst, let's add a :orphan: while this is not linked to the main index.rst file, in order to avoid build warnings. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-25RDMA/hns: Fix an error code in hns_roce_set_user_sq_size()Dan Carpenter
This function is supposed to return negative kernel error codes but here it returns CMD_RST_PRC_EBUSY (2). The error code eventually gets passed to IS_ERR() and since it's not an error pointer it leads to an Oops in hns_roce_v1_rsv_lp_qp() Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-25RDMA/hns: fix potential integer overflow on left shiftColin Ian King
There is a potential integer overflow when int i is left shifted as this is evaluated using 32 bit arithmetic but is being used in a context that expects an expression of type dma_addr_t. Fix this by casting integer i to dma_addr_t before shifting to avoid the overflow. Addresses-Coverity: ("Unintentional integer overflow") Fixes: 2ac0bc5e725e ("RDMA/hns: Add a group interfaces for optimizing buffers getting flow") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24net/mlx5: Convert mkey_table to XArrayMatthew Wilcox
The lock protecting the data structure does not need to be an rwlock. The only read access to the lock is in an error path, and if that's limiting your scalability, you have bigger performance problems. Eliminate mlx5_mkey_table in favour of using the xarray directly. reg_mr_callback must use GFP_ATOMIC for allocating XArray nodes as it may be called in interrupt context. This also fixes a minor bug where SRCU locking was being used on the radix tree read side, when RCU was needed too. Signed-off-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-24RDMA/mlx5: Refactor MR descriptors allocationMax Gurtovoy
Improve code readability using static helpers for each memory region type. Re-use the common logic to get smaller functions that are easy to maintain and reduce code duplication. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24RDMA/mlx5: Use PA mapping for PI handoverMax Gurtovoy
If possibe, avoid doing a UMR operation to register data and protection buffers (via MTT/KLM mkeys). Instead, use the local DMA key and map the SG lists using PA access. This is safe, since the internal key for data and protection never exposed to the remote server (only signature key might be exposed). If PA mappings are not possible, perform mapping using MTT/KLM descriptors. The setup of the tested benchmark (using iSER ULP): - 2 servers with 24 cores (1 initiator and 1 target) - ConnectX-4/ConnectX-5 adapters - 24 target sessions with 1 LUN each - ramdisk backstore - PI active Performance results running fio (24 jobs, 128 iodepth) using write_generate=1 and read_verify=1 (w/w.o patch): bs IOPS(read) IOPS(write) ---- ---------- ---------- 512 1266.4K/1262.4K 1720.1K/1732.1K 4k 793139/570902 1129.6K/773982 32k 72660/72086 97229/96164 Using write_generate=0 and read_verify=0 (w/w.o patch): bs IOPS(read) IOPS(write) ---- ---------- ---------- 512 1590.2K/1600.1K 1828.2K/1830.3K 4k 1078.1K/937272 1142.1K/815304 32k 77012/77369 98125/97435 Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Israel Rukshin <israelr@mellanox.com> Suggested-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24RDMA/mlx5: Improve PI handover performanceIsrael Rukshin
In some loads, there is performance degradation when using KLM mkey instead of MTT mkey. This is because KLM descriptor access is via indirection that might require more HW resources and cycles. Using KLM descriptor is not necessary when there are no gaps at the data/metadata sg lists. As an optimization, use MTT mkey whenever it is possible. For that matter, allocate internal MTT mkey and choose the effective pi_mr for in transaction according to the required mapping scheme. The setup of the tested benchmark (using iSER ULP): - 2 servers with 24 cores (1 initiator and 1 target) - ConnectX-4/ConnectX-5 adapters - 24 target sessions with 1 LUN each - ramdisk backstore - PI active Performance results running fio (24 jobs, 128 iodepth) using write_generate=1 and read_verify=1 (w/w.o/baseline): bs IOPS(read) IOPS(write) ---- ---------- ---------- 512 1262.4K/1243.3K/1147.1K 1732.1K/1725.1K/1423.8K 4k 570902/571233/457874 773982/743293/642080 32k 72086/72388/71933 96164/71789/93249 Using write_generate=0 and read_verify=0 (w/w.o patch): bs IOPS(read) IOPS(write) ---- ---------- ---------- 512 1600.1K/1572.1K/1393.3K 1830.3K/1823.5K/1557.2K 4k 937272/921992/762934 815304/753772/646071 32k 77369/75052/72058 97435/73180/94612 Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Suggested-by: Max Gurtovoy <maxg@mellanox.com> Suggested-by: Idan Burstein <idanb@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24RDMA/mlx5: Remove unused IB_WR_REG_SIG_MR codeIsrael Rukshin
IB_WR_REG_SIG_MR is not needed after IB_WR_REG_MR_INTEGRITY was used. Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24RDMA/rw: Use IB_WR_REG_MR_INTEGRITY for PI handoverIsrael Rukshin
Replace the old signature handover API with the new one. The new API simplifes PI handover code complexity for ULPs and improve performance. For RW API it will reduce the maximum number of work requests per task and the need of dealing with multiple MRs (and their registrations and invalidations) per task. All the mappings and registration of the data and the protection buffers is done by the LLD using a single WR and a special MR type (IB_MR_TYPE_INTEGRITY) for the PI handover operation. The setup of the tested benchmark (using iSER ULP): - 2 servers with 24 cores (1 initiator and 1 target) - ConnectX-4/ConnectX-5 adapters - 24 target sessions with 1 LUN each - ramdisk backstore - PI active Performance results running fio (24 jobs, 128 iodepth) using write_generate=1 and read_verify=1 (w/w.o patch): bs IOPS(read) IOPS(write) ---- ---------- ---------- 512 1243.3K/1182.3K 1725.1K/1680.2K 4k 571233/528835 743293/748259 32k 72388/71086 71789/93573 Using write_generate=0 and read_verify=0 (w/w.o patch): bs IOPS(read) IOPS(write) ---- ---------- ---------- 512 1572.1K/1427.2K 1823.5K/1724.3K 4k 921992/916194 753772/768267 32k 75052/73960 73180/95484 There is a performance degradation when writing big block sizes. Degradation is caused by the complexity of combining multiple indirections and perform RDMA READ operation from it. This will be fixed in the following patches by reducing the indirections if possible. Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24RDMA/rw: Introduce rdma_rw_inv_key helperIsrael Rukshin
This is a preparation for adding new signature API to the rw-API. Signed-off-by: Israel Rukshin <israelr@mellanox.com> Suggested-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24RDMA/core: Validate integrity handover device capMax Gurtovoy
Protect the case that a ULP tries to allocate a QP with signature enabled flag while the LLD doesn't support this feature. While we're here, also move integrity_en attribute from mlx5_qp to ib_qp as a preparation for adding new integrity API to the rw-API (that is part of ib_core module). Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24RDMA/core: Rename signature qp create flag and signature device capabilityIsrael Rukshin
Rename IB_QP_CREATE_SIGNATURE_EN to IB_QP_CREATE_INTEGRITY_EN and IB_DEVICE_SIGNATURE_HANDOVER to IB_DEVICE_INTEGRITY_HANDOVER. Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24RDMA/core: Add an integrity MR pool supportIsrael Rukshin
This is a preparation for adding new signature API to the rw-API. Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24IB/iser: Unwind WR union at iser_tx_descIsrael Rukshin
After decreasing WRs array size from 7 to 3 it is more readable to give each WR a descriptive name. Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24IB/iser: Use IB_WR_REG_MR_INTEGRITY for PI handoverIsrael Rukshin
Using this new API reduces iSER code complexity. It also reduces the maximum number of work requests per task and the need of dealing with multiple MRs (and their registrations and invalidations) per task. It is done by using a single WR and a special MR type (IB_MR_TYPE_INTEGRITY) for PI operation. The setup of the tested benchmark: - 2 servers with 24 cores (1 initiator and 1 target) - 24 target sessions with 1 LUN each - ramdisk backstore - PI active Performance results running fio (24 jobs, 128 iodepth) using write_generate=0 and read_verify=0 (w/w.o patch): bs IOPS(read) IOPS(write) ---- ---------- ---------- 512 1236.6K/1164.3K 1357.2K/1332.8K 1k 1196.5K/1163.8K 1348.4K/1262.7K 2k 1016.7K/921950 1003.7K/931230 4k 662728/600545 595423/501513 8k 385954/384345 333775/277090 16k 222864/222820 170317/170671 32k 116869/114896 82331/82244 64k 55205/54931 40264/40021 Using write_generate=1 and read_verify=1 (w/w.o patch): bs IOPS(read) IOPS(write) ---- ---------- ---------- 512 1090.1K/1030.9K 1303.9K/1101.4K 1k 1057.7K/904583 1318.4K/988085 2k 965226/638799 1008.6K/692514 4k 555479/410151 542414/414517 8k 298675/224964 264729/237508 16k 133485/122481 164625/138647 32k 74329/67615 80143/78743 64k 35716/35519 39294/37334 We get performance improvement at all block sizes. The most significant improvement is when writing 4k bs (almost 30% more iops). Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24RDMA/mlx5: Introduce and implement new IB_WR_REG_MR_INTEGRITY work requestMax Gurtovoy
This new WR will be used to perform PI (protection information) handover using the new API. Using the new API, the user will post a single WR that will internally perform all the needed actions to complete PI operation. This new WR will use a memory region that was allocated as IB_MR_TYPE_INTEGRITY and was mapped using ib_map_mr_sg_pi to perform the registration. In the old API, in order to perform a signature handover operation, each ULP should perform the following: 1. Map and register the data buffers. 2. Map and register the protection buffers. 3. Post a special reg WR to configure the signature handover operation layout. 4. Invalidate the signature memory key. 5. Invalidate protection buffers memory key. 6. Invalidate data buffers memory key. In the new API, the mapping of both data and protection buffers is performed using a single call to ib_map_mr_sg_pi function. Also the registration of the buffers and the configuration of the signature operation layout is done by a single new work request called IB_WR_REG_MR_INTEGRITY. This patch implements this operation for mlx5 devices that are capable to offload data integrity generation/validation while performing the actual buffer transfer. This patch will not remove the old signature API that is used by the iSER initiator and target drivers. This will be done in the future. In the internal implementation, for each IB_WR_REG_MR_INTEGRITY work request, we are using a single UMR operation to register both data and protection buffers using KLM's. Afterwards, another UMR operation will describe the strided block format. These will be followed by 2 SET_PSV operations to set the memory/wire domains initial signature parameters passed by the user. In the end of the whole transaction, only the signature memory key (the one that exposed for the RDMA operation) will be invalidated. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24RDMA/mlx5: Update set_sig_data_segment attribute for new signature APIMax Gurtovoy
Explicitly pass the sig_mr and the access flags for the mkey segment configuration. This function will be used also in the new signature API, so modify it in order to use it in both APIs. This is a preparation commit before adding new signature API. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24RDMA/mlx5: Pass UMR segment flags instead of booleanMax Gurtovoy
UMR ctrl segment flags can vary between UMR operations. for example, using inline UMR or adding free/not-free checks for a memory key. This is a preparation commit before adding new signature API that will not need not-free checks for the internal memory key during the UMR operation. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24RDMA/mlx5: Add attr for max number page list length for PI operationMax Gurtovoy
PI offload (protection information) is a feature that each RDMA provider can implement differently. Thus, introduce new device attribute to define the maximal length of the page list for PI fast registration operation. For example, mlx5 driver uses a single internal MR to map both data and protection SGL's, so it's equal to max_fast_reg_page_list_len / 2. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24RDMA/mlx5: Implement mlx5_ib_map_mr_sg_pi and mlx5_ib_alloc_mr_integrityMax Gurtovoy
mlx5_ib_map_mr_sg_pi() will map the PI and data dma mapped SG lists to the mlx5 memory region prior to the registration operation. In the new API, the mlx5 driver will allocate an internal memory region for the UMR operation to register both PI and data SG lists. The internal MR will use KLM mode in order to map 2 (possibly non-contiguous/non-align) SG lists using 1 memory key. In the new API, each ULP will use 1 memory region for the signature operation (instead of 3 in the old API). This memory region will have a key that will be exposed to remote server to perform RDMA operation. The internal memory key that will map the SG lists will stay private. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24RDMA/core: Add signature attrs element for ib_mr structureMax Gurtovoy
This element will describe the needed characteristics for the signature operation per signature enabled memory region (type IB_MR_TYPE_INTEGRITY). Also add meta_length attribute to ib_sig_attrs structure for saving the mapped metadata length (needed for the new API implementation). Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24RDMA/core: Introduce ib_map_mr_sg_pi to map data/protection sgl'sMax Gurtovoy
This function will map the previously dma mapped SG lists for PI (protection information) and data to an appropriate memory region for future registration. The given MR must be allocated as IB_MR_TYPE_INTEGRITY. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24RDMA/core: Introduce IB_MR_TYPE_INTEGRITY and ib_alloc_mr_integrity APIIsrael Rukshin
This is a preparation for signature verbs API re-design. In the new design a single MR with IB_MR_TYPE_INTEGRITY type will be used to perform the needed mapping for data integrity operations. Signed-off-by: Israel Rukshin <israelr@mellanox.com> Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24RDMA/core: Save the MR type in the ib_mr structureMax Gurtovoy
This is a preparation for the signature verbs API change. This change is needed since the MR type will define, in the upcoming patches, the need for allocating internal resources in LLD for signature handover related operations. It will also help to make sure that signature related functions are called with an appropriate MR type and fail otherwise. Also introduce new mr types IB_MR_TYPE_USER, IB_MR_TYPE_DMA and IB_MR_TYPE_DM for correctness. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24RDMA/core: Introduce new header file for signature operationsMax Gurtovoy
Ease the exhausted ib_verbs.h file and make the code more readable. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-22Linux 5.2-rc6v5.2-rc6Linus Torvalds
2019-06-22Merge tag 'iommu-fix-v5.2-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull iommu fix from Joerg Roedel: "Revert a commit from the previous pile of fixes which causes new lockdep splats. It is better to revert it for now and work on a better and more well tested fix" * tag 'iommu-fix-v5.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: Revert "iommu/vt-d: Fix lock inversion between iommu->lock and device_domain_lock"
2019-06-22Revert "iommu/vt-d: Fix lock inversion between iommu->lock and ↵Peter Xu
device_domain_lock" This reverts commit 7560cc3ca7d9d11555f80c830544e463fcdb28b8. With 5.2.0-rc5 I can easily trigger this with lockdep and iommu=pt: ====================================================== WARNING: possible circular locking dependency detected 5.2.0-rc5 #78 Not tainted ------------------------------------------------------ swapper/0/1 is trying to acquire lock: 00000000ea2b3beb (&(&iommu->lock)->rlock){+.+.}, at: domain_context_mapping_one+0xa5/0x4e0 but task is already holding lock: 00000000a681907b (device_domain_lock){....}, at: domain_context_mapping_one+0x8d/0x4e0 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (device_domain_lock){....}: _raw_spin_lock_irqsave+0x3c/0x50 dmar_insert_one_dev_info+0xbb/0x510 domain_add_dev_info+0x50/0x90 dev_prepare_static_identity_mapping+0x30/0x68 intel_iommu_init+0xddd/0x1422 pci_iommu_init+0x16/0x3f do_one_initcall+0x5d/0x2b4 kernel_init_freeable+0x218/0x2c1 kernel_init+0xa/0x100 ret_from_fork+0x3a/0x50 -> #0 (&(&iommu->lock)->rlock){+.+.}: lock_acquire+0x9e/0x170 _raw_spin_lock+0x25/0x30 domain_context_mapping_one+0xa5/0x4e0 pci_for_each_dma_alias+0x30/0x140 dmar_insert_one_dev_info+0x3b2/0x510 domain_add_dev_info+0x50/0x90 dev_prepare_static_identity_mapping+0x30/0x68 intel_iommu_init+0xddd/0x1422 pci_iommu_init+0x16/0x3f do_one_initcall+0x5d/0x2b4 kernel_init_freeable+0x218/0x2c1 kernel_init+0xa/0x100 ret_from_fork+0x3a/0x50 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(device_domain_lock); lock(&(&iommu->lock)->rlock); lock(device_domain_lock); lock(&(&iommu->lock)->rlock); *** DEADLOCK *** 2 locks held by swapper/0/1: #0: 00000000033eb13d (dmar_global_lock){++++}, at: intel_iommu_init+0x1e0/0x1422 #1: 00000000a681907b (device_domain_lock){....}, at: domain_context_mapping_one+0x8d/0x4e0 stack backtrace: CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.2.0-rc5 #78 Hardware name: LENOVO 20KGS35G01/20KGS35G01, BIOS N23ET50W (1.25 ) 06/25/2018 Call Trace: dump_stack+0x85/0xc0 print_circular_bug.cold.57+0x15c/0x195 __lock_acquire+0x152a/0x1710 lock_acquire+0x9e/0x170 ? domain_context_mapping_one+0xa5/0x4e0 _raw_spin_lock+0x25/0x30 ? domain_context_mapping_one+0xa5/0x4e0 domain_context_mapping_one+0xa5/0x4e0 ? domain_context_mapping_one+0x4e0/0x4e0 pci_for_each_dma_alias+0x30/0x140 dmar_insert_one_dev_info+0x3b2/0x510 domain_add_dev_info+0x50/0x90 dev_prepare_static_identity_mapping+0x30/0x68 intel_iommu_init+0xddd/0x1422 ? printk+0x58/0x6f ? lockdep_hardirqs_on+0xf0/0x180 ? do_early_param+0x8e/0x8e ? e820__memblock_setup+0x63/0x63 pci_iommu_init+0x16/0x3f do_one_initcall+0x5d/0x2b4 ? do_early_param+0x8e/0x8e ? rcu_read_lock_sched_held+0x55/0x60 ? do_early_param+0x8e/0x8e kernel_init_freeable+0x218/0x2c1 ? rest_init+0x230/0x230 kernel_init+0xa/0x100 ret_from_fork+0x3a/0x50 domain_context_mapping_one() is taking device_domain_lock first then iommu lock, while dmar_insert_one_dev_info() is doing the reverse. That should be introduced by commit: 7560cc3ca7d9 ("iommu/vt-d: Fix lock inversion between iommu->lock and device_domain_lock", 2019-05-27) So far I still cannot figure out how the previous deadlock was triggered (I cannot find iommu lock taken before calling of iommu_flush_dev_iotlb()), however I'm pretty sure that that change should be incomplete at least because it does not fix all the places so we're still taking the locks in different orders, while reverting that commit is very clean to me so far that we should always take device_domain_lock first then the iommu lock. We can continue to try to find the real culprit mentioned in 7560cc3ca7d9, but for now I think we should revert it to fix current breakage. CC: Joerg Roedel <joro@8bytes.org> CC: Lu Baolu <baolu.lu@linux.intel.com> CC: dave.jiang@intel.com Signed-off-by: Peter Xu <peterx@redhat.com> Tested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-06-22Merge tag 'pci-v5.2-fixes-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI fix from Bjorn Helgaas: "If an IOMMU is present, ignore the P2PDMA whitelist we added for v5.2 because we don't yet know how to support P2PDMA in that case (Logan Gunthorpe)" * tag 'pci-v5.2-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: PCI/P2PDMA: Ignore root complex whitelist when an IOMMU is present
2019-06-22Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Three driver fixes (and one version number update): a suspend hang in ufs, a qla hard lock on module removal and a qedi panic during discovery" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: qla2xxx: Fix hardlockup in abort command during driver remove scsi: ufs: Avoid runtime suspend possibly being blocked forever scsi: qedi: update driver version to 8.37.0.20 scsi: qedi: Check targetname while finding boot target information
2019-06-22Merge tag 'powerpc-5.2-5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "This is a frustratingly large batch at rc5. Some of these were sent earlier but were missed by me due to being distracted by other things, and some took a while to track down due to needing manual bisection on old hardware. But still we clearly need to improve our testing of KVM, and of 32-bit, so that we catch these earlier. Summary: seven fixes, all for bugs introduced this cycle. - The commit to add KASAN support broke booting on 32-bit SMP machines, due to a refactoring that moved some setup out of the secondary CPU path. - A fix for another 32-bit SMP bug introduced by the fast syscall entry implementation for 32-bit BOOKE. And a build fix for the same commit. - Our change to allow the DAWR to be force enabled on Power9 introduced a bug in KVM, where we clobber r3 leading to a host crash. - The same commit also exposed a previously unreachable bug in the nested KVM handling of DAWR, which could lead to an oops in a nested host. - One of the DMA reworks broke the b43legacy WiFi driver on some people's powermacs, fix it by enabling a 30-bit ZONE_DMA on 32-bit. - A fix for TLB flushing in KVM introduced a new bug, as it neglected to also flush the ERAT, this could lead to memory corruption in the guest. Thanks to: Aaro Koskinen, Christoph Hellwig, Christophe Leroy, Larry Finger, Michael Neuling, Suraj Jitindar Singh" * tag 'powerpc-5.2-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: KVM: PPC: Book3S HV: Invalidate ERAT when flushing guest TLB entries powerpc: enable a 30-bit ZONE_DMA for 32-bit pmac KVM: PPC: Book3S HV: Only write DAWR[X] when handling h_set_dawr in real mode KVM: PPC: Book3S HV: Fix r3 corruption in h_set_dabr() powerpc/32: fix build failure on book3e with KVM powerpc/booke: fix fast syscall entry on SMP powerpc/32s: fix initial setup of segment registers on secondary CPU
2019-06-22Bluetooth: Fix regression with minimum encryption key size alignmentMarcel Holtmann
When trying to align the minimum encryption key size requirement for Bluetooth connections, it turns out doing this in a central location in the HCI connection handling code is not possible. Original Bluetooth version up to 2.0 used a security model where the L2CAP service would enforce authentication and encryption. Starting with Bluetooth 2.1 and Secure Simple Pairing that model has changed into that the connection initiator is responsible for providing an encrypted ACL link before any L2CAP communication can happen. Now connecting Bluetooth 2.1 or later devices with Bluetooth 2.0 and before devices are causing a regression. The encryption key size check needs to be moved out of the HCI connection handling into the L2CAP channel setup. To achieve this, the current check inside hci_conn_security() has been moved into l2cap_check_enc_key_size() helper function and then called from four decisions point inside L2CAP to cover all combinations of Secure Simple Pairing enabled devices and device using legacy pairing and legacy service security model. Fixes: d5bb334a8e17 ("Bluetooth: Align minimum encryption key size for LE and BR/EDR connections") Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203643 Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>