summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2020-01-16RDMA/core: Add the core support field to METHOD_GET_CONTEXTMichael Guralnik
Add the core support field to METHOD_GET_CONTEXT, this field should represent capabilities that are not device-specific. Return support for optional access flags for memory regions. User-space will use this capability to mask the optional access flags for unsupporting kernels. Link: https://lore.kernel.org/r/1578506740-22188-10-git-send-email-yishaih@mellanox.com Signed-off-by: Michael Guralnik <michaelgur@mellanox.com> Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-16RDMA/uverbs: Add new relaxed ordering memory region access flagMichael Guralnik
Add a new relaxed ordering access flag for memory regions. Using memory regions with relaxed ordeing set can enhance performance. This access flag is handled in a best-effort manner, drivers should ignore if they don't support setting relaxed ordering. Link: https://lore.kernel.org/r/1578506740-22188-9-git-send-email-yishaih@mellanox.com Signed-off-by: Michael Guralnik <michaelgur@mellanox.com> Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-16RDMA/core: Add optional access flags rangeMichael Guralnik
Define a range of access flags that are defined to be optional, both uverbs and drivers should enable getting them and use if they are applicable This will be used, for example, for the relaxed ordering access flag which unsupporting drivers can ignore. Link: https://lore.kernel.org/r/1578506740-22188-7-git-send-email-yishaih@mellanox.com Signed-off-by: Michael Guralnik <michaelgur@mellanox.com> Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-16RDMA/uverbs: Verify MR access flagsMichael Guralnik
Verify that MR access flags that are passed from user are all supported ones, otherwise an error is returned. Fixes: 4fca03778351 ("IB/uverbs: Move ib_access_flags and ib_read_counters_flags to uapi") Link: https://lore.kernel.org/r/1578506740-22188-6-git-send-email-yishaih@mellanox.com Signed-off-by: Michael Guralnik <michaelgur@mellanox.com> Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-16RDMA/uverbs: Add ioctl command to get a device contextJason Gunthorpe
Allow future extensions of the get context command through the uverbs ioctl kabi. Unlike the uverbs version this does not return an async_fd as well, that has to be done with another command. Link: https://lore.kernel.org/r/1578506740-22188-5-git-send-email-yishaih@mellanox.com Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-16RDMA/core: Add UVERBS_METHOD_ASYNC_EVENT_ALLOCJason Gunthorpe
Allow the async FD to be allocated separately from the context. This is necessary to introduce the ioctl to create a context, as an ioctl should only ever create a single uobject at a time. If multiple async FDs are created then the first one is used to deliver affiliated events from any ib_uevent_object, with all subsequent ones will receive only unaffiliated events. Link: https://lore.kernel.org/r/1578506740-22188-3-git-send-email-yishaih@mellanox.com Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-16Merge branch 'mlx5-next' into rdma.git for-nextJason Gunthorpe
From the mlx5-next branch at git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Merged due to dependencies in the next patches. * branch 'mlx5-next': net/mlx5: Expose relaxed ordering bits net/mlx5: Add RoCE accelerator counters
2020-01-16net/mlx5: Expose relaxed ordering bitsMichael Guralnik
Expose relaxed ordering bits in HCA capability and mkey context structs. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Michael Guralnik <michaelgur@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-01-16net/mlx5: Add RoCE accelerator countersLeon Romanovsky
Add RoCE accelerator definitions. Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-01-13RDMA/core: Make ib_uverbs_async_event_file into a uobjectJason Gunthorpe
This makes async events aligned with completion events as both are full uobjects of FD type and use the same uobject lifecycle. A bunch of duplicate code is consolidated and the general flow between the two FDs is now very similar. Link: https://lore.kernel.org/r/1578504126-9400-14-git-send-email-yishaih@mellanox.com Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13RDMA/core: Remove the ufile arg from rdma_alloc_begin_uobjectJason Gunthorpe
Now that all callers provide a non-NULL attrs the ufile is redundant. Adjust things so that the context handling is done inside alloc_uobj, and the ib_uverbs_get_ucontext_file() is avoided if we already have the context. Link: https://lore.kernel.org/r/1578504126-9400-13-git-send-email-yishaih@mellanox.com Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13RDMA/core: Do not erase the type of ib_wq.uobjectJason Gunthorpe
This is a struct ib_uwq_object pointer, instead of using container_of() all over the place just store it with its actual type. Link: https://lore.kernel.org/r/1578504126-9400-10-git-send-email-yishaih@mellanox.com Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13RDMA/core: Do not erase the type of ib_srq.uobjectJason Gunthorpe
This is a struct ib_usrq_object pointer, instead of using container_of() all over the place just store it with its actual type. Link: https://lore.kernel.org/r/1578504126-9400-9-git-send-email-yishaih@mellanox.com Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13RDMA/core: Do not erase the type of ib_qp.uobjectJason Gunthorpe
This is a struct ib_uqp_object pointer, instead of using container_of() all over the place just store it with its actual type. Link: https://lore.kernel.org/r/1578504126-9400-8-git-send-email-yishaih@mellanox.com Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13RDMA/core: Do not erase the type of ib_cq.uobjectJason Gunthorpe
This is a struct ib_ucq_object pointer, instead of using container_of() all over the place just store it with its actual type. Link: https://lore.kernel.org/r/1578504126-9400-7-git-send-email-yishaih@mellanox.com Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13RDMA/core: Do not allow alloc_commit to failJason Gunthorpe
This is a left over from an earlier version that creates a lot of complexity for error unwind, particularly for FD uobjects. The only reason this was done is so that anon_inode_get_file() could be called with the final fops and a fully setup uobject. Both need to be setup since unwinding anon_inode_get_file() via fput will call the driver's release(). Now that the driver does not provide release, we no longer need to worry about this complicated sequence, simply create the struct file at the start and allow the core code's release function to deal with the abort case. This allows all the confusing error paths around commit to be removed. Link: https://lore.kernel.org/r/1578504126-9400-5-git-send-email-yishaih@mellanox.com Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13RDMA/core: Simplify destruction of FD uobjectsJason Gunthorpe
FD uobjects have a weird split between the struct file and uobject world. Simplify this to make them pure uobjects and use a generic release method for all struct file operations. This fixes the control flow so that mlx5_cmd_cleanup_async_ctx() is always called before erasing the linked list contents to make the concurrancy simpler to understand. For this to work the uobject destruction must fence anything that it is cleaning up - the design must not rely on struct file lifetime. Only deliver_event() relies on the struct file to when adding new events to the queue, add a is_destroyed check under lock to block it. Link: https://lore.kernel.org/r/1578504126-9400-3-git-send-email-yishaih@mellanox.com Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13RDMA/mlx5: Use RCU and direct refcounts to keep memory aliveJason Gunthorpe
dispatch_event_fd() runs from a notifier with minimal locking, and relies on RCU and a file refcount to keep the uobject and eventfd alive. As the next patch wants to remove the file_operations release function from the drivers, re-organize things so that the devx_event_notifier() path uses the existing RCU to manage the lifetime of the uobject and eventfd. Move the refcount puts to a call_rcu so that the objects are guaranteed to exist and remove the indirect file refcount. Link: https://lore.kernel.org/r/1578504126-9400-2-git-send-email-yishaih@mellanox.com Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13RDMA/uverbs: Remove needs_kfree_rcu from uverbs_obj_type_classJason Gunthorpe
After device disassociation the uapi_objects are destroyed and freed, however it is still possible that core code can be holding a kref on the uobject. When it finally goes to uverbs_uobject_free() via the kref_put() it can trigger a use-after-free on the uapi_object. Since needs_kfree_rcu is a micro optimization that only benefits file uobjects, just get rid of it. There is no harm in using kfree_rcu even if it isn't required, and the number of involved objects is small. Link: https://lore.kernel.org/r/20200113143306.GA28717@ziepe.ca Signed-off-by: Michael Guralnik <michaelgur@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-12IB/mlx5: Introduce VAR object and its alloc/destroy methodsYishai Hadas
Introduce VAR object and its alloc/destroy KABI methods. The internal implementation uses the IB core API to manage mmap/munamp calls. Link: https://lore.kernel.org/r/20191212110928.334995-5-leon@kernel.org Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-12Merge branch 'mlx5_vdpa' into rdma.git for-nextJason Gunthorpe
From the mlx5-next branch at git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Merged due to dependencies in the next patches. * branch 'mlx5_vdpa': net/mlx5: Expose vDPA emulation device capabilities net/mlx5: Add Virtio Emulation related device capabilities
2020-01-10net/mlx5: Expose vDPA emulation device capabilitiesYishai Hadas
Expose vDPA emulation device capabilities from the core layer. It includes reading the capabilities from the firmware and exposing helper functions to access the data. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Shahaf Shuler <shahafs@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-01-10net/mlx5: Add Virtio Emulation related device capabilitiesYishai Hadas
Add Virtio Emulation related fields to the device capabilities. It includes a general bit to indicate whether Virtio Emulation is supported and the capabilities structure itself. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Shahaf Shuler <shahafs@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-01-07IB/core: Rename event_handler_lock to qp_open_list_lockParav Pandit
This lock is used to protect the qp->open_list linked list. As a side effect it seems to also globally serialize the qp event_handler, but it isn't clear if that is a deliberate design. Link: https://lore.kernel.org/r/20191212113024.336702-5-leon@kernel.org Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-07IB/core: Cut down single member ib_cache structureParav Pandit
Given that ib_cache structure has only single member now, merge the cache lock directly in the ib_device. Link: https://lore.kernel.org/r/20191212113024.336702-4-leon@kernel.org Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-07IB/core: Let IB core distribute cache update eventsParav Pandit
Currently when the low level driver notifies Pkey, GID, and port change events they are notified to the registered handlers in the order they are registered. IB core and other ULPs such as IPoIB are interested in GID, LID, Pkey change events. Since all GID queries done by ULPs are serviced by IB core, and the IB core deferes cache updates to a work queue, it is possible for other clients to see stale cache data when they handle their own events. For example, the below call tree shows how ipoib will call rdma_query_gid() concurrently with the update to the cache sitting in the WQ. mlx5_ib_handle_event() ib_dispatch_event() ib_cache_event() queue_work() -> slow cache update [..] ipoib_event() queue_work() [..] work handler ipoib_ib_dev_flush_light() __ipoib_ib_dev_flush() ipoib_dev_addr_changed_valid() rdma_query_gid() <- Returns old GID, cache not updated. Move all the event dispatch to a work queue so that the cache update is always done before any clients are notified. Fixes: f35faa4ba956 ("IB/core: Simplify ib_query_gid to always refer to cache") Link: https://lore.kernel.org/r/20191212113024.336702-3-leon@kernel.org Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-07RDMA/core: Add trace points to follow MR allocationChuck Lever
Track the lifetime of ib_mr objects. Here's sample output from a test run with NFS/RDMA: <...>-361 [009] 79238.772782: mr_alloc: pd.id=3 mr.id=11 type=MEM_REG max_num_sg=30 rc=0 <...>-361 [009] 79238.772812: mr_alloc: pd.id=3 mr.id=12 type=MEM_REG max_num_sg=30 rc=0 <...>-361 [009] 79238.772839: mr_alloc: pd.id=3 mr.id=13 type=MEM_REG max_num_sg=30 rc=0 <...>-361 [009] 79238.772866: mr_alloc: pd.id=3 mr.id=14 type=MEM_REG max_num_sg=30 rc=0 <...>-361 [009] 79238.772893: mr_alloc: pd.id=3 mr.id=15 type=MEM_REG max_num_sg=30 rc=0 <...>-361 [009] 79238.772921: mr_alloc: pd.id=3 mr.id=16 type=MEM_REG max_num_sg=30 rc=0 <...>-361 [009] 79238.772947: mr_alloc: pd.id=3 mr.id=17 type=MEM_REG max_num_sg=30 rc=0 <...>-361 [009] 79238.772974: mr_alloc: pd.id=3 mr.id=18 type=MEM_REG max_num_sg=30 rc=0 <...>-361 [009] 79238.773001: mr_alloc: pd.id=3 mr.id=19 type=MEM_REG max_num_sg=30 rc=0 <...>-361 [009] 79238.773028: mr_alloc: pd.id=3 mr.id=20 type=MEM_REG max_num_sg=30 rc=0 <...>-361 [009] 79238.773055: mr_alloc: pd.id=3 mr.id=21 type=MEM_REG max_num_sg=30 rc=0 <...>-361 [009] 79240.270942: mr_alloc: pd.id=3 mr.id=22 type=MEM_REG max_num_sg=30 rc=0 <...>-361 [009] 79240.270975: mr_alloc: pd.id=3 mr.id=23 type=MEM_REG max_num_sg=30 rc=0 <...>-361 [009] 79240.271007: mr_alloc: pd.id=3 mr.id=24 type=MEM_REG max_num_sg=30 rc=0 <...>-361 [009] 79240.271036: mr_alloc: pd.id=3 mr.id=25 type=MEM_REG max_num_sg=30 rc=0 <...>-361 [009] 79240.271067: mr_alloc: pd.id=3 mr.id=26 type=MEM_REG max_num_sg=30 rc=0 <...>-361 [009] 79240.271095: mr_alloc: pd.id=3 mr.id=27 type=MEM_REG max_num_sg=30 rc=0 <...>-361 [009] 79240.271121: mr_alloc: pd.id=3 mr.id=28 type=MEM_REG max_num_sg=30 rc=0 <...>-361 [009] 79240.271153: mr_alloc: pd.id=3 mr.id=29 type=MEM_REG max_num_sg=30 rc=0 <...>-361 [009] 79240.271181: mr_alloc: pd.id=3 mr.id=30 type=MEM_REG max_num_sg=30 rc=0 <...>-361 [009] 79240.271208: mr_alloc: pd.id=3 mr.id=31 type=MEM_REG max_num_sg=30 rc=0 <...>-361 [009] 79240.271236: mr_alloc: pd.id=3 mr.id=32 type=MEM_REG max_num_sg=30 rc=0 <...>-4351 [001] 79242.299400: mr_dereg: mr.id=32 <...>-4351 [001] 79242.299467: mr_dereg: mr.id=31 <...>-4351 [001] 79242.299554: mr_dereg: mr.id=30 <...>-4351 [001] 79242.299615: mr_dereg: mr.id=29 <...>-4351 [001] 79242.299684: mr_dereg: mr.id=28 <...>-4351 [001] 79242.299748: mr_dereg: mr.id=27 <...>-4351 [001] 79242.299812: mr_dereg: mr.id=26 <...>-4351 [001] 79242.299874: mr_dereg: mr.id=25 <...>-4351 [001] 79242.299944: mr_dereg: mr.id=24 <...>-4351 [001] 79242.300009: mr_dereg: mr.id=23 <...>-4351 [001] 79242.300190: mr_dereg: mr.id=22 <...>-4351 [001] 79242.300263: mr_dereg: mr.id=21 <...>-4351 [001] 79242.300326: mr_dereg: mr.id=20 <...>-4351 [001] 79242.300388: mr_dereg: mr.id=19 <...>-4351 [001] 79242.300450: mr_dereg: mr.id=18 <...>-4351 [001] 79242.300516: mr_dereg: mr.id=17 <...>-4351 [001] 79242.300629: mr_dereg: mr.id=16 <...>-4351 [001] 79242.300718: mr_dereg: mr.id=15 <...>-4351 [001] 79242.300784: mr_dereg: mr.id=14 <...>-4351 [001] 79242.300879: mr_dereg: mr.id=13 <...>-4351 [001] 79242.300945: mr_dereg: mr.id=12 <...>-4351 [001] 79242.301012: mr_dereg: mr.id=11 Some features of the output: - The lifetime and owner PD of each MR is clearly visible. - The type of MR is captured, as is the SGE array size. - Failing MR allocation can be recorded. Link: https://lore.kernel.org/r/20191218201820.30584.34636.stgit@manet.1015granger.net Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-07RDMA/core: Trace points for diagnosing completion queue issuesChuck Lever
Sample trace events: kworker/u29:0-300 [007] 120.042217: cq_alloc: cq.id=4 nr_cqe=161 comp_vector=2 poll_ctx=WORKQUEUE <idle>-0 [002] 120.056292: cq_schedule: cq.id=4 kworker/2:1H-482 [002] 120.056402: cq_process: cq.id=4 wake-up took 109 [us] from interrupt kworker/2:1H-482 [002] 120.056407: cq_poll: cq.id=4 requested 16, returned 1 <idle>-0 [002] 120.067503: cq_schedule: cq.id=4 kworker/2:1H-482 [002] 120.067537: cq_process: cq.id=4 wake-up took 34 [us] from interrupt kworker/2:1H-482 [002] 120.067541: cq_poll: cq.id=4 requested 16, returned 1 <idle>-0 [002] 120.067657: cq_schedule: cq.id=4 kworker/2:1H-482 [002] 120.067672: cq_process: cq.id=4 wake-up took 15 [us] from interrupt kworker/2:1H-482 [002] 120.067674: cq_poll: cq.id=4 requested 16, returned 1 ... systemd-1 [002] 122.392653: cq_schedule: cq.id=4 kworker/2:1H-482 [002] 122.392688: cq_process: cq.id=4 wake-up took 35 [us] from interrupt kworker/2:1H-482 [002] 122.392693: cq_poll: cq.id=4 requested 16, returned 16 kworker/2:1H-482 [002] 122.392836: cq_poll: cq.id=4 requested 16, returned 16 kworker/2:1H-482 [002] 122.392970: cq_poll: cq.id=4 requested 16, returned 16 kworker/2:1H-482 [002] 122.393083: cq_poll: cq.id=4 requested 16, returned 16 kworker/2:1H-482 [002] 122.393195: cq_poll: cq.id=4 requested 16, returned 3 Several features to note in this output: - The WCE count and context type are reported at allocation time - The CPU and kworker for each CQ is evident - The CQ's restracker ID is tagged on each trace event - CQ poll scheduling latency is measured - Details about how often single completions occur versus multiple completions are evident - The cost of the ULP's completion handler is recorded Link: https://lore.kernel.org/r/20191218201815.30584.3481.stgit@manet.1015granger.net Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-03RDMA/cm: Delete unused CM ARP functionsLeon Romanovsky
Clean the code by deleting ARP functions, which are not called anyway. Link: https://lore.kernel.org/r/20191212093830.316934-46-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-03IB/rdmavt: Correct comments in rdmavt_qp.h headerMike Marciniszyn
Comments need to be with the definition of rvt_restart_sge(). Other comments were duplicated in sw/rdmavt/rc.c and were removed. Fixes: 385156c5f2a6 ("IB/hfi: Move RC functions into a header file") Link: https://lore.kernel.org/r/20191219211934.58387.88014.stgit@awfm-01.aw.intel.com Reviewed-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-03RDMA/mlx4: Redo TX checksum offload in line with docsEugene Crosser
Ingress checksum offload was not working for IPv6 frames because the conditional expression that checks validation status passed from the hardware was not matching the algorithm described in the documentation. This patch defines L4_CSUM flag (which falls inside the badfcs_enc field in the existing definition of the CQE layout) and replaces the conditional expression with the one defined in the "ConnectX(r) Family Programmer's Manual" document. Link: https://lore.kernel.org/r/20191219134847.413582-1-leon@kernel.org Signed-off-by: Eugene Crosser <evgenii.cherkashin@profitbricks.com> Reviewed-by: Jack Wang <jinpu.wang@profitbricks.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-03RDMA/qedr: Add kernel capability flags for dpm enabled modeMichal Kalderon
HW/FW support two types of latency enhancement features. Until now user-space implemented only edpm (enhanced dpm). We add kernel capability flags to differentiate between current FW in kernel that supports both ldpm and edpm. Since edpm is not yet supported for iWARP we add different flags for iWARP + RoCE. We also fix bad practice of defining sizes in rdma-core and pass initialization to kernel, for forward compatibility. The capability flags are added for backward-forward compatibility between kernel and rdma-core for qedr. Before this change there was a field called dpm_enabled which could hold either 0 or 1 value, this indicated whether RoCE edpm was enabled or not. We modified this field to be dpm_flags, and bit 1 still holds the same meaning of RoCE edpm being enabled or not. Link: https://lore.kernel.org/r/20191121112957.25162-1-michal.kalderon@marvell.com Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-12-25ata: libahci_platform: Export again ahci_platform_<en/dis>able_phys()Florian Fainelli
This reverts commit 6bb86fefa086faba7b60bb452300b76a47cde1a5 ("libahci_platform: Staticize ahci_platform_<en/dis>able_phys()") we are going to need ahci_platform_{enable,disable}_phys() in a subsequent commit for ahci_brcm.c in order to properly control the PHY initialization order. Also make sure the function prototypes are declared in include/linux/ahci_platform.h as a result. Cc: stable@vger.kernel.org Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-25libata: Fix retrieving of active qcsSascha Hauer
ata_qc_complete_multiple() is called with a mask of the still active tags. mv_sata doesn't have this information directly and instead calculates the still active tags from the started tags (ap->qc_active) and the finished tags as (ap->qc_active ^ done_mask) Since 28361c40368 the hw_tag and tag are no longer the same and the equation is no longer valid. In ata_exec_internal_sg() ap->qc_active is initialized as 1ULL << ATA_TAG_INTERNAL, but in hardware tag 0 is started and this will be in done_mask on completion. ap->qc_active ^ done_mask becomes 0x100000000 ^ 0x1 = 0x100000001 and thus tag 0 used as the internal tag will never be reported as completed. This is fixed by introducing ata_qc_get_active() which returns the active hardware tags and calling it where appropriate. This is tested on mv_sata, but sata_fsl and sata_nv suffer from the same problem. There is another case in sata_nv that most likely needs fixing as well, but this looks a little different, so I wasn't confident enough to change that. Fixes: 28361c403683 ("libata: add extra internal command") Cc: stable@vger.kernel.org Tested-by: Pali Rohár <pali.rohar@gmail.com> Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Add missing export of ata_qc_get_active(), as per Pali. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-22Merge tag 'ext4_for_linus_stable' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 bug fixes from Ted Ts'o: "Ext4 bug fixes, including a regression fix" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: clarify impact of 'commit' mount option ext4: fix unused-but-set-variable warning in ext4_add_entry() jbd2: fix kernel-doc notation warning ext4: use RCU API in debug_print_tree ext4: validate the debug_want_extra_isize mount option at parse time ext4: reserve revoke credits in __ext4_new_inode ext4: unlock on error in ext4_expand_extra_isize() ext4: optimize __ext4_check_dir_entry() ext4: check for directory entries too close to block end ext4: fix ext4_empty_dir() for directories with holes
2019-12-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds
Pull networking fixes from David Miller: 1) Several nf_flow_table_offload fixes from Pablo Neira Ayuso, including adding a missing ipv6 match description. 2) Several heap overflow fixes in mwifiex from qize wang and Ganapathi Bhat. 3) Fix uninit value in bond_neigh_init(), from Eric Dumazet. 4) Fix non-ACPI probing of nxp-nci, from Stephan Gerhold. 5) Fix use after free in tipc_disc_rcv(), from Tuong Lien. 6) Enforce limit of 33 tail calls in mips and riscv JIT, from Paul Chaignon. 7) Multicast MAC limit test is off by one in qede, from Manish Chopra. 8) Fix established socket lookup race when socket goes from TCP_ESTABLISHED to TCP_LISTEN, because there lacks an intervening RCU grace period. From Eric Dumazet. 9) Don't send empty SKBs from tcp_write_xmit(), also from Eric Dumazet. 10) Fix active backup transition after link failure in bonding, from Mahesh Bandewar. 11) Avoid zero sized hash table in gtp driver, from Taehee Yoo. 12) Fix wrong interface passed to ->mac_link_up(), from Russell King. 13) Fix DSA egress flooding settings in b53, from Florian Fainelli. 14) Memory leak in gmac_setup_txqs(), from Navid Emamdoost. 15) Fix double free in dpaa2-ptp code, from Ioana Ciornei. 16) Reject invalid MTU values in stmmac, from Jose Abreu. 17) Fix refcount leak in error path of u32 classifier, from Davide Caratti. 18) Fix regression causing iwlwifi firmware crashes on boot, from Anders Kaseorg. 19) Fix inverted return value logic in llc2 code, from Chan Shu Tak. 20) Disable hardware GRO when XDP is attached to qede, frm Manish Chopra. 21) Since we encode state in the low pointer bits, dst metrics must be at least 4 byte aligned, which is not necessarily true on m68k. Add annotations to fix this, from Geert Uytterhoeven. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (160 commits) sfc: Include XDP packet headroom in buffer step size. sfc: fix channel allocation with brute force net: dst: Force 4-byte alignment of dst_metrics selftests: pmtu: fix init mtu value in description hv_netvsc: Fix unwanted rx_table reset net: phy: ensure that phy IDs are correctly typed mod_devicetable: fix PHY module format qede: Disable hardware gro when xdp prog is installed net: ena: fix issues in setting interrupt moderation params in ethtool net: ena: fix default tx interrupt moderation interval net/smc: unregister ib devices in reboot_event net: stmmac: platform: Fix MDIO init for platforms without PHY llc2: Fix return statement of llc_stat_ev_rx_null_dsap_xid_c (and _test_c) net: hisilicon: Fix a BUG trigered by wrong bytes_compl net: dsa: ksz: use common define for tag len s390/qeth: don't return -ENOTSUPP to userspace s390/qeth: fix promiscuous mode after reset s390/qeth: handle error due to unsupported transport mode cxgb4: fix refcount init for TC-MQPRIO offload tc-testing: initial tdc selftests for cls_u32 ...
2019-12-21Merge tag 'for-linus-5.5b-rc3-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: "This contains two cleanup patches and a small series for supporting reloading the Xen block backend driver" * tag 'for-linus-5.5b-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/grant-table: remove multiple BUG_ON on gnttab_interface xen-blkback: support dynamic unbind/bind xen/interface: re-define FRONT/BACK_RING_ATTACH() xenbus: limit when state is forced to closed xenbus: move xenbus_dev_shutdown() into frontend code... xen/blkfront: Adjust indentation in xlvbd_alloc_gendisk
2019-12-20net: dst: Force 4-byte alignment of dst_metricsGeert Uytterhoeven
When storing a pointer to a dst_metrics structure in dst_entry._metrics, two flags are added in the least significant bits of the pointer value. Hence this assumes all pointers to dst_metrics structures have at least 4-byte alignment. However, on m68k, the minimum alignment of 32-bit values is 2 bytes, not 4 bytes. Hence in some kernel builds, dst_default_metrics may be only 2-byte aligned, leading to obscure boot warnings like: WARNING: CPU: 0 PID: 7 at lib/refcount.c:28 refcount_warn_saturate+0x44/0x9a refcount_t: underflow; use-after-free. Modules linked in: CPU: 0 PID: 7 Comm: ksoftirqd/0 Tainted: G W 5.5.0-rc2-atari-01448-g114a1a1038af891d-dirty #261 Stack from 10835e6c: 10835e6c 0038134f 00023fa6 00394b0f 0000001c 00000009 00321560 00023fea 00394b0f 0000001c 001a70f8 00000009 00000000 10835eb4 00000001 00000000 04208040 0000000a 00394b4a 10835ed4 00043aa8 001a70f8 00394b0f 0000001c 00000009 00394b4a 0026aba8 003215a4 00000003 00000000 0026d5a8 00000001 003215a4 003a4361 003238d6 000001f0 00000000 003215a4 10aa3b00 00025e84 003ddb00 10834000 002416a8 10aa3b00 00000000 00000080 000aa038 0004854a Call Trace: [<00023fa6>] __warn+0xb2/0xb4 [<00023fea>] warn_slowpath_fmt+0x42/0x64 [<001a70f8>] refcount_warn_saturate+0x44/0x9a [<00043aa8>] printk+0x0/0x18 [<001a70f8>] refcount_warn_saturate+0x44/0x9a [<0026aba8>] refcount_sub_and_test.constprop.73+0x38/0x3e [<0026d5a8>] ipv4_dst_destroy+0x5e/0x7e [<00025e84>] __local_bh_enable_ip+0x0/0x8e [<002416a8>] dst_destroy+0x40/0xae Fix this by forcing 4-byte alignment of all dst_metrics structures. Fixes: e5fd387ad5b30ca3 ("ipv6: do not overwrite inetpeer metrics prematurely") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-20net: phy: ensure that phy IDs are correctly typedRussell King
PHY IDs are 32-bit unsigned quantities. Ensure that they are always treated as such, and not passed around as "int"s. Fixes: 13d0ab6750b2 ("net: phy: check return code when requesting PHY driver module") Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-20mod_devicetable: fix PHY module formatRussell King
When a PHY is probed, if the top bit is set, we end up requesting a module with the string "mdio:-10101110000000100101000101010001" - the top bit is printed to a signed -1 value. This leads to the module not being loaded. Fix the module format string and the macro generating the values for it to ensure that we only print unsigned types and the top bit is always 0/1. We correctly end up with "mdio:10101110000000100101000101010001". Fixes: 8626d3b43280 ("phylib: Support phy module autoloading") Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-20xen/interface: re-define FRONT/BACK_RING_ATTACH()Paul Durrant
Currently these macros are defined to re-initialize a front/back ring (respectively) to values read from the shared ring in such a way that any requests/responses that are added to the shared ring whilst the front/back is detached will be skipped over. This, in general, is not a desirable semantic since most frontend implementations will eventually block waiting for a response which would either never appear or never be processed. Since the macros are currently unused, take this opportunity to re-define them to re-initialize a front/back ring using specified values. This also allows FRONT/BACK_RING_INIT() to be re-defined in terms of FRONT/BACK_RING_ATTACH() using a specified value of 0. NOTE: BACK_RING_ATTACH() will be used directly in a subsequent patch. Signed-off-by: Paul Durrant <pdurrant@amazon.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2019-12-20xenbus: limit when state is forced to closedPaul Durrant
If a driver probe() fails then leave the xenstore state alone. There is no reason to modify it as the failure may be due to transient resource allocation issues and hence a subsequent probe() may succeed. If the driver supports re-binding then only force state to closed during remove() only in the case when the toolstack may need to clean up. This can be detected by checking whether the state in xenstore has been set to closing prior to device removal. NOTE: Re-bind support is indicated by new boolean in struct xenbus_driver, which defaults to false. Subsequent patches will add support to some backend drivers. Signed-off-by: Paul Durrant <pdurrant@amazon.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2019-12-19of: mdio: export of_mdiobus_child_is_phyAntoine Tenart
This patch exports of_mdiobus_child_is_phy, allowing to check if a child node is a network PHY. Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf 2019-12-19 The following pull-request contains BPF updates for your *net* tree. We've added 10 non-merge commits during the last 8 day(s) which contain a total of 21 files changed, 269 insertions(+), 108 deletions(-). The main changes are: 1) Fix lack of synchronization between xsk wakeup and destroying resources used by xsk wakeup, from Maxim Mikityanskiy. 2) Fix pruning with tail call patching, untrack programs in case of verifier error and fix a cgroup local storage tracking bug, from Daniel Borkmann. 3) Fix clearing skb->tstamp in bpf_redirect() when going from ingress to egress which otherwise cause issues e.g. on fq qdisc, from Lorenz Bauer. 4) Fix compile warning of unused proc_dointvec_minmax_bpf_restricted() when only cBPF is present, from Alexander Lobakin. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-19Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge fixes from Andrew Morton: "6 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: lib/Kconfig.debug: fix some messed up configurations mm: vmscan: protect shrinker idr replace with CONFIG_MEMCG kasan: don't assume percpu shadow allocations will succeed kasan: use apply_to_existing_page_range() for releasing vmalloc shadow mm/memory.c: add apply_to_existing_page_range() helper kasan: fix crashes on access to memory mapped by vm_map_ram()
2019-12-19Merge tag 'pm-5.5-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fix from Rafael Wysocki: "Fix a problem related to CPU offline/online and cpufreq governors that in some system configurations may lead to a system-wide deadlock during CPU online" * tag 'pm-5.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpufreq: Avoid leaving stale IRQ work items during CPU offline
2019-12-19Merge branch 'pm-cpufreq'Rafael J. Wysocki
* pm-cpufreq: cpufreq: Avoid leaving stale IRQ work items during CPU offline
2019-12-18Merge tag 'tpmdd-next-20191219' of git://git.infradead.org/users/jjs/linux-tpmddLinus Torvalds
Pull tpm fixes from Jarkko Sakkinen: "Bunch of fixes for rc3" * tag 'tpmdd-next-20191219' of git://git.infradead.org/users/jjs/linux-tpmdd: tpm/tpm_ftpm_tee: add shutdown call back tpm: selftest: cleanup after unseal with wrong auth/policy test tpm: selftest: add test covering async mode tpm: fix invalid locking in NONBLOCKING mode security: keys: trusted: fix lost handle flush tpm_tis: reserve chip for duration of tpm_tis_core_init KEYS: asymmetric: return ENOMEM if akcipher_request_alloc() fails KEYS: remove CONFIG_KEYS_COMPAT
2019-12-18Merge tag 'sound-5.5-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "A slightly high amount at this time, but all good and small fixes: - A PCM core fix that initializes the buffer properly for avoiding information leaks; it is a long-standing minor problem, but good to fix better now - A few ASoC core fixes for the init / cleanup ordering issues that surfaced after the recent refactoring - Lots of SOF and topology-related fixes went in, as usual as such hot topics - Several ASoC codec and platform-specific small fixes: wm89xx, realtek, and max98090, AMD, Intel-SST - A fix for the previous incomplete regression of HD-audio, now hitting Nvidia HDMI - A few HD-audio CA0132 codec fixes" * tag 'sound-5.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (27 commits) ALSA: hda - Downgrade error message for single-cmd fallback ASoC: wm8962: fix lambda value ALSA: hda: Fix regression by strip mask fix ALSA: hda/ca0132 - Fix work handling in delayed HP detection ALSA: hda/ca0132 - Avoid endless loop ALSA: hda/ca0132 - Keep power on during processing DSP response ALSA: pcm: Avoid possible info leaks from PCM stream buffers ASoC: Intel: common: work-around incorrect ACPI HID for CML boards ASoC: SOF: Intel: split cht and byt debug window sizes ASoC: SOF: loader: fix snd_sof_fw_parse_ext_data ASoC: SOF: loader: snd_sof_fw_parse_ext_data log warning on unknown header ASoC: simple-card: Don't create separate link when platform is present ASoC: topology: Check return value for soc_tplg_pcm_create() ASoC: topology: Check return value for snd_soc_add_dai_link() ASoC: core: only flush inited work during free ASoC: Intel: bytcr_rt5640: Update quirk for Teclast X89 ASoC: core: Init pcm runtime work early to avoid warnings ASoC: Intel: sst: Add missing include <linux/io.h> ASoC: max98090: fix possible race conditions ASoC: max98090: exit workaround earlier if PLL is locked ...
2019-12-17net: fix kernel-doc warning in <linux/netdevice.h>Randy Dunlap
Fix missing '*' kernel-doc notation that causes this warning: ../include/linux/netdevice.h:1779: warning: bad line: spinlock Fixes: ab92d68fc22f ("net: core: add generic lockdep keys") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>