linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2021-10-15	drm/i915: Enable multi-bb execbuf	Matthew Brost
	Enable multi-bb execbuf by enabling the set_parallel extension. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-25-matthew.brost@intel.com
2021-10-15	drm/i915: Update I915_GEM_BUSY IOCTL to understand composite fences	Matthew Brost
	Parallel submission create composite fences (dma_fence_array) for excl / shared slots in objects. The I915_GEM_BUSY IOCTL checks these slots to determine the busyness of the object. Prior to patch it only check if the fence in the slot was a i915_request. Update the check to understand composite fences and correctly report the busyness. v2: (Tvrtko) - Remove duplicate BUILD_BUG_ON Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-24-matthew.brost@intel.com
2021-10-15	drm/i915: Make request conflict tracking understand parallel submits	Matthew Brost
	If an object in the excl or shared slot is a composite fence from a parallel submit and the current request in the conflict tracking is from the same parallel context there is no need to enforce ordering as the ordering is already implicit. Make the request conflict tracking understand this by comparing a parallel submit's parent context and skipping conflict insertion if the values match. v2: (John Harrison) - Reword commit message Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-23-matthew.brost@intel.com
2021-10-15	drm/i915/guc: Handle errors in multi-lrc requests	Matthew Brost
	If an error occurs in the front end when multi-lrc requests are getting generated we need to skip these in the backend but we still need to emit the breadcrumbs seqno. An issues arises because with multi-lrc breadcrumbs there is a handshake between the parent and children to make forward progress. If all the requests are not present this handshake doesn't work. To work around this, if multi-lrc request has an error we skip the handshake but still emit the breadcrumbs seqno. v2: (John Harrison) - Add comment explaining the skipping of the handshake logic - Fix typos in the commit message v3: (John Harrison) - Fix up some comments about the math to NOP the ring Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-22-matthew.brost@intel.com
2021-10-15	drm/i915: Multi-BB execbuf	Matthew Brost
	Allow multiple batch buffers to be submitted in a single execbuf IOCTL after a context has been configured with the 'set_parallel' extension. The number batches is implicit based on the contexts configuration. This is implemented with a series of loops. First a loop is used to find all the batches, a loop to pin all the HW contexts, a loop to create all the requests, a loop to submit (emit BB start, etc...) all the requests, a loop to tie the requests to the VMAs they touch, and finally a loop to commit the requests to the backend. A composite fence is also created for the generated requests to return to the user and to stick in dma resv slots. No behavior from the existing IOCTL should be changed aside from when throttling because the ring for a context is full. In this situation, i915 will now wait while holding the object locks. This change was done because the code is much simpler to wait while holding the locks and we believe there isn't a huge benefit of dropping these locks. If this proves false we can restructure the code to drop the locks during the wait. IGT: https://patchwork.freedesktop.org/patch/447008/?series=93071&rev=1 media UMD: https://github.com/intel/media-driver/pull/1252 v2: (Matthew Brost) - Return proper error value if i915_request_create fails v3: (John Harrison) - Add comment explaining create / add order loops + locking - Update commit message explaining different in IOCTL behavior - Line wrap some comments - eb_add_request returns void - Return -EINVAL rather triggering BUG_ON if cmd parser used (Checkpatch) - Check eb->batch_len[*current_batch] v4: (CI) - Set batch len if passed if via execbuf args - Call __i915_request_skip after __i915_request_commit (Kernel test robot) - Initialize rq to NULL in eb_pin_timeline v5: (John Harrison) - Fix typo in comments near bb order loops Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-21-matthew.brost@intel.com
2021-10-15	drm/i915/guc: Implement no mid batch preemption for multi-lrc	Matthew Brost
	For some users of multi-lrc, e.g. split frame, it isn't safe to preempt mid BB. To safely enable preemption at the BB boundary, a handshake between parent and child is needed, syncing the set of BBs at the beginning and end of each batch. This is implemented via custom emit_bb_start & emit_fini_breadcrumb functions and enabled by default if a context is configured by set parallel extension. Lastly, this patch updates the process descriptor to the correct size as the memory used in the handshake is directly after the process descriptor. v2: (John Harrison) - Fix a few comments wording - Add struture for parent page layout v3: (John Harrison) - A structure for sync semaphore - Use offsetof to calc address - Update commit message v4: (John Harrison) - Fix typos in comment explaining memory map of scratch page Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-20-matthew.brost@intel.com
2021-10-15	drm/i915/guc: Add basic GuC multi-lrc selftest	Matthew Brost
	Add very basic (single submission) multi-lrc selftest. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-19-matthew.brost@intel.com
2021-10-15	drm/i915/guc: Connect UAPI to GuC multi-lrc interface	Matthew Brost
	Introduce 'set parallel submit' extension to connect UAPI to GuC multi-lrc interface. Kernel doc in new uAPI should explain it all. IGT: https://patchwork.freedesktop.org/patch/447008/?series=93071&rev=1 media UMD: https://github.com/intel/media-driver/pull/1252 v2: (Daniel Vetter) - Add IGT link and placeholder for media UMD link v3: (Kernel test robot) - Fix warning in unpin engines call (John Harrison) - Reword a bunch of the kernel doc v4: (John Harrison) - Add comment why perma-pin is done after setting gem context - Update some comments / docs for proto contexts v5: (John Harrison) - Rework perma-pin comment - Add BUG_IN if context is pinned when setting gem context Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-17-matthew.brost@intel.com
2021-10-15	drm/i915/guc: Update debugfs for GuC multi-lrc	Matthew Brost
	Display the workqueue status in debugfs for GuC contexts that are in parent-child relationship. v2: (John Harrison) - Output number children in debugfs Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-16-matthew.brost@intel.com
2021-10-15	drm/i915/guc: Implement multi-lrc reset	Matthew Brost
	Update context and full GPU reset to work with multi-lrc. The idea is parent context tracks all the active requests inflight for itself and its children. The parent context owns the reset replaying / canceling requests as needed. v2: (John Harrison) - Simply loop in find active request - Add comments to find ative request / reset loop v3: (John Harrison) - s/its'/its/g - Fix comment when searching for active request - Reorder if state in __guc_reset_context v4: (Kernel test robot) - Delete unused is_multi_lrc function Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-15-matthew.brost@intel.com
2021-10-15	drm/i915/guc: Insert submit fences between requests in parent-child relationship	Matthew Brost
	The GuC must receive requests in the order submitted for contexts in a parent-child relationship to function correctly. To ensure this, insert a submit fence between the current request and last request submitted for requests / contexts in a parent child relationship. This is conceptually similar to a single timeline. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-14-matthew.brost@intel.com
2021-10-15	drm/i915/guc: Implement multi-lrc submission	Matthew Brost
	Implement multi-lrc submission via a single workqueue entry and single H2G. The workqueue entry contains an updated tail value for each request, of all the contexts in the multi-lrc submission, and updates these values simultaneously. As such, the tasklet and bypass path have been updated to coalesce requests into a single submission. v2: (John Harrison) - s/wqe/wqi - Use FIELD_PREP macros - Add GEM_BUG_ONs ensures length fits within field - Add comment / white space to intel_guc_write_barrier (Kernel test robot) - Make need_tasklet a static function v3: (Docs) - A comment for submission_stall_reason v4: (Kernel test robot) - Initialize return value in bypass tasklt submit function (John Harrison) - Add comment near work queue defs - Add BUILD_BUG_ON to ensure WQ_SIZE is a power of 2 - Update write_barrier comment to talk about work queue v5: (John Harrison) - Fix typo in work queue comment Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-13-matthew.brost@intel.com
2021-10-15	drm/i915/guc: Implement parallel context pin / unpin functions	Matthew Brost
	Parallel contexts are perma-pinned by the upper layers which makes the backend implementation rather simple. The parent pins the guc_id and children increment the parent's pin count on pin to ensure all the contexts are unpinned before we disable scheduling with the GuC / or deregister the context. v2: (Daniel Vetter) - Perma-pin parallel contexts Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-12-matthew.brost@intel.com
2021-10-15	drm/i915/guc: Assign contexts in parent-child relationship consecutive guc_ids	Matthew Brost
	Assign contexts in parent-child relationship consecutive guc_ids. This is accomplished by partitioning guc_id space between ones that need to be consecutive (1/16 available guc_ids) and ones that do not (15/16 of available guc_ids). The consecutive search is implemented via the bitmap API. This is a precursor to the full GuC multi-lrc implementation but aligns to how GuC mutli-lrc interface is defined - guc_ids must be consecutive when using the GuC multi-lrc interface. v2: (Daniel Vetter) - Explicitly state why we assign consecutive guc_ids v3: (John Harrison) - Bring back in spin lock Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-11-matthew.brost@intel.com
2021-10-15	drm/i915/guc: Ensure GuC schedule operations do not operate on child contexts	Matthew Brost
	In GuC parent-child contexts the parent context controls the scheduling, ensure only the parent does the scheduling operations. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-10-matthew.brost@intel.com
2021-10-15	drm/i915/guc: Add multi-lrc context registration	Matthew Brost
	Add multi-lrc context registration H2G. In addition a workqueue and process descriptor are setup during multi-lrc context registration as these data structures are needed for multi-lrc submission. v2: (John Harrison) - Move GuC specific fields into sub-struct - Clean up WQ defines - Add comment explaining math to derive WQ / PD address v3: (John Harrison) - Add PARENT_SCRATCH_SIZE define - Update comment explaining multi-lrc register v4: (John Harrison) - Move PARENT_SCRATCH_SIZE to common file Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-9-matthew.brost@intel.com
2021-10-15	drm/i915/guc: Introduce context parent-child relationship	Matthew Brost
	Introduce context parent-child relationship. Once this relationship is created all pinning / unpinning operations are directed to the parent context. The parent context is responsible for pinning all of its children and itself. This is a precursor to the full GuC multi-lrc implementation but aligns to how GuC mutli-lrc interface is defined - a single H2G is used register / deregister all of the contexts simultaneously. Subsequent patches in the series will implement the pinning / unpinning operations for parent / child contexts. v2: (Daniel Vetter) - Add kernel doc, add wrapper to access parent to ensure safety v3: (John Harrison) - Fix comment explaing GEM_BUG_ON in to_parent() - Make variable names generic (non-GuC specific) v4: (John Harrison) - s/its'/its/g Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-8-matthew.brost@intel.com
2021-10-15	drm/i915: Expose logical engine instance to user	Matthew Brost
	Expose logical engine instance to user via query engine info IOCTL. This is required for split-frame workloads as these needs to be placed on engines in a logically contiguous order. The logical mapping can change based on fusing. Rather than having user have knowledge of the fusing we simply just expose the logical mapping with the existing query engine info IOCTL. IGT: https://patchwork.freedesktop.org/patch/445637/?series=92854&rev=1 media UMD: https://github.com/intel/media-driver/pull/1252 v2: (Daniel Vetter) - Add IGT link, placeholder for media UMD Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-7-matthew.brost@intel.com
2021-10-15	drm/i915: Add logical engine mapping	Matthew Brost
	Add logical engine mapping. This is required for split-frame, as workloads need to be placed on engines in a logically contiguous manner. v2: (Daniel Vetter) - Add kernel doc for new fields v3: (Tvrtko) - Update comment for new logical_mask field v4: (John Harrison) - Update comment for new logical_mask field Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-6-matthew.brost@intel.com
2021-10-15	drm/i915/guc: Don't call switch_to_kernel_context with GuC submission	Matthew Brost
	Calling switch_to_kernel_context isn't needed if the engine PM reference is taken while all user contexts are pinned as if don't have PM ref that guarantees that all user contexts scheduling is disabled. By not calling switch_to_kernel_context we save on issuing a request to the engine. v2: (Daniel Vetter) - Add FIXME comment about pushing switch_to_kernel_context to backend v3: (John Harrison) - Update commit message - Fix workding comment Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-5-matthew.brost@intel.com
2021-10-15	drm/i915/guc: Take engine PM when a context is pinned with GuC submission	Matthew Brost
	Taking a PM reference to prevent intel_gt_wait_for_idle from short circuiting while any user context has scheduling enabled. Returning GT idle when it is not can cause all sorts of issues throughout the stack. v2: (Daniel Vetter) - Add might_lock annotations to pin / unpin function v3: (CI) - Drop intel_engine_pm_might_put from unpin path as an async put is used v4: (John Harrison) - Make intel_engine_pm_might_get/put work with GuC virtual engines - Update commit message v5: - Update commit message again Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-4-matthew.brost@intel.com
2021-10-15	drm/i915/guc: Take GT PM ref when deregistering context	Matthew Brost
	Taking a PM reference to prevent intel_gt_wait_for_idle from short circuiting while a deregister context H2G is in flight. To do this must issue the deregister H2G from a worker as context can be destroyed from an atomic context and taking GT PM ref blows up. Previously we took a runtime PM from this atomic context which worked but will stop working once runtime pm autosuspend in enabled. So this patch is two fold, stop intel_gt_wait_for_idle from short circuting and fix runtime pm autosuspend. v2: (John Harrison) - Split structure changes out in different patch (Tvrtko) - Don't drop lock in deregister_destroyed_contexts v3: (John Harrison) - Flush destroyed contexts before destroying context reg pool Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-3-matthew.brost@intel.com
2021-10-15	drm/i915/guc: Move GuC guc_id allocation under submission state sub-struct	Matthew Brost
	Move guc_id allocation under submission state sub-struct as a future patch will reuse the spin lock as a global submission state lock. Moving this into sub-struct makes ownership of fields / lock clear. v2: (Docs) - Add comment for submission_state sub-structure v3: (John Harrison) - Fixup a few comments v4: (John Harrison) - Fix typo Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-2-matthew.brost@intel.com
2021-10-15	xen/pcifront: Use to_pci_driver() instead of pci_dev->driver	Uwe Kleine-König
	Struct pci_driver contains a struct device_driver, so for PCI devices, it's easy to convert a device_driver * to a pci_driver * with to_pci_driver(). The device_driver * is in struct device, so we don't need to also keep track of the pci_driver * in struct pci_dev. Replace pdev->driver with to_pci_driver(). This is a step toward removing pci_dev->driver. [bhelgaas: split to separate patch] Link: https://lore.kernel.org/r/20211004125935.2300113-11-u.kleine-koenig@pengutronix.de Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2021-10-15	xen/pcifront: Drop pcifront_common_process() tests of pcidev, pdrv	Uwe Kleine-König
	pcifront_common_process() exits early if pcidev or pcidev->driver are NULL, so simplify it by not checking them again. [bhelgaas: split flag change to separate patch] Link: https://lore.kernel.org/r/20211004125935.2300113-4-u.kleine-koenig@pengutronix.de Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2021-10-15	rpmsg: glink: Send READ_NOTIFY command in FIFO full case	Arun Kumar Neelakantam
	The current design sleeps unconditionally in TX FIFO full case and wakeup only after sleep timer expires which adds random delays in clients TX path. Avoid sleep and use READ_NOTIFY command so that writer can be woken up when remote notifies about read completion by sending IRQ. Signed-off-by: Deepak Kumar Singh <deesin@codeaurora.org> Signed-off-by: Arun Kumar Neelakantam <aneela@codeaurora.org> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Link: https://lore.kernel.org/r/1596086296-28529-7-git-send-email-deesin@codeaurora.org
2021-10-15	rpmsg: glink: Remove channel decouple from rpdev release	Chris Lew
	If a channel is being rapidly restarting and the kobj release worker is busy, there is a chance the rpdev_release function will run after the channel struct itself has been released. There should not be a need to decouple the channel from rpdev in the rpdev release since that should only happen from the close commands. Signed-off-by: Chris Lew <clew@codeaurora.org> Signed-off-by: Deepak Kumar Singh <deesin@codeaurora.org> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Link: https://lore.kernel.org/r/1596086296-28529-6-git-send-email-deesin@codeaurora.org
2021-10-15	rpmsg: glink: Remove the rpmsg dev in close_ack	Arun Kumar Neelakantam
	Un-register and register of rpmsg driver is sending invalid open_ack on closed channel. To avoid sending invalid open_ack case unregister the rpmsg device after receiving the local_close_ack from remote side. Signed-off-by: Deepak Kumar Singh <deesin@codeaurora.org> Signed-off-by: Arun Kumar Neelakantam <aneela@codeaurora.org> [bjorn: s/strlcpy/strscpy/] Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Link: https://lore.kernel.org/r/1596086296-28529-5-git-send-email-deesin@codeaurora.org
2021-10-15	rpmsg: glink: Add TX_DATA_CONT command while sending	Arun Kumar Neelakantam
	With current design the transport can send packets of size upto FIFO_SIZE which is 16k and return failure for all packets above 16k. Add TX_DATA_CONT command to send packets greater than 16k by splitting into 8K chunks. Signed-off-by: Arun Kumar Neelakantam <aneela@codeaurora.org> Signed-off-by: Deepak Kumar Singh <deesin@codeaurora.org> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Link: https://lore.kernel.org/r/1596086296-28529-4-git-send-email-deesin@codeaurora.org
2021-10-15	spi: replace snprintf in show functions with sysfs_emit	Qing Wang
	show() must not use snprintf() when formatting the value to be returned to user space. Fix the following coccicheck warning: drivers/spi/spi-tle62x0.c:144: WARNING: use scnprintf or sprintf. Use sysfs_emit instead of scnprintf or sprintf makes more sense. Signed-off-by: Qing Wang <wangqing@vivo.com> Link: https://lore.kernel.org/r/1634280668-4954-1-git-send-email-wangqing@vivo.com Signed-off-by: Mark Brown <broonie@kernel.org>
2021-10-15	spi: cadence: Add of_node_put() before return	Wan Jiabing
	Fix following coccicheck warning: ./drivers/spi/spi-cadence-xspi.c:490:1-23: WARNING: Function for_each_child_of_node should have of_node_put() before return Early exits from for_each_child_of_node should decrement the node reference counter. Signed-off-by: Wan Jiabing <wanjiabing@vivo.com> Link: https://lore.kernel.org/r/20211015033919.5915-1-wanjiabing@vivo.com Signed-off-by: Mark Brown <broonie@kernel.org>
2021-10-15	spi: orion: Add of_node_put() before goto	Wan Jiabing
	Fix following coccicheck warning: ./drivers/spi/spi-orion.c:738:1-33: WARNING: Function for_each_available_child_of_node should have of_node_put() before goto Early exits from for_each_available_child_of_node should decrement the node reference counter. Signed-off-by: Wan Jiabing <wanjiabing@vivo.com> Link: https://lore.kernel.org/r/20211015034008.6357-1-wanjiabing@vivo.com Signed-off-by: Mark Brown <broonie@kernel.org>
2021-10-15	remoteproc: Remove vdev_to_rvdev and vdev_to_rproc from remoteproc API	Arnaud Pouliquen
	These both functions are only used by the remoteproc_virtio. There is no reason to expose them in the API. Move the functions in remoteproc_virtio.c Signed-off-by: Arnaud Pouliquen <arnaud.pouliquen@foss.st.com> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Link: https://lore.kernel.org/r/20211001101234.4247-4-arnaud.pouliquen@foss.st.com
2021-10-15	remoteproc: omap_remoteproc: simplify getting .driver_data	Wolfram Sang
	We should get 'driver_data' from 'struct device' directly. Going via platform_device is an unneeded step back and forth. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Link: https://lore.kernel.org/r/20210920090522.23784-10-wsa+renesas@sang-engineering.com
2021-10-15	remoteproc: qcom_q6v5_mss: Use devm_platform_ioremap_resource_byname() to ↵	zhaoxiao
	simplify code In this function, devm_platform_ioremap_resource_byname() should be suitable to simplify code. Signed-off-by: zhaoxiao <long870912@gmail.com> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Link: https://lore.kernel.org/r/20210906071147.9095-1-long870912@gmail.com
2021-10-15	remoteproc: Fix a memory leak in an error handling path in 'rproc_handle_vdev()'	Christophe JAILLET
	If 'copy_dma_range_map() fails, the memory allocated for 'rvdev' will leak. Move the 'copy_dma_range_map()' call after the device registration so that 'rproc_rvdev_release()' can be called to free some resources. Also, branch to the error handling path if 'copy_dma_range_map()' instead of a direct return to avoid some other leaks. Fixes: e0d072782c73 ("dma-mapping: introduce DMA range map, supplanting dma_pfn_offset") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Jim Quinlan <james.quinlan@broadcom.com> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Link: https://lore.kernel.org/r/e6d0dad6620da4fdf847faa903f79b735d35f262.1630755377.git.christophe.jaillet@wanadoo.fr
2021-10-15	remoteproc: Fix spelling mistake "atleast" -> "at least"	Colin Ian King
	There are spelling mistakes dev_err messages. Fix them. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Link: https://lore.kernel.org/r/20210826123735.14650-1-colin.king@canonical.com
2021-10-15	ice: make use of ice_for_each_* macros	Maciej Fijalkowski
	Go through the code base and use ice_for_each_* macros. While at it, introduce ice_for_each_xdp_txq() macro that can be used for looping over xdp_rings array. Commit is not introducing any new functionality. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-15	ice: introduce XDP_TX fallback path	Maciej Fijalkowski
	Under rare circumstances there might be a situation where a requirement of having XDP Tx queue per CPU could not be fulfilled and some of the Tx resources have to be shared between CPUs. This yields a need for placing accesses to xdp_ring inside a critical section protected by spinlock. These accesses happen to be in the hot path, so let's introduce the static branch that will be triggered from the control plane when driver could not provide Tx queue dedicated for XDP on each CPU. Currently, the design that has been picked is to allow any number of XDP Tx queues that is at least half of a count of CPUs that platform has. For lower number driver will bail out with a response to user that there were not enough Tx resources that would allow configuring XDP. The sharing of rings is signalled via static branch enablement which in turn indicates that lock for xdp_ring accesses needs to be taken in hot path. Approach based on static branch has no impact on performance of a non-fallback path. One thing that is needed to be mentioned is a fact that the static branch will act as a global driver switch, meaning that if one PF got out of Tx resources, then other PFs that ice driver is servicing will suffer. However, given the fact that HW that ice driver is handling has 1024 Tx queues per each PF, this is currently an unlikely scenario. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-15	ice: optimize XDP_TX workloads	Maciej Fijalkowski
	Optimize Tx descriptor cleaning for XDP. Current approach doesn't really scale and chokes when multiple flows are handled. Introduce two ring fields, @next_dd and @next_rs that will keep track of descriptor that should be looked at when the need for cleaning arise and the descriptor that should have the RS bit set, respectively. Note that at this point the threshold is a constant (32), but it is something that we could make configurable. First thing is to get away from setting RS bit on each descriptor. Let's do this only once NTU is higher than the currently @next_rs value. In such case, grab the tx_desc[next_rs], set the RS bit in descriptor and advance the @next_rs by a 32. Second thing is to clean the Tx ring only when there are less than 32 free entries. For that case, look up the tx_desc[next_dd] for a DD bit. This bit is written back by HW to let the driver know that xmit was successful. It will happen only for those descriptors that had RS bit set. Clean only 32 descriptors and advance the DD bit. Actual cleaning routine is moved from ice_napi_poll() down to the ice_xmit_xdp_ring(). It is safe to do so as XDP ring will not get any SKBs in there that would rely on interrupts for the cleaning. Nice side effect is that for rare case of Tx fallback path (that next patch is going to introduce) we don't have to trigger the SW irq to clean the ring. With those two concepts, ring is kept at being almost full, but it is guaranteed that driver will be able to produce Tx descriptors. This approach seems to work out well even though the Tx descriptors are produced in one-by-one manner. Test was conducted with the ice HW bombarded with packets from HW generator, configured to generate 30 flows. Xdp2 sample yields the following results: <snip> proto 17: 79973066 pkt/s proto 17: 80018911 pkt/s proto 17: 80004654 pkt/s proto 17: 79992395 pkt/s proto 17: 79975162 pkt/s proto 17: 79955054 pkt/s proto 17: 79869168 pkt/s proto 17: 79823947 pkt/s proto 17: 79636971 pkt/s </snip> As that sample reports the Rx'ed frames, let's look at sar output. It says that what we Rx'ed we do actually Tx, no noticeable drops. Average: IFACE rxpck/s txpck/s rxkB/s txkB/s rxcmp/s txcmp/s rxmcst/s %ifutil Average: ens4f1 79842324.00 79842310.40 4678261.17 4678260.38 0.00 0.00 0.00 38.32 with tx_busy staying calm. When compared to a state before: Average: IFACE rxpck/s txpck/s rxkB/s txkB/s rxcmp/s txcmp/s rxmcst/s %ifutil Average: ens4f1 90919711.60 42233822.60 5327326.85 2474638.04 0.00 0.00 0.00 43.64 it can be observed that the amount of txpck/s is almost doubled, meaning that the performance is improved by around 90%. All of this due to the drops in the driver, previously the tx_busy stat was bumped at a 7mpps rate. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-15	ice: propagate xdp_ring onto rx_ring	Maciej Fijalkowski
	With rings being split, it is now convenient to introduce a pointer to XDP ring within the Rx ring. For XDP_TX workloads this means that xdp_rings array access will be skipped, which was executed per each processed frame. Also, read the XDP prog once per NAPI and if prog is present, set up the local xdp_ring pointer. Reading prog a single time was discussed in [1] with some concern raised by Toke around dispatcher handling and having the need for going through the RCU grace period in the ndo_bpf driver callback, but ice currently is torning down NAPI instances regardless of the prog presence on VSI. Although the pointer to XDP ring introduced to Rx ring makes things a lot slimmer/simpler, I still feel that single prog read per NAPI lifetime is beneficial. Further patch that will introduce the fallback path will also get a profit from that as xdp_ring pointer will be set during the XDP rings setup. [1]: https://lore.kernel.org/bpf/87k0oseo6e.fsf@toke.dk/ Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-15	ice: do not create xdp_frame on XDP_TX	Maciej Fijalkowski
	xdp_frame is not needed for XDP_TX data path in ice driver case. For this data path cleaning of sent descriptor will not happen anywhere outside of the driver, which means that carrying the information about the underlying memory model via xdp_frame will not be used. Therefore, this conversion can be simply dropped, which would relieve CPU a bit. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-15	ice: unify xdp_rings accesses	Maciej Fijalkowski
	There has been a long lasting issue of improper xdp_rings indexing for XDP_TX and XDP_REDIRECT actions. Given that currently rx_ring->q_index is mixed with smp_processor_id(), there could be a situation where Tx descriptors are produced onto XDP Tx ring, but tail is never bumped - for example pin a particular queue id to non-matching IRQ line. Address this problem by ignoring the user ring count setting and always initialize the xdp_rings array to be of num_possible_cpus() size. Then, always use the smp_processor_id() as an index to xdp_rings array. This provides serialization as at given time only a single softirq can run on a particular CPU. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-15	ice: split ice_ring onto Tx/Rx separate structs	Maciej Fijalkowski
	While it was convenient to have a generic ring structure that served both Tx and Rx sides, next commits are going to introduce several Tx-specific fields, so in order to avoid hurting the Rx side, let's pull out the Tx ring onto new ice_tx_ring and ice_rx_ring structs. Rx ring could be handled by the old ice_ring which would reduce the code churn within this patch, but this would make things asymmetric. Make the union out of the ring container within ice_q_vector so that it is possible to iterate over newly introduced ice_tx_ring. Remove the @size as it's only accessed from control path and it can be calculated pretty easily. Change definitions of ice_update_ring_stats and ice_fetch_u64_stats_per_ring so that they are ring agnostic and can be used for both Rx and Tx rings. Sizes of Rx and Tx ring structs are 256 and 192 bytes, respectively. In Rx ring xdp_rxq_info occupies its own cacheline, so it's the major difference now. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-15	ice: move ice_container_type onto ice_ring_container	Maciej Fijalkowski
	Currently ice_container_type is scoped only for ice_ethtool.c. Next commit that will split the ice_ring struct onto Rx/Tx specific ring structs is going to also modify the type of linked list of rings that is within ice_ring_container. Therefore, the functions that are taking the ice_ring_container as an input argument will need to be aware of a ring type that will be looked up. Embed ice_container_type within ice_ring_container and initialize it properly when allocating the q_vectors. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-15	ice: remove ring_active from ice_ring	Maciej Fijalkowski
	This field is dead and driver is not making any use of it. Simply remove it. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-15	Merge tag 'gpio-fixes-for-v5.15-rc6' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux Pull gpio fixes from Bartosz Golaszewski: - fix module autoloading on gpio-74x164 after a revert of OF modaliases - fix problems with the bias setting in gpio-pca953x - fix a use-after-free bug in gpio-mockup by using software nodes * tag 'gpio-fixes-for-v5.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: gpio: mockup: Convert to use software nodes gpio: pca953x: Improve bias setting gpio: 74x164: Add SPI device ID table
2021-10-15	Merge tag 'spi-fix-v5.15-rc5' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi fixes from Mark Brown: "A few small fixes. Mostly driver specific but there's one in the core which fixes a deadlock when adding devices on spi-mux that's triggered because spi-mux is a SPI device which is itself a SPI controller and so can instantiate devices when registered. We were using a global lock to protect against reusing chip selects but they're a per controller thing so moving the lock per controller resolves that" * tag 'spi-fix-v5.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi-mux: Fix false-positive lockdep splats spi: Fix deadlock when adding SPI controllers on SPI buses spi: bcm-qspi: clear MSPI spifie interrupt during probe spi: spi-nxp-fspi: don't depend on a specific node name erratum workaround spi: mediatek: skip delays if they are 0 spi: atmel: Fix PDC transfer setup bug spi: spidev: Add SPI ID table spi: Use 'flash' node name instead of 'spi-flash' in example
2021-10-15	Merge tag 'mtd/fixes-for-5.15-rc6' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux Pull mtd fix from Miquel Raynal: "Raw NAND controller driver fix: - Qcom: Update code word value for raw reads (QPIC v2+)" * tag 'mtd/fixes-for-5.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux: mtd: rawnand: qcom: Update code word value for raw read
2021-10-15	Merge tag 'drm-fixes-2021-10-15-1' of git://anongit.freedesktop.org/drm/drm	Linus Torvalds
	Pull drm fixes from Dave Airlie: "It has a few scattered msm and i915 fixes, a few core fixes and a mediatek feature revert. I've had to pick a bunch of patches into this, as the drm-misc-fixes tree had a bunch of vc4 patches I wasn't comfortable with sending to you at least as part of this, they were delayed due to your reverts. If it's really useful as fixes I'll do a separate pull. Summary: Core: - clamp fbdev size - edid cap blocks read to avoid out of bounds panel: - fix missing crc32 dependency msm: - Fix a new crash on dev file close if the dev file was opened when GPU is not loaded (such as missing fw in initrd) - Switch to single drm_sched_entity per priority level per drm_file to unbreak multi-context userspace - Serialize GMU access to fix GMU OOB errors - Various error path fixes - A couple integer overflow fixes - Fix mdp5 cursor plane WARNs i915: - Fix ACPI object leak - Fix context leak in user proto-context creation - Fix missing i915_sw_fence_fini call hyperv: - hide hw pointer nouveau: - fix engine selection bit r128: - fix UML build rcar-du: - unconncted LVDS regression fix mediatek: - revert CMDQ refinement patches" * tag 'drm-fixes-2021-10-15-1' of git://anongit.freedesktop.org/drm/drm: (34 commits) drm/panel: olimex-lcd-olinuxino: select CRC32 drm/r128: fix build for UML drm/nouveau/fifo: Reinstate the correct engine bit programming drm/hyperv: Fix double mouse pointers drm/fbdev: Clamp fbdev surface size if too large drm/edid: In connector_bad_edid() cap num_of_ext by num_blocks read drm/i915: Free the returned object of acpi_evaluate_dsm() drm/i915: Fix bug in user proto-context creation that leaked contexts drm: rcar-du: Don't create encoder for unconnected LVDS outputs drm/msm/dsi: fix off by one in dsi_bus_clk_enable error handling drm/msm/dsi: Fix an error code in msm_dsi_modeset_init() drm/msm/dsi: dsi_phy_14nm: Take ready-bit into account in poll_for_ready drm/msm/dsi/phy: fix clock names in 28nm_8960 phy drm/msm/dpu: Fix address of SM8150 PINGPONG5 IRQ register drm/msm: Do not run snapshot on non-DPU devices drm/msm/a3xx: fix error handling in a3xx_gpu_init() drm/msm/a4xx: fix error handling in a4xx_gpu_init() drm/msm: Fix null pointer dereference on pointer edp drm/msm/mdp5: fix cursor-related warnings drm/msm: Avoid potential overflow in timeout_to_jiffies() ...