Age | Commit message (Collapse) | Author |
|
Add SVM init / close / fini to faulting VMs. Minimual implementation
acting as a placeholder for follow on patches.
v2:
- Add close function
v3:
- Better commit message (Thomas)
- Kernel doc (Thomas)
- Update chunk array to be unsigned long (Thomas)
- Use new drm_gpusvm.h header location (Thomas)
- Newlines between functions in xe_svm.h (Thomas)
- Call drm_gpusvm_driver_set_lock in init (Thomas)
v6:
- Only compile if CONFIG_DRM_GPUSVM selected (CI, Lucas)
v7:
- Only select CONFIG_DRM_GPUSVM if DEVICE_PRIVATE (CI)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250306012657.3505757-10-matthew.brost@intel.com
|
|
Add the DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR flag, which is used to
create unpopulated virtual memory areas (VMAs) without memory backing or
GPU page tables. These VMAs are referred to as CPU address mirror VMAs.
The idea is that upon a page fault or prefetch, the memory backing and
GPU page tables will be populated.
CPU address mirror VMAs only update GPUVM state; they do not have an
internal page table (PT) state, nor do they have GPU mappings.
It is expected that CPU address mirror VMAs will be mixed with buffer
object (BO) VMAs within a single VM. In other words, system allocations
and runtime allocations can be mixed within a single user-mode driver
(UMD) program.
Expected usage:
- Bind the entire virtual address (VA) space upon program load using the
DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR flag.
- If a buffer object (BO) requires GPU mapping (runtime allocation),
allocate a CPU address using mmap(PROT_NONE), bind the BO to the
mmapped address using existing bind IOCTLs. If a CPU map of the BO is
needed, mmap it again to the same CPU address using mmap(MAP_FIXED)
- If a BO no longer requires GPU mapping, munmap it from the CPU address
space and them bind the mapping address with the
DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR flag.
- Any malloc'd or mmapped CPU address accessed by the GPU will be
faulted in via the SVM implementation (system allocation).
- Upon freeing any mmapped or malloc'd data, the SVM implementation will
remove GPU mappings.
Only supporting 1 to 1 mapping between user address space and GPU
address space at the moment as that is the expected use case. uAPI
defines interface for non 1 to 1 but enforces 1 to 1, this restriction
can be lifted if use cases arrise for non 1 to 1 mappings.
This patch essentially short-circuits the code in the existing VM bind
paths to avoid populating page tables when the
DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR flag is set.
v3:
- Call vm_bind_ioctl_ops_fini on -ENODATA
- Don't allow DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR on non-faulting VMs
- s/DRM_XE_VM_BIND_FLAG_SYSTEM_ALLOCATOR/DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR (Thomas)
- Rework commit message for expected usage (Thomas)
- Describe state of code after patch in commit message (Thomas)
v4:
- Fix alignment (Checkpatch)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250306012657.3505757-9-matthew.brost@intel.com
|
|
Xe depends on DRM_GPUSVM for SVM implementation, select it in Kconfig.
v6:
- Don't select DRM_GPUSVM if UML (CI)
v7:
- Only select DRM_GPUSVM if DEVICE_PRIVATE (CI)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250306012657.3505757-8-matthew.brost@intel.com
|
|
This patch introduces support for GPU Shared Virtual Memory (SVM) in the
Direct Rendering Manager (DRM) subsystem. SVM allows for seamless
sharing of memory between the CPU and GPU, enhancing performance and
flexibility in GPU computing tasks.
The patch adds the necessary infrastructure for SVM, including data
structures and functions for managing SVM ranges and notifiers. It also
provides mechanisms for allocating, deallocating, and migrating memory
regions between system RAM and GPU VRAM.
This is largely inspired by GPUVM.
v2:
- Take order into account in check pages
- Clear range->pages in get pages error
- Drop setting dirty or accessed bit in get pages (Vetter)
- Remove mmap assert for cpu faults
- Drop mmap write lock abuse (Vetter, Christian)
- Decouple zdd from range (Vetter, Oak)
- Add drm_gpusvm_range_evict, make it work with coherent pages
- Export drm_gpusvm_evict_to_sram, only use in BO evict path (Vetter)
- mmget/put in drm_gpusvm_evict_to_sram
- Drop range->vram_alloation variable
- Don't return in drm_gpusvm_evict_to_sram until all pages detached
- Don't warn on mixing sram and device pages
- Update kernel doc
- Add coherent page support to get pages
- Use DMA_FROM_DEVICE rather than DMA_BIDIRECTIONAL
- Add struct drm_gpusvm_vram and ops (Thomas)
- Update the range's seqno if the range is valid (Thomas)
- Remove the is_unmapped check before hmm_range_fault (Thomas)
- Use drm_pagemap (Thomas)
- Drop kfree_mapping (Thomas)
- dma mapp pages under notifier lock (Thomas)
- Remove ctx.prefault
- Remove ctx.mmap_locked
- Add ctx.check_pages
- s/vram/devmem (Thomas)
v3:
- Fix memory leak drm_gpusvm_range_get_pages
- Only migrate pages with same zdd on CPU fault
- Loop over al VMAs in drm_gpusvm_range_evict
- Make GPUSVM a drm level module
- GPL or MIT license
- Update main kernel doc (Thomas)
- Prefer foo() vs foo for functions in kernel doc (Thomas)
- Prefer functions over macros (Thomas)
- Use unsigned long vs u64 for addresses (Thomas)
- Use standard interval_tree (Thomas)
- s/drm_gpusvm_migration_put_page/drm_gpusvm_migration_unlock_put_page (Thomas)
- Drop err_out label in drm_gpusvm_range_find_or_insert (Thomas)
- Fix kernel doc in drm_gpusvm_range_free_pages (Thomas)
- Newlines between functions defs in header file (Thomas)
- Drop shall language in driver vfunc kernel doc (Thomas)
- Move some static inlines from head to C file (Thomas)
- Don't allocate pages under page lock in drm_gpusvm_migrate_populate_ram_pfn (Thomas)
- Change check_pages to a threshold
v4:
- Fix NULL ptr deref in drm_gpusvm_migrate_populate_ram_pfn (Thomas, Himal)
- Fix check pages threshold
- Check for range being unmapped under notifier lock in get pages (Testing)
- Fix characters per line
- Drop WRITE_ONCE for zdd->devmem_allocation assignment (Thomas)
- Use completion for devmem_allocation->detached (Thomas)
- Make GPU SVM depend on ZONE_DEVICE (CI)
- Use hmm_range_fault for eviction (Thomas)
- Drop zdd worker (Thomas)
v5:
- Select Kconfig deps (CI)
- Set device to NULL in __drm_gpusvm_migrate_to_ram (Matt Auld, G.G.)
- Drop Thomas's SoB (Thomas)
- Add drm_gpusvm_range_start/end/size helpers (Thomas)
- Add drm_gpusvm_notifier_start/end/size helpers (Thomas)
- Absorb drm_pagemap name changes (Thomas)
- Fix driver lockdep assert (Thomas)
- Move driver lockdep assert to static function (Thomas)
- Assert mmap lock held in drm_gpusvm_migrate_to_devmem (Thomas)
- Do not retry forever on eviction (Thomas)
v6:
- Fix drm_gpusvm_get_devmem_page alignment (Checkpatch)
- Modify Kconfig (CI)
- Compile out lockdep asserts (CI)
v7:
- Add kernel doc for flags fields (CI, Auld)
Cc: Simona Vetter <simona.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: <dri-devel@lists.freedesktop.org>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250306012657.3505757-7-matthew.brost@intel.com
|
|
Introduce xe_bo_put_async to put a bo where the context is such that
the bo destructor can't run due to lockdep problems or atomic context.
If the put is the final put, freeing will be done from a work item.
v5:
- Kerenl doc for xe_bo_put_async (Thomas)
v7:
- Fix kernel doc (CI)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Tested-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Reviewed-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250306012657.3505757-6-matthew.brost@intel.com
|
|
Introduce drm_pagemap ops to map and unmap dma to VRAM resources. In the
local memory case it's a matter of merely providing an offset into the
device's physical address. For future p2p the map and unmap functions may
encode as needed.
Similar to how dma-buf works, let the memory provider (drm_pagemap) provide
the mapping functionality.
v3:
- Move to drm level include
v4:
- Fix kernel doc (G.G.)
v5:
- s/map_dma/device_map (Thomas)
- s/unmap_dma/device_unmap (Thomas)
v7:
- Fix kernel doc (CI, Auld)
- Drop P2P define (Thomas)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250306012657.3505757-5-matthew.brost@intel.com
|
|
Avoid multiple CPU page faults to the same device page racing by trying
to lock the page in do_swap_page before taking an extra reference to the
page. This prevents scenarios where multiple CPU page faults each take
an extra reference to a device page, which could abort migration in
folio_migrate_mapping. With the device page being locked in
do_swap_page, the migrate_vma_* functions need to be updated to avoid
locking the fault_page argument.
Prior to this change, a livelock scenario could occur in Xe's (Intel GPU
DRM driver) SVM implementation if enough threads faulted the same device
page.
v3:
- Put page after unlocking page (Alistair)
- Warn on spliting a TPH which is fault page (Alistair)
- Warn on dst page == fault page (Alistair)
v6:
- Add more verbose comment around THP (Alistair)
v7:
- Fix migrate_device_finalize alignment (Checkpatch)
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Philip Yang <Philip.Yang@amd.com>
Cc: Felix Kuehling <felix.kuehling@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Suggested-by: Simona Vetter <simona.vetter@ffwll.ch>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Alistair Popple <apopple@nvidia.com>
Tested-by: Alistair Popple <apopple@nvidia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250306012657.3505757-4-matthew.brost@intel.com
|
|
Add migrate_device_pfns which prepares an array of pre-populated device
pages for migration. This is needed for eviction of known set of
non-contiguous devices pages to cpu pages which is a common case for SVM
in DRM drivers using TTM.
v2:
- s/migrate_device_vma_range/migrate_device_prepopulated_range
- Drop extra mmu invalidation (Vetter)
v3:
- s/migrate_device_prepopulated_range/migrate_device_pfns (Alistar)
- Use helper to lock device pages (Alistar)
- Update commit message with why this is required (Alistar)
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Alistair Popple <apopple@nvidia.com>
Reviewed-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250306012657.3505757-3-matthew.brost@intel.com
|
|
TTM doesn't support fair eviction via WW locking, this mitigated in by
using retry loops in exec and preempt rebind worker. Extend this retry
loop to BO allocation. Once TTM supports fair eviction this patch can be
reverted.
v4:
- Keep line break (Stuart)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250306012657.3505757-2-matthew.brost@intel.com
|
|
Use fault injection infrastructure to allow specific functions to
be configured over debugfs for failing during the execution of
xe_exec_queue_create_ioctl(). xe_exec_queue_destroy_ioctl() and
xe_exec_queue_get_property_ioctl() are not considered as there is
no unwinding code to test with fault injection.
This allows more thorough testing from user space by going through
code paths for error handling and unwinding which cannot be reached
by simply injecting errors in IOCTL arguments. This can help
increase code robustness.
The corresponding IGT series is:
https://patchwork.freedesktop.org/series/144138/
Reviewed-by: Sai Teja Pottumuttu <sai.teja.pottumuttu@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250305150659.46276-1-francois.dugast@intel.com
Signed-off-by: Francois Dugast <francois.dugast@intel.com>
|
|
Now that we have all IPs being described via struct xe_ip, where release
information (version and name) is represented in a single struct type,
we can extract duplicated logic from handle_pre_gmdid() and
handle_gmdid() and apply it in the body of xe_info_init().
With this change, there is no point in keeping handle_pre_gmdid()
anymore, so we just remove it and inline the assignment of
{graphics,media}_ip.
Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250221-xe-unify-ip-descriptors-v2-7-5bc0c6d0c13f@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
|
|
Now that pre-GMDID IPs are described via struct xe_ip, it is possible to
re-use the feature descriptors that have exact match with ones from
previous releases. Do that.
Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250221-xe-unify-ip-descriptors-v2-6-5bc0c6d0c13f@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
|
|
We have now a struct xe_ip to fully describe an IP, but we are only
using that for GMDID-based IPs.
For pre-GMDID IPs, we still describe release info (version and name) via
feature descriptors (struct xe_{graphics,media}_desc). Let's convert
those to use struct xe_ip.
With this, we have a uniform way of describing IPs in the xe driver
instead of having different approaches based on whether the IPs use
GMDIDs or not.
A nice side-effect of this change is that now we have an easy way to
lookup, in the source code, mappings between versions, names and
features for all supported IPs.
v2:
- Store pointers to struct xe_ip instead xe_{graphics,media}_desc in
struct xe_device_desc.
Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250221-xe-unify-ip-descriptors-v2-5-5bc0c6d0c13f@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
|
|
We will soon update the code so that pre-GMDID IPs are also defined with
struct xe_ip. Since we will need to refer to them in instances of
struct xe_device_desc, let's move up the current instances of xe_ip
(GMDID-based) so that all IP descriptors are kept together.
Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250221-xe-unify-ip-descriptors-v2-4-5bc0c6d0c13f@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
|
|
If we pay closer attention to struct gmdid_map, we will realize that it
is actually fully describing an IP (graphics or media): it contains
"release info" and "features info". The former is comprised of fields
"ver" and "name"; and the latter is done via member "ip", which is a
pointer to either struct xe_graphics_desc or xe_media_desc, and can be
reused across releases.
As such let's:
* Rename struct gmdid_map to xe_ip.
* Rename the field ver to verx100 to be consistent with the naming of
members using that encoding of the version.
* Rename the field "ip" to "desc" to make it clear that it is a
pointer to a descriptor of features for the IP, since it will not
contain *all* info (i.e. features + release info).
We sill have release info mapped into struct xe_{graphics,media}_desc
for pre-GMDID IPs. In an upcoming change we will handle that so that we
make a clear separation between "release info" and "feature info".
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250221-xe-unify-ip-descriptors-v2-3-5bc0c6d0c13f@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
|
|
The name of an IP is a function of its version. As such, given an IP
version, it should be clear to identify the name of that IP release.
With the current code, we keep that mapping clear for pre-GMDID IPs, but
ambiguous for GMDID-based ones. That causes two types of inconveniences:
1. The end user, who might not have all the necessary mapping at hand,
might be confused when seeing different possible IP names in the
dmesg log.
2. It makes a developer who is not familiar with the "IP version" to
"Release name" need to resort to looking at the specs to understand
see what version maps to what. While the specs should be the
authority on the mapping, we should make our lives easier by
reflecting that mapping in the source code.
Thus, since the IP name is tied to the version, let's remove the
ambiguity by using a "name" field in struct gmdid_map instead of
accumulating names in the descriptor instances.
This does result in the code having IP name being defined in
different structs (gmdid_map, xe_graphics_desc, xe_media_desc), but that
will be resolved in upcoming changes.
A side-effect of this change is that media_xe2 exactly matches
media_xelpmp now, so we just re-use the latter.
v2:
- Drop media_xe2 and re-use media_xelpmp. (Matt)
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250221-xe-unify-ip-descriptors-v2-2-5bc0c6d0c13f@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
|
|
In an upcoming change, we will handle setting graphics_name and
media_name differently for GMDID-based IPs. As such, let's make both
handle_pre_gmdid() and handle_gmdid() functions responsible for
initializing those fields. While now we have both doing essentially the
same thing with respect to those fields, handle_pre_gmdid() will diverge
soon.
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250221-xe-unify-ip-descriptors-v2-1-5bc0c6d0c13f@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
|
|
If userptr pages are freed after a call to the xe mmu notifier,
the device will not be blocked out from theoretically accessing
these pages unless they are also unmapped from the iommu, and
this violates some aspects of the iommu-imposed security.
Ensure that userptrs are unmapped in the mmu notifier to
mitigate this. A naive attempt would try to free the sg table, but
the sg table itself may be accessed by a concurrent bind
operation, so settle for only unmapping.
v3:
- Update lockdep asserts.
- Fix a typo (Matthew Auld)
Fixes: 81e058a3e7fd ("drm/xe: Introduce helper to populate userptr")
Cc: Oak Zeng <oak.zeng@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: <stable@vger.kernel.org> # v6.10+
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Acked-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250304173342.22009-4-thomas.hellstrom@linux.intel.com
|
|
The pnfs that we obtain from hmm_range_fault() point to pages that
we don't have a reference on, and the guarantee that they are still
in the cpu page-tables is that the notifier lock must be held and the
notifier seqno is still valid.
So while building the sg table and marking the pages accesses / dirty
we need to hold this lock with a validated seqno.
However, the lock is reclaim tainted which makes
sg_alloc_table_from_pages_segment() unusable, since it internally
allocates memory.
Instead build the sg-table manually. For the non-iommu case
this might lead to fewer coalesces, but if that's a problem it can
be fixed up later in the resource cursor code. For the iommu case,
the whole sg-table may still be coalesced to a single contigous
device va region.
This avoids marking pages that we don't own dirty and accessed, and
it also avoid dereferencing struct pages that we don't own.
v2:
- Use assert to check whether hmm pfns are valid (Matthew Auld)
- Take into account that large pages may cross range boundaries
(Matthew Auld)
v3:
- Don't unnecessarily check for a non-freed sg-table. (Matthew Auld)
- Add a missing up_read() in an error path. (Matthew Auld)
Fixes: 81e058a3e7fd ("drm/xe: Introduce helper to populate userptr")
Cc: Oak Zeng <oak.zeng@intel.com>
Cc: <stable@vger.kernel.org> # v6.10+
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Acked-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250304173342.22009-3-thomas.hellstrom@linux.intel.com
|
|
Add proper #ifndef around the xe_hmm.h header, proper spacing
and since the documentation mostly follows kerneldoc format,
make it kerneldoc. Also prepare for upcoming -stable fixes.
Fixes: 81e058a3e7fd ("drm/xe: Introduce helper to populate userptr")
Cc: Oak Zeng <oak.zeng@intel.com>
Cc: <stable@vger.kernel.org> # v6.10+
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Acked-by: Matthew Brost <Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250304173342.22009-2-thomas.hellstrom@linux.intel.com
|
|
Concurrent VM bind staging and zapping of PTEs from a userptr notifier
do not work because the view of PTEs is not stable. VM binds cannot
acquire the notifier lock during staging, as memory allocations are
required. To resolve this race condition, use a staging tree for VM
binds that is committed only under the userptr notifier lock during the
final step of the bind. This ensures a consistent view of the PTEs in
the userptr notifier.
A follow up may only use staging for VM in fault mode as this is the
only mode in which the above race exists.
v3:
- Drop zap PTE change (Thomas)
- s/xe_pt_entry/xe_pt_entry_staging (Thomas)
Suggested-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: <stable@vger.kernel.org>
Fixes: e8babb280b5e ("drm/xe: Convert multiple bind ops into single job")
Fixes: a708f6501c69 ("drm/xe: Update PT layer with better error handling")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250228073058.59510-5-thomas.hellstrom@linux.intel.com
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
|
|
Fix fault mode invalidation racing with unbind leading to the
PTE zapping potentially traversing an invalid page-table tree.
Do this by holding the notifier lock across PTE zapping. This
might transfer any contention waiting on the notifier seqlock
read side to the notifier lock read side, but that shouldn't be
a major problem.
At the same time get rid of the open-coded invalidation in the bind
code by relying on the notifier even when the vma bind is not
yet committed.
Finally let userptr invalidation call a dedicated xe_vm function
performing a full invalidation.
Fixes: e8babb280b5e ("drm/xe: Convert multiple bind ops into single job")
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: <stable@vger.kernel.org> # v6.12+
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250228073058.59510-4-thomas.hellstrom@linux.intel.com
|
|
Fix a (harmless) misplaced #endif leading to declarations
appearing multiple times.
Fixes: 0eb2a18a8fad ("drm/xe: Implement VM snapshot support for BO's and userptr")
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: <stable@vger.kernel.org> # v6.12+
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250228073058.59510-3-thomas.hellstrom@linux.intel.com
|
|
If a userptr vma subject to prefetching was already invalidated
or invalidated during the prefetch operation, the operation would
repeatedly return -EAGAIN which would typically cause an infinite
loop.
Validate the userptr to ensure this doesn't happen.
v2:
- Don't fallthrough from UNMAP to PREFETCH (Matthew Brost)
Fixes: 5bd24e78829a ("drm/xe/vm: Subclass userptr vmas")
Fixes: 617eebb9c480 ("drm/xe: Fix array of binds")
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: <stable@vger.kernel.org> # v6.9+
Suggested-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250228073058.59510-2-thomas.hellstrom@linux.intel.com
|
|
Allow user to provide a low latency hint. When set, KMD sends a hint
to GuC which results in special handling for that process. SLPC will
ramp the GT frequency aggressively every time it switches to this
process.
We need to enable the use of SLPC Compute strategy during init, but
it will apply only to processes that set this bit during process
creation.
Improvement with this approach as below:
Before,
:~$ NEOReadDebugKeys=1 EnableDirectSubmission=0 clpeak --kernel-latency
Platform: Intel(R) OpenCL Graphics
Device: Intel(R) Graphics [0xe20b]
Driver version : 24.52.0 (Linux x64)
Compute units : 160
Clock frequency : 2850 MHz
Kernel launch latency : 283.16 us
After,
:~$ NEOReadDebugKeys=1 EnableDirectSubmission=0 clpeak --kernel-latency
Platform: Intel(R) OpenCL Graphics
Device: Intel(R) Graphics [0xe20b]
Driver version : 24.52.0 (Linux x64)
Compute units : 160
Clock frequency : 2850 MHz
Kernel launch latency : 63.38 us
Compute PR: https://github.com/intel/compute-runtime/pull/794
Mesa PR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33214
IGT PR: https://patchwork.freedesktop.org/patch/639989/
V10(Lucas):
- Remove doc from drm-uapi.rst
v9(Vinay):
- remove extra line, align commit message
v8(Vinay):
- Add separate example for using low latency hint
v7(Jose):
- Update UMD PR
- applicable to all gpus
V6:
- init flags, remove redundant flags check (MAuld)
V5:
- Move uapi doc to documentation and GuC ABI specific change (Rodrigo)
- Modify logic to restrict exec queue flags (MAuld)
V4:
- To make it clear, dont use exec queue word (Vinay)
- Correct typo in description of flag (Jose/Vinay)
- rename set_strategy api and replace ctx with exec queue(Vinay)
- Start with 0th bit to indentify user flags (Jose)
V3:
- Conver user flag to kernel internal flag and use (Oak)
- Support query config for use to check kernel support (Jose)
- Dont need to take runtime pm (Vinay)
V2:
- DRM_XE_EXEC_QUEUE_LOW_LATENCY_HINT 1 planned for other hint(Szymon)
- Add motivation to description (Lucas)
Acked-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250228070224.739295-2-tejas.upadhyay@intel.com
Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
|
|
Add a list of active tunings to debugfs, analogous to the existing
list of workarounds.
Rationale being that it seems to make sense to either have both or none.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250227101304.46660-6-tvrtko.ursulin@igalia.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
According to the i915 codebase xe missed to set the recommended
performance tuning for L3 hashing which is applicable to all legacy XeLP
platforms. Lets add it.
v2:
* Rename prefixes to XELP_.
* Tweak version end point.
v3:
* Add bspec tag.
* Tweak version range.
v4:
* Move from LRC to engine tunings list.
v5:
* Drop L3 Cache Control comment.
Bspec: 31870
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
References: c46c5fb725be ("drm/i915/gen12: Apply recommended L3 hashing mask")
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250227101304.46660-5-tvrtko.ursulin@igalia.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
According to the i915 code base and as confirmed in the workaround
database, apart from setting the GS timer, all XeLP platforms should also
set the TDS timer.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
References: 2b5298b0aa09 ("drm/i915/gen12: Add recommended hardware tuning value")
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Gustavo Sousa <gustavo.sousa@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250227101304.46660-4-tvrtko.ursulin@igalia.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
Workaround database specifies 16011163337 as a workaround so lets move it
there.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Gustavo Sousa <gustavo.sousa@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250227101304.46660-3-tvrtko.ursulin@igalia.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
Any rules using engine matching are currently broken due RTP processing
happening too in early init, before the list of hardware engines has been
initialised.
Fix this by moving workaround processing to later in the driver probe
sequence, to just before the processed list is used for the first time.
Looking at the debugfs gt0/workarounds on ADL-P we notice 14011060649
should be present while we see, before:
GT Workarounds
14011059788
14015795083
And with the patch:
GT Workarounds
14011060649
14011059788
14015795083
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: stable@vger.kernel.org # v6.11+
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250227101304.46660-2-tvrtko.ursulin@igalia.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
Sync to fix conlicts between drm-xe-next and drm-intel-next.
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
Add support to allow retrying the sending of MMIO requests
from the VF to the GUC in the event of an error. During the
suspend/resume process, VFs begin resuming only after the PF has
resumed. Although the PF resumes, the GUC reset and provisioning
occur later in a separate worker process.
When there are a large number of VFs, some may attempt to resume
before the PF has completed its provisioning. Therefore, if a
MMIO request from a VF fails during this period, we will retry
sending the request up to GUC_RESET_VF_STATE_RETRY_MAX times,
which is set to a maximum of 10 attempts.
Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Cc: Michał Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Piotr Piórkowski <piotr.piorkowski@intel.com>
Reviewed-by: Piotr Piorkowski <piotr.piorkowski@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250224102807.11065-3-satyanarayana.k.v.p@intel.com
|
|
When both PF and VF devices are enabled on the host, they
resume simultaneously during system resume.
However, the PF must finish provisioning the VF before any
VFs can successfully resume.
Establish a parent-child device link between the PF and VF
devices to ensure the correct order of resumption.
V4 -> V5:
- Added missing break in the error condition.
V3 -> V4:
- Made xe_pci_pf_get_vf_dev() as a static function and updated
input parameter types.
- Updated xe_sriov_warn() to xe_sriov_abort() when VF device
cannot be found.
V2 -> V3:
- Added function documentation for xe_pci_pf_get_vf_dev().
- Added assertion if not called from PF.
V1 -> V2:
- Added a helper function to get VF pci_dev.
- Updated xe_sriov_notice() to xe_sriov_warn() if vf pci_dev
is not found.
Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Cc: Michał Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Piotr Piórkowski <piotr.piorkowski@intel.com>
Reviewed-by: Piotr Piorkowski <piotr.piorkowski@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250224102807.11065-2-satyanarayana.k.v.p@intel.com
|
|
Wa_13012615864 applies to xe3lpg
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250221112200.388612-1-tejas.upadhyay@intel.com
Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
|
|
program_invocation_short_name may not be available in other systems.
Instead, replace it with the argv[0] to pass the executable name.
Fixes build error when program_invocation_short_name is not available:
drivers/gpu/drm/xe/xe_gen_wa_oob.c:34:3: error: use of
undeclared identifier 'program_invocation_short_name' 34 |
program_invocation_short_name); | ^ 1 error
generated.
Suggested-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250224-macos-build-support-xe-v3-1-d2c9ed3a27cc@samsung.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
Currently we just leave it uninitialised, which at first looks harmless,
however we also don't zero out the pfn array, and with pfn_flags_mask
the idea is to be able set individual flags for a given range of pfn or
completely ignore them, outside of default_flags. So here we end up with
pfn[i] & pfn_flags_mask, and if both are uninitialised we might get back
an unexpected flags value, like asking for read only with default_flags,
but getting back write on top, leading to potentially bogus behaviour.
To fix this ensure we zero the pfn_flags_mask, such that hmm only
considers the default_flags and not also the initial pfn[i] value.
v2 (Thomas):
- Prefer proper initializer.
Fixes: 81e058a3e7fd ("drm/xe: Introduce helper to populate userptr")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@intel.com>
Cc: <stable@vger.kernel.org> # v6.10+
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250226174748.294285-2-matthew.auld@intel.com
|
|
https://gitlab.freedesktop.org/drm/xe/kernel into drm-next
UAPI Changes:
- Add mmap support for PCI memory barrier (Tejas, Matthew Auld)
- Enable integration with perf pmu, exposing event counters: for now, just
GT C6 residency (Vinay, Lucas)
- Add "survivability mode" to allow putting the driver in a state capable of
firmware upgrade on critical failures (Riana, Rodrigo)
- Add PXP HWDRM support and enable for compatible platforms:
Meteor Lake and Lunar Lake (Daniele, John Harrison)
- Expose package and vram temperature over hwmon subsystem (Raag, Badal, Rodrigo)
Cross-subsystem Changes:
- Backmege drm-next to synchronize with i915 display and other internal APIs
Display Changes (including i915):
- Device probe re-order to help with flicker-free boot (Maarten)
- Align watermark, hpd and dsm with i915 (Rodrigo)
- Better abstraction for d3cold (Rodrigo)
Driver Changes:
- Make sure changes to ccs_mode is with helper for gt sync reset (Maciej)
- Drop mmio_ext abstraction since it didn't prove useful in its current form
(Matt Roper)
- Reject BO eviction if BO is bound to current VM (Oak, Thomas Hellström)
- Add GuC Power Conservation debugfs (Rodrigo)
- L3 cache topology updates for Xe3 (Francois, Matt Atwood)
- Better logging about missing GuC logs (John Harrison)
- Better logging for hwconfig-related data availability (John Harrison)
- Tracepoint updates for xe_bo_create, xe_vm and xe_vma (Oak)
- Add missing SPDX licenses (Francois)
- Xe suballocator imporovements (Michal Wajdeczko)
- Improve logging for native vs SR-IOV driver mode (Satyanarayana)
- Make sure VF bootstrap is not attempted in execlist mode (Maarten)
- Add GuC Buffer Cache abstraction for some CTB H2G actions and use
during VF provisioning (Michal Wajdeczko)
- Better synchronization in gtidle for new users (Vinay)
- New workarounds for Panther Lake (Nirmoy, Vinay)
- PCI ID updates for Panther Lake (Matt Atwood)
- Enable SR-IOV for Panther Lake (Michal Wajdeczko)
- Update MAINTAINERS to stop directing xe changes to drm-misc (Lucas)
- New PCI IDs for Battle Mage (Shekhar)
- Better pagefault logging (Francois)
- SR-IOV fixes and refactors for past and new platforms (Michal Wajdeczko)
- Platform descriptor refactors and updates (Sai Teja)
- Add gt stats debugfs (Francois)
- Add guc_log debugfs to dump to dmesg (Lucas)
- Abstract per-platform LMTT availability (Piotr Piórkowski)
- Refactor VRAM manager location (Piotr Piórkowski)
- Add missing xe_pm_runtime_put when forcing wedged mode (Shuicheng)
- Fix possible lockup when forcing wedged mode (Xin Wang)
- Probe refactors to use cleanup actions with better error handling (Lucas)
- XE_IOCTL_DBG clarification for userspace (Maarten)
- Better xe_mmio initialization and abstraction (Ilia)
- Drop unnecessary GT lookup (Matt Roper)
- Skip client engine usage from fdinfo for VFs (Marcin Bernatowicz)
- Allow to test xe_sync_entry_parse with error injection (Priyanka)
- OA fix for polled read (Umesh)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/m3gbuh32wgiep43i4zxbyhxqbenvtgvtao5sczivlasj7tikwv@dmlba4bfg2ny
|
|
Recent discussions with the hardware architects have revealed that
the TIMESTAMP_OVERRIDE register is never expected to hold a valid/useful
value on production hardware. That register would only get used by
hardware workarounds (although there are none that use it today) or
during early internal hardware testing.
Due to lack of documentation it's not clear exactly what the driver
should be doing if CTC_MODE[0] is set (or even whether that's a setting
that would ever be encountered on real hardware), but it's definitely
not what Xe and i915 have been doing. So drop the incorrect code trying
to use TIMESTAMP_REGISTER. If the driver does encounter CTC_MODE[0] in
the wild, we'll print a warning and just continue trying to use the
crystal clock frequency since that's probably less incorrect than what
we're doing today.
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250225224908.1671554-2-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
|
|
xe_exec_queue_kill can sleep, so we can't call it from under a spinlock.
We can instead move the queues to a separate list and then kill them all
after we release the spinlock.
Furthermore, xe_exec_queue_kill can take the VM lock so we can't call it
while holding the PXP mutex because the mutex is taken under VM lock at
queue creation time. Note that while it is safe to call the kill without
holding the mutex, we must call it after the PXP state has been updated,
otherwise an app might be able to create a queue between the
invalidation and the state update, which would break the state machine.
Since being in the list is used to track whether RPM cleanup is needed,
we can no longer defer that to queue_destroy, so we perform it
immediately instead.
v2: also avoid calling kill() under pxp->mutex.
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/intel-xe/34aaced9-4a9d-4e8c-900a-b8f73452e35c@stanley.mountain/
Fixes: f8caa80154c4 ("drm/xe/pxp: Add PXP queue tracking and session start")
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250225235328.2895877-1-daniele.ceraolospurio@intel.com
|
|
https://gitlab.freedesktop.org/drm/i915/kernel into drm-next
drm/i915 feature pull for v6.15:
Features and functionality:
- Enable DP 128b/132b SST DSC (Jani, Imre)
- Allow DSB to perform commits when VRR is enabled (Ville)
- Compute HDMI PLLs for SNPS/C10 PHYs for rates not in fixed tables (Ankit)
- Allow DSB usage when PSR is enabled on LNL+ (Jouni)
- Enable Panel Replay mode change without full modeset (Jouni)
- Enable async flips with compressed buffers on ICL+ (Ville)
- Support luminance based brightness control via DPCD for eDP (Suraj)
- Enable VRR enable/disable without full modeset (Mitul, Ankit)
- Add debugfs facility for force testing HDCP 1.4 (Suraj)
- Add scaler tracepoints, improve plane tracepoints (Ville)
- Improve DMC wakelock debugging facilities (Gustavo)
- Allow GuC SLPC default strategies on MTL+ for performance (Rodrigo)
- Provide more information on display faults (Ville)
Refactoring and cleanups:
- Continue conversions to struct intel_display (Ville, Jani, Suraj, Imre)
- Joiner and Y plane reorganization (Ville)
- Move HDCP debugfs to intel_hdcp.c (Jani)
- Clean up and unify LSPCON interfaces (Jani)
- Move code out of intel_display.c to reduce its size (Ville)
- Clean up and simplify DDI port enabling/disabling (Imre)
- Make LPT LP a dedicated PCH type, refactor (Jani)
- Simplify DSC range BPG offset calculation (Ankit)
- Scaler cleanups (Ville)
- Remove unused code from GVT (David Alan Gilbert)
- Improve plane debugging (Ville)
- DSB and VRR refactoring (Ville)
Fixes:
- Check if vblank is sufficient for DSC prefill and scaler (Mitul)
- Fix Mesa clear color alignment regression (Ville)
- Add missing TC DP PHY lane stagger delay (Imre)
- Fix DSB + VRR usage for PTL+ (Ville)
- Improve robustness of display VT-d workarounds (Ville)
- Fix platforms for dbuf tracker state service programming (Ravi)
- Fix DMC wakelock support conditions (Gustavo)
- Amend DMC wakelock register ranges (Gustavo)
- Disable the Common Primary Timing Generator (CMTG) (Gustavo)
- Enable C20 PHY SSC (Suraj)
- Add workaround for DKL PHY DP mode write (Nemesa)
- Fix build warnings on clamp() usage (Guenter Roeck, Ankit)
- Fix error handling while adding a connector (Imre)
- Avoid full modeset at probe on vblank delay mismatches (Ville)
- Fix encoder HDMI check for HDCP line rekeying (Suraj)
- Fix HDCP repeater authentication during topology change (Suraj)
- Handle display PHY power state reset for power savings (Mika)
- Fix typos all over the place (Nitin)
- Update HDMI TMDS C20 parameters for various platforms (Dnyaneshwar)
- Guarantee a minimum hblank time for 128b/132b and 8b/10b MST (Arun, Imre)
- Do not hardcode LSPCON settle timeout (Giedrius Statkevičius)
Xe driver changes:
- Re-use display vmas when possible (Maarten)
- Remove double pageflip (Maarten)
- Enable DP tunneling (Imre)
- Separate i915 and xe tracepoints (Ville)
DRM core changes:
- Increase DPCD eDP display control CAP size to 5 bytes (Suraj)
- Add DPCD eDP version 1.5 definition (Suraj)
- Add timeout parameter to drm_lspcon_set_mode() (Giedrius Statkevičius)
Merges:
- Backmerge drm-next (Jani)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/87h64j7b7n.fsf@intel.com
|
|
Add PVC workaround 22016596838 that disables EU DOP gating
during EU stall sampling.
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/062a12ed9e110fea420cd47cb70fb10136ee9132.1740533885.git.harish.chegondi@intel.com
|
|
User space can get the EU stall data record size, EU stall capabilities,
EU stall sampling rates, and per XeCore buffer size with query IOCTL
DRM_IOCTL_XE_DEVICE_QUERY with .query set to DRM_XE_DEVICE_QUERY_EU_STALL.
A struct drm_xe_query_eu_stall will be returned to the user space along
with an array of supported sampling rates sorted in the fastest sampling
rate first order. sampling_rates in struct drm_xe_query_eu_stall will
point to the array of sampling rates.
Any capabilities in EU stall sampling as of this patch are considered
as base capabilities. New capability bits will be added for any new
functionality added later.
v12: Rename has_eu_stall_sampling_support() to
xe_eu_stall_supported_on_platform() and move it to header file.
v11: Check if EU stall sampling is supported on the platform.
v10: Change comments and variable names as per feedback
v9: Move reserved fields above num_sampling_rates in
struct drm_xe_query_eu_stall.
v7: Change sampling_rates from a pointer to flexible array.
v6: Include EU stall sampling rates information and
per XeCore buffer size in the query information.
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/67ba42796a5a99d648239c315694cd222812a49b.1740533885.git.harish.chegondi@intel.com
|
|
Add EU stall sampling support for Xe2 architecture GPUs - LNL and BMG.
EU stall data format for LNL and BMG is different from that of PVC.
v10: Update comments as per review feedback
v9: Use GRAPHICS_VER() check instead of platform
v8: Renamed struct drm_xe_eu_stall_data_xe2 to struct xe_eu_stall_data_xe2
since it is a local structure.
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/d85093e9ab1204d14d2cc783f304a4bc8688951c.1740533885.git.harish.chegondi@intel.com
|
|
If the user space doesn't read the EU stall data fast enough,
it is possible that the EU stall data buffer can get filled,
and if the hardware wants to write more data, it simply drops
data due to unavailable buffer space. In that case, hardware
sets a bit in a register. If the driver detects data drop,
the driver read() returns -EIO error to let the user space
know that HW has dropped data. The -EIO error is returned
even if there is EU stall data in the buffer. A subsequent
read by the user space returns the remaining EU stall data.
v12: Move 'goto exit_drop;' to the next
'if (read_data_size == 0)' statement.
v11: Clear drop bit even for empty data buffer as the data
was read from the buffer in the previous read.
v10: Reverted the changes back to v8:
Clear the drop bits only after reading the data.
v9: Move all data drop handling code to this patch
Clear all drop data bits before returning -EIO.
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/6fbfd7cfa42cb3ef5515b6412573d74c7cd3d27a.1740533885.git.harish.chegondi@intel.com
|
|
Implement the EU stall sampling APIs to read() and poll() EU stall data.
A work function periodically polls the EU stall data buffer write pointer
registers to look for any new data and caches the write pointer. The read
function compares the cached read and write pointers and copies any new
data to the user space.
v11: Used gt->eu_stall->stream_lock instead of stream->buf_lock.
Removed read and write offsets from trace and added read size.
Moved workqueue from struct xe_eu_stall_data_stream to
struct xe_eu_stall_gt.
v10: Used cancel_delayed_work_sync() instead of flush_delayed_work()
Replaced per xecore lock with a lock for all the xecore buffers
Code movement and optimizations as per review feedback
v9: New patch split from the previous patch.
Used *_delayed_work functions instead of hrtimer
Addressed the review feedback in read and poll functions
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/369dee85a3b6bd2c08aeae89ca55e66a9a0242d2.1740533885.git.harish.chegondi@intel.com
|
|
Implement EU stall sampling APIs introduced in the previous patch for
Xe_HPC (PVC). Add register definitions and the code that accesses these
registers to the APIs.
Add initialization and clean up functions and their implementations,
EU stall enable and disable functions.
v11: Move stream->xecore_buf alloc to xe_eu_stall_data_buf_alloc().
Register xe_eu_stall_fini() with devm_add_action_or_reset()
instead of calling it from xe_gt_fini().
Changed a couple of variables in struct xe_eu_stall_data_stream
from unsigned int to int.
v10: Fixed error rewinding code
Moved code around as per review feedback
v9: Moved structure definitions from xe_eu_stall.h to xe_eu_stall.c
Moved read and poll implementations to the next patch
Used xe_bo_create_pin_map_at_aligned instead of xe_bo_create_pin_map
Changed lock names as per review feedback
Moved drop data handling into a subsequent patch
Moved code around as per review feedback
v8: Updated copyright year in xe_eu_stall_regs.h to 2025.
Renamed struct drm_xe_eu_stall_data_pvc to struct xe_eu_stall_data_pvc
since it is a local structure.
v6: Fix buffer wrap around over write bug (Matt Olson)
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/b6aeca593d521828a0b4fbf6cfd2844716c4fc66.1740533885.git.harish.chegondi@intel.com
|
|
A new hardware feature first introduced in PVC gives capability to
periodically sample EU stall state and record counts for different stall
reasons, on a per IP basis, aggregate across all EUs in a subslice and
record the samples in a buffer in each subslice. Eventually, the aggregated
data is written out to a buffer in the memory. This feature is also
supported in XE2 and later architecture GPUs.
Use an existing IOCTL - DRM_IOCTL_XE_OBSERVATION as the interface into the
driver from the user space to do initial setup and obtain a file descriptor
for the EU stall data stream. Input parameter to the IOCTL is a struct
drm_xe_observation_param in which observation_type should be set to
DRM_XE_OBSERVATION_TYPE_EU_STALL, observation_op should be
DRM_XE_OBSERVATION_OP_STREAM_OPEN and param should point to a chain of
drm_xe_ext_set_property structures in which each structure has a pair of
property and value. The EU stall sampling input properties are defined in
drm_xe_eu_stall_property_id enum.
With the file descriptor obtained from DRM_IOCTL_XE_OBSERVATION, user space
can enable and disable EU stall sampling with the IOCTLs:
DRM_XE_OBSERVATION_IOCTL_ENABLE and DRM_XE_OBSERVATION_IOCTL_DISABLE.
User space can also call poll() to check for availability of data in the
buffer. The data can be read with read(). Finally, the file descriptor
can be closed with close().
v11: Changed a couple of variables in struct eu_stall_open_properties
from unsigned int to int.
v10: Use extension number while parsing chain of extensions.
Remove function description for static functions.
Move code around as per review feedback.
v9: Changed some u32 to unsigned int.
Moved some code around as per review feedback from v8.
v8: Used div_u64 instead of / to fix 32-bit build issue.
Changed copyright year in xe_eu_stall.c/h to 2025.
v7: Renamed input property DRM_XE_EU_STALL_PROP_EVENT_REPORT_COUNT
to DRM_XE_EU_STALL_PROP_WAIT_NUM_REPORTS to be consistent with
OA. Renamed the corresponding internal variables.
Fixed some commit messages based on review feedback.
v6: Change the input sampling rate to GPU cycles instead of
GPU cycles multiplier.
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/bb707a27975c33e4a912b9839b023acb7a1f9c90.1740533885.git.harish.chegondi@intel.com
|
|
a mask
Last enabled DSS in a DSS mask can help estimate the maximum DSSes enabled
in the DSS mask, as the enabled DSSes can be discontiguous.
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/79944bb27eb4f7ce5df01f964aebbf431b3a6c61.1740533885.git.harish.chegondi@intel.com
|
|
In the case where a set of checks on xe->info.platform don't assign
a value to pointer def the pointer remains uninitialized and hence
can fail the following !def check. Fix this be ensuring pointer
def is initialized to NULL.
Fixes: 292b1a8a5054 ("drm/xe: Stop ignoring errors from xe_heci_gsc_init()")
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250226160524.566074-1-colin.i.king@gmail.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
Refactor Wa_18013179988, Wa_14015568240, Wa_1508761755, and
Wa_1509372804, to use the proper workaround-check implementation for
out-of-band workarounds, XE_WA(), and drop the use of the platform
based WA selection.
Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Aradhya Bhatia <aradhya.bhatia@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250220094645.358647-3-aradhya.bhatia@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|