Age | Commit message (Collapse) | Author |
|
It is already defined in xe_vm.h and shouldn't be duplicated.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240405113844.803-1-michal.wajdeczko@intel.com
|
|
We already have dedicated header for GGTT/PPGTT definitions.
It's also cleaner to separate them from implementation macros.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Acked-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240405123520.847-1-michal.wajdeczko@intel.com
|
|
The flags stored in the BO grew over time without following
much a naming pattern. First of all, get rid of the _BIT suffix that was
banned from everywhere else due to the guideline in
drivers/gpu/drm/i915/i915_reg.h that xe kind of follows:
Define bits using ``REG_BIT(N)``. Do **not** add ``_BIT`` suffix to the name.
Here the flags aren't for a register, but it's good practice to keep it
consistent.
Second divergence on names is the use or not of "CREATE". This is
because most of the flags are passed to xe_bo_create*() family of
functions, changing its behavior. However, since the flags are also
stored in the bo itself and checked elsewhere in the code, it seems
better to just omit the CREATE part.
With those 2 guidelines, all the flags are given the form
XE_BO_FLAG_<FLAG_NAME> with the following commands:
git grep -le "XE_BO_" -- drivers/gpu/drm/xe | xargs sed -i \
-e "s/XE_BO_\([_A-Z0-9]*\)_BIT/XE_BO_\1/g" \
-e 's/XE_BO_CREATE_/XE_BO_FLAG_/g'
git grep -le "XE_BO_" -- drivers/gpu/drm/xe | xargs sed -i -r \
-e 's/XE_BO_(DEFER_BACKING|SCANOUT|FIXED_PLACEMENT|PAGETABLE|NEEDS_CPU_ACCESS|NEEDS_UC|INTERNAL_TEST|INTERNAL_64K|GGTT_INVALIDATE)/XE_BO_FLAG_\1/g'
And then the defines in drivers/gpu/drm/xe/xe_bo.h are adjusted to
follow the coding style.
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240322142702.186529-3-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
Add XE_BO_GGTT_INVALIDATE flag which indicates the GGTT should be
invalidated when a BO is added / removed from the GGTT. This is
typically set when a BO is used by the GuC as the GuC has GGTT TLBs.
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
[mlankhorst: Small fix to only inherit GGTT_INVALIDATE from src bo]
[mlankhorst: Remove _BIT from name]
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240306052002.311196-4-matthew.brost@intel.com
|
|
While today we are getting VRAM allocations aligned to 64K as the
XE_VRAM_FLAGS_NEED64K flag could be set, we shouldn't only rely on
that flag and we should also allow caller to specify required 64K
alignment explicitly. Define new XE_BO_NEEDS_64K flag for that.
Cc: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240313104132.1045-2-michal.wajdeczko@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
|
|
GuC load will need to happen at an earlier point in probe, where local
memory is not yet available. Use system memory for GuC data structures
used for initial "hwconfig" load, and realloc at a later,
"post-hwconfig" load if needed, when local memory is available.
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240219130530.1406044-1-michal.winiarski@intel.com
|
|
Enhanced xe_bo_move trace to be more readable.
It will help to show the migration details.
Src and dst details.
v2: Modify trace_xe_bo_move(), it takes the integer mem_type
rather than a string.
Make mem_type_to_name() extern, it will be used by trace.(Thomas)
v3: Move mem_type_to_name() to xe_bo.[ch] (Thomas, Matt)
v4: Add device details to reduce ambiquity related to vram0/vram1. (Oak)
v5: Rename mem_type_to_name to xe_mem_type_to_name. (Thomas)
v6: Optimised code to use xe_bo_device(__entry->bo). (Thomas)
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Oak Zeng <oak.zeng@intel.com>
Cc: Kempczynski Zbigniew <Zbigniew.Kempczynski@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Brian Welty <brian.welty@intel.com>
Signed-off-by: Priyanka Dandamudi <priyanka.dandamudi@intel.com>
Reviewed-by: Oak Zeng <oak.zeng@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240220044748.948496-1-priyanka.dandamudi@intel.com
|
|
Place the function pointers in rodata. Also drop the extra declaration
while at it.
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240117122044.1544174-1-jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
Release all mmap mappings for all vram objects which are associated
with userfault such that, while pcie function in D3hot, any access
to memory mappings will raise a userfault.
Upon userfault, in order to access memory mappings, if graphics
function is in D3 then runtime resume of dgpu will be triggered to
transition to D0.
v2:
- Avoid iomem check before bo migration check as bo can migrate
to system memory (Matthew Auld)
v3:
- Delete bo userfault link during bo destroy
- Upon bo move (vram-smem), do bo userfault link deletion in
xe_bo_move_notify instead of xe_bo_move (Thomas Hellström)
- Grab lock in rpm hook while deleting bo userfault link (Matthew Auld)
v4:
- Add kernel doc and wrap vram_userfault related
stuff in the structure (Matthew Auld)
- Get rpm wakeref before taking dma reserve lock (Matthew Auld)
- In suspend path apply lock for entire list op
including list iteration (Matthew Auld)
v5:
- Use mutex lock instead of spin lock
v6:
- Fix review comments (Matthew Auld)
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Anshuman Gupta <anshuman.gupta@intel.com>
Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> #For the xe_bo_move_notify() changes
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://lore.kernel.org/r/20240104130702.950078-1-badal.nilawar@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
When we map BO in GGTT, then by default we are using PAT index
that corresponds to XE_CACHE_WB but ppcoming feature will require
use of the PAT index of the XE_CACHE_UC. Define new BO flag that
could be used during BO creation to force alternate caching.
Cc: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20231214185955.1791-7-michal.wajdeczko@intel.com
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
|
|
A helper for managed BO allocations makes it possible to remove specific
"fini" actions and will simplify the following patches adding ability to
execute a release action for specific BO directly.
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Allow userspace to specify the CPU caching mode at object creation.
Modify gem create handler and introduce xe_bo_create_user to replace
xe_bo_create. In a later patch we will support setting the pat_index as
part of vm_bind, where expectation is that the coherency mode extracted
from the pat_index must be least 1way coherent if using cpu_caching=wb.
v2
- s/smem_caching/smem_cpu_caching/ and
s/XE_GEM_CACHING/XE_GEM_CPU_CACHING/. (Matt Roper)
- Drop COH_2WAY and just use COH_NONE + COH_AT_LEAST_1WAY; KMD mostly
just cares that zeroing/swap-in can't be bypassed with the given
smem_caching mode. (Matt Roper)
- Fix broken range check for coh_mode and smem_cpu_caching and also
don't use constant value, but the already defined macros. (José)
- Prefer switch statement for smem_cpu_caching -> ttm_caching. (José)
- Add note in kernel-doc for dgpu and coherency modes for system
memory. (José)
v3 (José):
- Make sure to reject coh_mode == 0 for VRAM-only.
- Also make sure to actually pass along the (start, end) for
__xe_bo_create_locked.
v4
- Drop UC caching mode. Can be added back if we need it. (Matt Roper)
- s/smem_cpu_caching/cpu_caching. Idea is that VRAM is always WC, but
that is currently implicit and KMD controlled. Make it explicit in
the uapi with the limitation that it currently must be WC. For VRAM
+ SYS objects userspace must now select WC. (José)
- Make sure to initialize bo_flags. (José)
v5
- Make to align with the other uapi and prefix uapi constants with
DRM_ (José)
v6:
- Make it clear that zero cpu_caching is only allowed for kernel
objects. (José)
v7: (Oak)
- With all the changes from the original design, it looks we can
further simplify here and drop the explicit coh_mode. We can just
infer the coh_mode from the cpu_caching. i.e reject cpu_caching=wb +
coh_none. It's one less thing for userspace to maintain so seems
worth it.
v8:
- Make sure to also update the kselftests.
Testcase: igt@xe_mmap@cpu-caching
Signed-off-by: Pallavi Mishra <pallavi.mishra@intel.com>
Co-developed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Filip Hazubski <filip.hazubski@intel.com>
Cc: Carl Zhang <carl.zhang@intel.com>
Cc: Effie Yu <effie.yu@intel.com>
Cc: Zhengguo Xu <zhengguo.xu@intel.com>
Cc: Francois Dugast <francois.dugast@intel.com>
Cc: Oak Zeng <oak.zeng@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Acked-by: Zhengguo Xu <zhengguo.xu@intel.com>
Acked-by: Bartosz Dunajski <bartosz.dunajski@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Using "get" typically refers to obtaining a refcount, which we don't do
here so rename to xe_bo_sg().
Suggested-by: Ohad Sharabi <osharabi@habana.ai>
Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/946
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Ohad Sharabi<osharabi@habana.ai>
Link: https://patchwork.freedesktop.org/patch/msgid/20231122110359.4087-3-thomas.hellstrom@linux.intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
We don't need i915_gem_object_flush_if_display on Xe side. Add empty
define to tackle compilation errors with display code where it's used.
Signed-off-by: Jouni Högander <jouni.hogander@intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Unused and would like to remove the memtype_to_tile() which it calls.
Signed-off-by: Brian Welty <brian.welty@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Bit 7 in the leaf node is normally programmed with pat[2], however with
2M/1G pages that same bit in the PDE/PDPE also toggles 2M/1G pages. For
2M/1G entries the pat[2] is rather moved to bit 12, which is now free
given that the address must be aligned to 2M or 1G, leaving bit 7 for
toggling 2M/1G pages.
Bspec: 59510, 45038
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Xe2 adds one more bit to cover all the possible 32 entries. Although
those entries are not used by internal kernel code paths, it's expected
that userspace will make use of it.
Bspec: 59510, 67095
Reviewed-by: Pallavi Mishra <pallavi.mishra@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20231006182325.3617685-2-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Change the xelp_pte_encode() and xelp_pde_encode() functions to use the
platform-dependent pat_index. The same function can be used for all
platforms as they only need to encode the pat_index bits in the same
pte/pde layout. For platforms that don't have the most significant bit,
as long as they don't return a bogus index they should be fine.
v2: Use the same logic to encode pde as it's compatible with previous
logic, it's more future proof and also fixes the cache setting for
PVC (Matt Roper)
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20230927193902.2849159-10-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Use the newly added drm_print_memory_stats helper to show memory
utilisation of our objects in drm/driver specific fdinfo output.
To collect the stats we walk the per memory regions object lists
and accumulate object size into the respective drm_memory_stats
categories.
Objects with multiple possible placements are reported in multiple
regions for total and shared sizes, while other categories are
counted only for the currently active region.
V4:
- Remove rcu lock - Auld/Thomas
- take refcnt only if its non-zero - Auld
- DMA_RESV_USAGE_BOOKKEEP covers all fences - Auld
- covert to xe_bo for public objects
V3:
- dont use xe_bo_get/put, not needed
- use designated initializer - Jani
- use list_for_each_entry_rcu
- Fix Checkpatch err - CI
V2:
- Use static initializer for mem_type - Himal/Jani
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Apart from asserts, it's essentially the same as
xe_bo_lock()/xe_bo_unlock(), and the usage intentions of this interface
was unclear. Remove it.
v2:
- Update the xe_display subsystem as well.
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230908091716.36984-4-thomas.hellstrom@linux.intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
xe_bo_lock() was, although it only grabbed a single lock, unnecessarily
using ttm_eu_reserve_buffers(). Simplify and document the interface.
v2:
- Update also the xe_display subsystem.
v4:
- Reinstate a lost dma_resv_reserve_fences().
- Improve on xe_bo_lock() documentation (Matthew Brost)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230908091716.36984-2-thomas.hellstrom@linux.intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Replace calls to XE_BUG_ON() with calls XE_WARN_ON() which in turn calls
WARN() instead of BUG(). BUG() crashes the kernel and should only be
used when it is absolutely unavoidable in case of catastrophic and
unrecoverable failures, which is not the case here.
Signed-off-by: Francois Dugast <francois.dugast@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Integrated graphics 1270 and beyond should set the PTE_LM bit in the PTE
when it's stolen memory. Add a new function, xe_bo_is_stolen_devmem(),
and use it when encoding the PTE.
In some places in the spec the PTE bit is called "Local Memory",
abbreviated as LM, and in others it's called "Device Memory" (DM). Since
we moved away from "Local Memory" and preferred the "vram" terminology,
also rename the macros as DM to follow the name of the new function.
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20230726160708.3967790-7-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
The output arg is_vram in xe_bo_addr() is unused by several callers.
It's also not what the function is mainly doing. Remove the argument and
let the interested callers to call xe_bo_is_vram().
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20230726160708.3967790-6-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Add the new flag XE_BO_NEEDS_CPU_ACCESS, to force allocating in the
mappable part of vram. If no flag is specified we do a topdown
allocation, to limit the chances of stealing the precious mappable part,
if we don't need it. If this is a full-bar system, then this all gets
nooped.
For kernel users, it looks like xe_bo_create_pin_map() is the central
place which users should call if they want CPU access to the object, so
add the flag there.
We still need to plumb this through for userspace allocations. Also it
looks like page-tables are using pin_map(), which is less than ideal. If
we can already use the GPU to do page-table management, then maybe we
should just force that for small-bar.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
On MTL and beyond, the GPU performs non-coherent accesses to the PPGTT
page tables. These page tables should be mapped as CPU:WC.
Removes CAT errors triggered by xe_exec_basic@once-basic on MTL:
xe 0000:00:02.0: [drm:__xe_pt_bind_vma [xe]] Preparing bind, with range [1a0000...1a0fff) engine 0000000000000000.
xe 0000:00:02.0: [drm:xe_vm_dbg_print_entries [xe]] 1 entries to update
xe 0000:00:02.0: [drm:xe_vm_dbg_print_entries [xe]] 0: Update level 3 at (0 + 1) [0...8000000000) f:0
xe 0000:00:02.0: [drm] Engine memory cat error: guc_id=2
xe 0000:00:02.0: [drm] Engine memory cat error: guc_id=2
xe 0000:00:02.0: [drm] Timedout job: seqno=4294967169, guc_id=2, flags=0x4
v2:
- Rename to XE_BO_PAGETABLE to make it more clear that this BO is the
pagetable itself, rather than just being bound in the PPGTT. (Lucas)
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Acked-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://lore.kernel.org/r/20230725003433.1992137-3-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
It looks like bulk_move is set during object construction, but is only
removed on object close, however in various places we might not yet have
an actual fd to close, like on the error paths for the gem_create ioctl,
and also one internal user for the evict_test_run_gt() selftest. Try to
handle those cases by manually resetting the bulk_move. This should
prevent triggering:
WARNING: CPU: 7 PID: 8252 at drivers/gpu/drm/ttm/ttm_bo.c:327
ttm_bo_release+0x25e/0x2a0 [ttm]
v2 (Nirmoy):
- It should be safe to just unconditionally call
__xe_bo_unset_bulk_move() in most places.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Rather than open coding VM binds and VMA tracking, use the GPUVA
library. GPUVA provides a common infrastructure for VM binds to use mmap
/ munmap semantics and support for VK sparse bindings.
The concepts are:
1) xe_vm inherits from drm_gpuva_manager
2) xe_vma inherits from drm_gpuva
3) xe_vma_op inherits from drm_gpuva_op
4) VM bind operations (MAP, UNMAP, PREFETCH, UNMAP_ALL) call into the
GPUVA code to generate an VMA operations list which is parsed, committed,
and executed.
v2 (CI): Add break after default in case statement.
v3: Rebase
v4: Fix some error handling
v5: Use unlocked version VMA in error paths
v6: Rebase, address some review feedback mainly Thomas H
v7: Fix compile error in xe_vma_op_unwind, address checkpatch
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Use the TTM LRU bulk move for BOs tied to a VM. Update the bulk moves
LRU position on every exec.
v2: Bulk move for compute VMs, use WARN rather than BUG
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Add uAPI and implementation for NULL bindings. A NULL binding is defined
as writes dropped and read zero. A single bit in the uAPI has been added
which results in a single bit in the PTEs being set.
NULL bindings are intendedd to be used to implement VK sparse bindings,
in particular residencyNonResidentStrict property.
v2: Fix BUG_ON shown in VK testing, fix check patch warning, fix
xe_pt_scan_64K, update __gen8_pte_encode to understand NULL bindings,
remove else if vma_addr
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Suggested-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
XE_PTE_FLAG_READ_ONLY is specific to struct xe_vma, move it from xe_bo.h
to xe_vm_types.h to reflect that.
Reviewed-by: Francois Dugast <francois.dugast@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
This define is for internal PTE flags rather than fields in the hardware
PTEs, rename as such. This will help in an upcoming patch to avoid
further confusion.
Reviewed-by: Francois Dugast <francois.dugast@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Since memory and address spaces are a tile concept rather than a GT
concept, we need to plumb tile-based handling through lots of
memory-related code.
Note that one remaining shortcoming here that will need to be addressed
before media GT support can be re-enabled is that although the address
space is shared between a tile's GTs, each GT caches the PTEs
independently in their own TLB and thus TLB invalidation should be
handled at the GT level.
v2:
- Fix kunit test build.
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20230601215244.678611-13-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
On platforms with VRAM, the VRAM is associated with the tile, not the
GT.
v2:
- Unsquash the GGTT handling back into its own patch.
- Fix kunit test build
v3:
- Tweak the "FIXME" comment to clarify that this function will be
completely gone by the end of the series. (Lucas)
v4:
- Move a few changes that were supposed to be part of the GGTT patch
back to that commit. (Gustavo)
v5:
- Kerneldoc parameter name fix.
Cc: Gustavo Sousa <gustavo.sousa@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Acked-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://lore.kernel.org/r/20230601215244.678611-11-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
The _io_offset helper function is returning an offset into the GPU
address space. Using the CPU address offset (io_) is not correct.
Rename to reflect usage.
Update to use GPU offset information.
Update PT dma_offset to use the helper
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
When creating page tables from xe_exec_ioctl, we may end up freeing
memory we just validated. To be certain this does not happen, do not
allow the current reservation to be evicted from the ioctl.
Callchain:
[ 109.008522] xe_bo_move_notify+0x5c/0xf0 [xe]
[ 109.008548] xe_bo_move+0x90/0x510 [xe]
[ 109.008573] ttm_bo_handle_move_mem+0xb7/0x170 [ttm]
[ 109.008581] ttm_bo_swapout+0x15e/0x360 [ttm]
[ 109.008586] ttm_device_swapout+0xc2/0x110 [ttm]
[ 109.008592] ttm_global_swapout+0x47/0xc0 [ttm]
[ 109.008598] ttm_tt_populate+0x7a/0x130 [ttm]
[ 109.008603] ttm_bo_handle_move_mem+0x160/0x170 [ttm]
[ 109.008609] ttm_bo_validate+0xe5/0x1d0 [ttm]
[ 109.008614] ttm_bo_init_reserved+0xac/0x190 [ttm]
[ 109.008620] __xe_bo_create_locked+0x153/0x260 [xe]
[ 109.008645] xe_bo_create_locked_range+0x77/0x360 [xe]
[ 109.008671] xe_bo_create_pin_map_at+0x33/0x1f0 [xe]
[ 109.008695] xe_bo_create_pin_map+0x11/0x20 [xe]
[ 109.008721] xe_pt_create+0x69/0xf0 [xe]
[ 109.008749] xe_pt_stage_bind_entry+0x208/0x430 [xe]
[ 109.008776] xe_pt_walk_range+0xe9/0x2a0 [xe]
[ 109.008802] xe_pt_walk_range+0x223/0x2a0 [xe]
[ 109.008828] xe_pt_walk_range+0x223/0x2a0 [xe]
[ 109.008853] __xe_pt_bind_vma+0x28d/0xbd0 [xe]
[ 109.008878] xe_vm_bind_vma+0xc7/0x2f0 [xe]
[ 109.008904] xe_vm_rebind+0x72/0x160 [xe]
[ 109.008930] xe_exec_ioctl+0x22b/0xa70 [xe]
[ 109.008955] drm_ioctl_kernel+0xb9/0x150 [drm]
[ 109.008972] drm_ioctl+0x210/0x430 [drm]
[ 109.008988] __x64_sys_ioctl+0x85/0xb0
[ 109.008990] do_syscall_64+0x38/0x90
[ 109.008991] entry_SYSCALL_64_after_hwframe+0x72/0xdc
Original warning:
[ 5613.149126] WARNING: CPU: 3 PID: 45883 at drivers/gpu/drm/xe/xe_vm.c:504 xe_vm_unlock_dma_resv+0x43/0x50 [xe]
...
[ 5613.226398] RIP: 0010:xe_vm_unlock_dma_resv+0x43/0x50 [xe]
[ 5613.316098] Call Trace:
[ 5613.318595] <TASK>
[ 5613.320743] xe_exec_ioctl+0x383/0x8a0 [xe]
[ 5613.325278] ? __is_insn_slot_addr+0x8e/0x110
[ 5613.329719] ? __is_insn_slot_addr+0x8e/0x110
[ 5613.334116] ? kernel_text_address+0x75/0xf0
[ 5613.338429] ? __pfx_stack_trace_consume_entry+0x10/0x10
[ 5613.343778] ? __kernel_text_address+0x9/0x40
[ 5613.348181] ? unwind_get_return_address+0x1a/0x30
[ 5613.353013] ? __pfx_stack_trace_consume_entry+0x10/0x10
[ 5613.358362] ? arch_stack_walk+0x99/0xf0
[ 5613.362329] ? rcu_read_lock_sched_held+0xb/0x70
[ 5613.366996] ? lock_acquire+0x287/0x2f0
[ 5613.370873] ? rcu_read_lock_sched_held+0xb/0x70
[ 5613.375530] ? rcu_read_lock_sched_held+0xb/0x70
[ 5613.380181] ? lock_release+0x225/0x2e0
[ 5613.384059] ? __pfx_xe_exec_ioctl+0x10/0x10 [xe]
[ 5613.389092] drm_ioctl_kernel+0xc0/0x170
[ 5613.393068] drm_ioctl+0x1b7/0x490
[ 5613.396519] ? __pfx_xe_exec_ioctl+0x10/0x10 [xe]
[ 5613.401547] ? lock_release+0x225/0x2e0
[ 5613.405432] __x64_sys_ioctl+0x8a/0xb0
[ 5613.409232] do_syscall_64+0x37/0x90
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/239
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
The iommu_dma_map_sg() function ensures iova allocation doesn't
cross dma segment boundary. It does so by padding some sg elements.
This can cause overflow, ending up with sg->length being set to 0.
Avoid this by halving the maximum segment size (rounded down to
PAGE_SIZE).
Specify maximum segment size for sg elements by using
sg_alloc_table_from_pages_segment() to allocate sg_table.
v2: Use correct max segment size in dma_set_max_seg_size() call
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Bruce Chang <yu.bruce.chang@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Allow xe_bo_addr without lock to print debug information, such
as from xe_analyze_vm.
Signed-off-by: Francois Dugast <francois.dugast@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Rename the following defines to lose the GEN* prefixes since they don't
make sense for xe:
GEN8_PTE_SHIFT -> XE_PTE_SHIFT
GEN8_PAGE_SIZE -> XE_PAGE_SIZE
GEN8_PTE_MASK -> XE_PTE_MASK
GEN8_PDE_SHIFT -> XE_PDE_SHIFT
GEN8_PDES -> XE_PDES
GEN8_PDE_MASK -> XE_PDE_MASK
GEN8_64K_PTE_SHIFT -> XE_64K_PTE_SHIFT
GEN8_64K_PAGE_SIZE -> XE_64K_PAGE_SIZE
GEN8_64K_PTE_MASK -> XE_64K_PTE_MASK
GEN8_64K_PDE_MASK -> XE_64K_PDE_MASK
GEN8_PDE_PS_2M -> XE_PDE_PS_2M
GEN8_PDPE_PS_1G -> XE_PDPE_PS_1G
GEN8_PDE_IPS_64K -> XE_PDE_IPS_64K
GEN12_GGTT_PTE_LM -> XE_GGTT_PTE_LM
GEN12_USM_PPGTT_PTE_AE -> XE_USM_PPGTT_PTE_AE
GEN12_PPGTT_PTE_LM -> XE_PPGTT_PTE_LM
GEN12_PDE_64K -> XE_PDE_64K
GEN12_PTE_PS64 -> XE_PTE_PS64
GEN8_PAGE_PRESENT -> XE_PAGE_PRESENT
GEN8_PAGE_RW -> XE_PAGE_RW
PTE_READ_ONLY -> XE_PTE_READ_ONLY
Keep an XE_ prefix to make sure we don't mix the defines for the CPU
(e.g. PAGE_SIZE) with the ones fro the GPU).
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
This stopped working now that TTM treats moving a pinned object through
ttm_bo_validate() as an error, for the general case. Add some new
routines to handle the new special casing needed for suspend-resume.
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/244
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Tested-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
In xe_migrate functions, use proper vram io offset of the
tiles while calculating addresses.
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
So we don't have to keep repeating VRAM0 | VRAM1. Also if there are ever
more instances, then we have less places to update.
Suggested-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Rather than using the passed in GT, use the BO's GT determine dma_offset
when programming PTEs as these two GT's could differ (i.e. mapping a BO
from a remote GT). The BO's GT is correct GT to use as this where BO
resides, while the passed in GT is where the mapping is created.
v2:
(Thomas) - Kernel doc, extra new line
(CI) - Rebase to tip
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
This seems to be the preferred nomenclature in xe. Currently we are
intermixing vram and lmem, which is confusing.
v2 (Gwan-gyeong Mun & Lucas):
- Rather apply to the entire driver
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Acked-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
The bo_create ioctl relied on the internal ordering of memory regions to
be the same, make sure we don't allocate stolen instead of VRAM0.
Also remove a debug warning left in.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Philippe Lecluse <philippe.lecluse1@gmail.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
This adds support for stolen memory, with the same allocator as
vram_mgr. This allows us to skip a whole lot of copy-paste,
by re-using parts of xe_ttm_vram_mgr.
The stolen memory may be bound using VM_BIND, so it performs like any
other memory region.
We should be able to map a stolen BO directly using the physical memory
location instead of through GGTT even on old platforms, but I don't know
what the effects are on coherency.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Xe, is a new driver for Intel GPUs that supports both integrated and
discrete platforms starting with Tiger Lake (first Intel Xe Architecture).
The code is at a stage where it is already functional and has experimental
support for multiple platforms starting from Tiger Lake, with initial
support implemented in Mesa (for Iris and Anv, our OpenGL and Vulkan
drivers), as well as in NEO (for OpenCL and Level0).
The new Xe driver leverages a lot from i915.
As for display, the intent is to share the display code with the i915
driver so that there is maximum reuse there. But it is not added
in this patch.
This initial work is a collaboration of many people and unfortunately
the big squashed patch won't fully honor the proper credits. But let's
get some git quick stats so we can at least try to preserve some of the
credits:
Co-developed-by: Matthew Brost <matthew.brost@intel.com>
Co-developed-by: Matthew Auld <matthew.auld@intel.com>
Co-developed-by: Matt Roper <matthew.d.roper@intel.com>
Co-developed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Co-developed-by: Francois Dugast <francois.dugast@intel.com>
Co-developed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Co-developed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Co-developed-by: Philippe Lecluse <philippe.lecluse@intel.com>
Co-developed-by: Nirmoy Das <nirmoy.das@intel.com>
Co-developed-by: Jani Nikula <jani.nikula@intel.com>
Co-developed-by: José Roberto de Souza <jose.souza@intel.com>
Co-developed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Co-developed-by: Dave Airlie <airlied@redhat.com>
Co-developed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Co-developed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Co-developed-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
|