linux-arm.git - Russell King's ARM Linux kernel tree

Age	Commit message (Collapse)	Author
2024-04-05	drm/xe: Drop xe_vm_assert_held() macro definition from xe_bo.h	Michal Wajdeczko
	It is already defined in xe_vm.h and shouldn't be duplicated. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240405113844.803-1-michal.wajdeczko@intel.com
2024-04-05	drm/xe: Move PTE/PDE bit definitions to proper header	Michal Wajdeczko
	We already have dedicated header for GGTT/PPGTT definitions. It's also cleaner to separate them from implementation macros. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Acked-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240405123520.847-1-michal.wajdeczko@intel.com
2024-04-02	drm/xe: Normalize bo flags macros	Lucas De Marchi
	The flags stored in the BO grew over time without following much a naming pattern. First of all, get rid of the _BIT suffix that was banned from everywhere else due to the guideline in drivers/gpu/drm/i915/i915_reg.h that xe kind of follows: Define bits using ``REG_BIT(N)``. Do not add ``_BIT`` suffix to the name. Here the flags aren't for a register, but it's good practice to keep it consistent. Second divergence on names is the use or not of "CREATE". This is because most of the flags are passed to xe_bo_create() family of functions, changing its behavior. However, since the flags are also stored in the bo itself and checked elsewhere in the code, it seems better to just omit the CREATE part. With those 2 guidelines, all the flags are given the form XE_BO_FLAG_<FLAG_NAME> with the following commands: git grep -le "XE_BO_" -- drivers/gpu/drm/xe \| xargs sed -i \ -e "s/XE_BO_\([_A-Z0-9]\)_BIT/XE_BO_\1/g" \ -e 's/XE_BO_CREATE_/XE_BO_FLAG_/g' git grep -le "XE_BO_" -- drivers/gpu/drm/xe \| xargs sed -i -r \ -e 's/XE_BO_(DEFER_BACKING\|SCANOUT\|FIXED_PLACEMENT\|PAGETABLE\|NEEDS_CPU_ACCESS\|NEEDS_UC\|INTERNAL_TEST\|INTERNAL_64K\|GGTT_INVALIDATE)/XE_BO_FLAG_\1/g' And then the defines in drivers/gpu/drm/xe/xe_bo.h are adjusted to follow the coding style. Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240322142702.186529-3-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-03-20	drm/xe: Add XE_BO_GGTT_INVALIDATE flag	Matthew Brost
	Add XE_BO_GGTT_INVALIDATE flag which indicates the GGTT should be invalidated when a BO is added / removed from the GGTT. This is typically set when a BO is used by the GuC as the GuC has GGTT TLBs. Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> [mlankhorst: Small fix to only inherit GGTT_INVALIDATE from src bo] [mlankhorst: Remove _BIT from name] Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240306052002.311196-4-matthew.brost@intel.com
2024-03-15	drm/xe: Allow VRAM BO allocations aligned to 64K	Michal Wajdeczko
	While today we are getting VRAM allocations aligned to 64K as the XE_VRAM_FLAGS_NEED64K flag could be set, we shouldn't only rely on that flag and we should also allow caller to specify required 64K alignment explicitly. Define new XE_BO_NEEDS_64K flag for that. Cc: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240313104132.1045-2-michal.wajdeczko@intel.com Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
2024-02-20	drm/xe/guc: Allocate GuC data structures in system memory for initial load	Michał Winiarski
	GuC load will need to happen at an earlier point in probe, where local memory is not yet available. Use system memory for GuC data structures used for initial "hwconfig" load, and realloc at a later, "post-hwconfig" load if needed, when local memory is available. Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240219130530.1406044-1-michal.winiarski@intel.com
2024-02-20	drm/xe/xe_bo_move: Enhance xe_bo_move trace	Priyanka Dandamudi
	Enhanced xe_bo_move trace to be more readable. It will help to show the migration details. Src and dst details. v2: Modify trace_xe_bo_move(), it takes the integer mem_type rather than a string. Make mem_type_to_name() extern, it will be used by trace.(Thomas) v3: Move mem_type_to_name() to xe_bo.[ch] (Thomas, Matt) v4: Add device details to reduce ambiquity related to vram0/vram1. (Oak) v5: Rename mem_type_to_name to xe_mem_type_to_name. (Thomas) v6: Optimised code to use xe_bo_device(__entry->bo). (Thomas) Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Oak Zeng <oak.zeng@intel.com> Cc: Kempczynski Zbigniew <Zbigniew.Kempczynski@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Brian Welty <brian.welty@intel.com> Signed-off-by: Priyanka Dandamudi <priyanka.dandamudi@intel.com> Reviewed-by: Oak Zeng <oak.zeng@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240220044748.948496-1-priyanka.dandamudi@intel.com
2024-01-19	drm/xe: make xe_ttm_funcs const	Jani Nikula
	Place the function pointers in rodata. Also drop the extra declaration while at it. Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240117122044.1544174-1-jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2024-01-08	drm/xe/dgfx: Release mmap mappings on rpm suspend	Badal Nilawar
	Release all mmap mappings for all vram objects which are associated with userfault such that, while pcie function in D3hot, any access to memory mappings will raise a userfault. Upon userfault, in order to access memory mappings, if graphics function is in D3 then runtime resume of dgpu will be triggered to transition to D0. v2: - Avoid iomem check before bo migration check as bo can migrate to system memory (Matthew Auld) v3: - Delete bo userfault link during bo destroy - Upon bo move (vram-smem), do bo userfault link deletion in xe_bo_move_notify instead of xe_bo_move (Thomas Hellström) - Grab lock in rpm hook while deleting bo userfault link (Matthew Auld) v4: - Add kernel doc and wrap vram_userfault related stuff in the structure (Matthew Auld) - Get rpm wakeref before taking dma reserve lock (Matthew Auld) - In suspend path apply lock for entire list op including list iteration (Matthew Auld) v5: - Use mutex lock instead of spin lock v6: - Fix review comments (Matthew Auld) Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Anshuman Gupta <anshuman.gupta@intel.com> Signed-off-by: Badal Nilawar <badal.nilawar@intel.com> Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> #For the xe_bo_move_notify() changes Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://lore.kernel.org/r/20240104130702.950078-1-badal.nilawar@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21	drm/xe: Add XE_BO_NEEDS_UC flag to force UC mode instead WB	Michal Wajdeczko
	When we map BO in GGTT, then by default we are using PAT index that corresponds to XE_CACHE_WB but ppcoming feature will require use of the PAT index of the XE_CACHE_UC. Define new BO flag that could be used during BO creation to force alternate caching. Cc: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20231214185955.1791-7-michal.wajdeczko@intel.com Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
2023-12-21	drm/xe: Add a helper for DRM device-lifetime BO create	Michał Winiarski
	A helper for managed BO allocations makes it possible to remove specific "fini" actions and will simplify the following patches adding ability to execute a release action for specific BO directly. Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21	drm/xe/uapi: Add support for CPU caching mode	Pallavi Mishra
	Allow userspace to specify the CPU caching mode at object creation. Modify gem create handler and introduce xe_bo_create_user to replace xe_bo_create. In a later patch we will support setting the pat_index as part of vm_bind, where expectation is that the coherency mode extracted from the pat_index must be least 1way coherent if using cpu_caching=wb. v2 - s/smem_caching/smem_cpu_caching/ and s/XE_GEM_CACHING/XE_GEM_CPU_CACHING/. (Matt Roper) - Drop COH_2WAY and just use COH_NONE + COH_AT_LEAST_1WAY; KMD mostly just cares that zeroing/swap-in can't be bypassed with the given smem_caching mode. (Matt Roper) - Fix broken range check for coh_mode and smem_cpu_caching and also don't use constant value, but the already defined macros. (José) - Prefer switch statement for smem_cpu_caching -> ttm_caching. (José) - Add note in kernel-doc for dgpu and coherency modes for system memory. (José) v3 (José): - Make sure to reject coh_mode == 0 for VRAM-only. - Also make sure to actually pass along the (start, end) for __xe_bo_create_locked. v4 - Drop UC caching mode. Can be added back if we need it. (Matt Roper) - s/smem_cpu_caching/cpu_caching. Idea is that VRAM is always WC, but that is currently implicit and KMD controlled. Make it explicit in the uapi with the limitation that it currently must be WC. For VRAM + SYS objects userspace must now select WC. (José) - Make sure to initialize bo_flags. (José) v5 - Make to align with the other uapi and prefix uapi constants with DRM_ (José) v6: - Make it clear that zero cpu_caching is only allowed for kernel objects. (José) v7: (Oak) - With all the changes from the original design, it looks we can further simplify here and drop the explicit coh_mode. We can just infer the coh_mode from the cpu_caching. i.e reject cpu_caching=wb + coh_none. It's one less thing for userspace to maintain so seems worth it. v8: - Make sure to also update the kselftests. Testcase: igt@xe_mmap@cpu-caching Signed-off-by: Pallavi Mishra <pallavi.mishra@intel.com> Co-developed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: José Roberto de Souza <jose.souza@intel.com> Cc: Filip Hazubski <filip.hazubski@intel.com> Cc: Carl Zhang <carl.zhang@intel.com> Cc: Effie Yu <effie.yu@intel.com> Cc: Zhengguo Xu <zhengguo.xu@intel.com> Cc: Francois Dugast <francois.dugast@intel.com> Cc: Oak Zeng <oak.zeng@intel.com> Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Acked-by: Zhengguo Xu <zhengguo.xu@intel.com> Acked-by: Bartosz Dunajski <bartosz.dunajski@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21	drm/xe/bo: Rename xe_bo_get_sg() to xe_bo_sg()	Thomas Hellström
	Using "get" typically refers to obtaining a refcount, which we don't do here so rename to xe_bo_sg(). Suggested-by: Ohad Sharabi <osharabi@habana.ai> Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/946 Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Ohad Sharabi<osharabi@habana.ai> Link: https://patchwork.freedesktop.org/patch/msgid/20231122110359.4087-3-thomas.hellstrom@linux.intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21	drm/xe/display: Add empty def for i915_gem_object_flush_if_display	Jouni Högander
	We don't need i915_gem_object_flush_if_display on Xe side. Add empty define to tackle compilation errors with display code where it's used. Signed-off-by: Jouni Högander <jouni.hogander@intel.com> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21	drm/xe: Remove unused xe_bo_to_tile	Brian Welty
	Unused and would like to remove the memtype_to_tile() which it calls. Signed-off-by: Brian Welty <brian.welty@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21	drm/xe: fix pat[2] programming with 2M/1G pages	Matthew Auld
	Bit 7 in the leaf node is normally programmed with pat[2], however with 2M/1G pages that same bit in the PDE/PDPE also toggles 2M/1G pages. For 2M/1G entries the pat[2] is rather moved to bit 12, which is now free given that the address must be aligned to 2M or 1G, leaving bit 7 for toggling 2M/1G pages. Bspec: 59510, 45038 Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21	drm/xe/xe2: Add one more bit to encode PAT to ppgtt entries	Lucas De Marchi
	Xe2 adds one more bit to cover all the possible 32 entries. Although those entries are not used by internal kernel code paths, it's expected that userspace will make use of it. Bspec: 59510, 67095 Reviewed-by: Pallavi Mishra <pallavi.mishra@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20231006182325.3617685-2-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21	drm/xe: Use pat_index to encode pde/pte	Lucas De Marchi
	Change the xelp_pte_encode() and xelp_pde_encode() functions to use the platform-dependent pat_index. The same function can be used for all platforms as they only need to encode the pat_index bits in the same pte/pde layout. For platforms that don't have the most significant bit, as long as they don't return a bogus index they should be fine. v2: Use the same logic to encode pde as it's compatible with previous logic, it's more future proof and also fixes the cache setting for PVC (Matt Roper) Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20230927193902.2849159-10-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21	drm/xe: Implement fdinfo memory stats printing	Tejas Upadhyay
	Use the newly added drm_print_memory_stats helper to show memory utilisation of our objects in drm/driver specific fdinfo output. To collect the stats we walk the per memory regions object lists and accumulate object size into the respective drm_memory_stats categories. Objects with multiple possible placements are reported in multiple regions for total and shared sizes, while other categories are counted only for the currently active region. V4: - Remove rcu lock - Auld/Thomas - take refcnt only if its non-zero - Auld - DMA_RESV_USAGE_BOOKKEEP covers all fences - Auld - covert to xe_bo for public objects V3: - dont use xe_bo_get/put, not needed - use designated initializer - Jani - use list_for_each_entry_rcu - Fix Checkpatch err - CI V2: - Use static initializer for mem_type - Himal/Jani Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21	drm/xe/bo: Remove the lock_no_vm()/unlock_no_vm() interface	Thomas Hellström
	Apart from asserts, it's essentially the same as xe_bo_lock()/xe_bo_unlock(), and the usage intentions of this interface was unclear. Remove it. v2: - Update the xe_display subsystem as well. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230908091716.36984-4-thomas.hellstrom@linux.intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21	drm/xe/bo: Simplify xe_bo_lock()	Thomas Hellström
	xe_bo_lock() was, although it only grabbed a single lock, unnecessarily using ttm_eu_reserve_buffers(). Simplify and document the interface. v2: - Update also the xe_display subsystem. v4: - Reinstate a lost dma_resv_reserve_fences(). - Improve on xe_bo_lock() documentation (Matthew Brost) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230908091716.36984-2-thomas.hellstrom@linux.intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21	drm/xe: Prefer WARN() over BUG() to avoid crashing the kernel	Francois Dugast
	Replace calls to XE_BUG_ON() with calls XE_WARN_ON() which in turn calls WARN() instead of BUG(). BUG() crashes the kernel and should only be used when it is absolutely unavoidable in case of catastrophic and unrecoverable failures, which is not the case here. Signed-off-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21	drm/xe: Set PTE_DM bit for stolen on MTL	Lucas De Marchi
	Integrated graphics 1270 and beyond should set the PTE_LM bit in the PTE when it's stolen memory. Add a new function, xe_bo_is_stolen_devmem(), and use it when encoding the PTE. In some places in the spec the PTE bit is called "Local Memory", abbreviated as LM, and in others it's called "Device Memory" (DM). Since we moved away from "Local Memory" and preferred the "vram" terminology, also rename the macros as DM to follow the name of the new function. Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20230726160708.3967790-7-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21	drm/xe: Decouple vram check from xe_bo_addr()	Lucas De Marchi
	The output arg is_vram in xe_bo_addr() is unused by several callers. It's also not what the function is mainly doing. Remove the argument and let the interested callers to call xe_bo_is_vram(). Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20230726160708.3967790-6-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21	drm/xe/bo: support tiered vram allocation for small-bar	Matthew Auld
	Add the new flag XE_BO_NEEDS_CPU_ACCESS, to force allocating in the mappable part of vram. If no flag is specified we do a topdown allocation, to limit the chances of stealing the precious mappable part, if we don't need it. If this is a full-bar system, then this all gets nooped. For kernel users, it looks like xe_bo_create_pin_map() is the central place which users should call if they want CPU access to the object, so add the flag there. We still need to plumb this through for userspace allocations. Also it looks like page-tables are using pin_map(), which is less than ideal. If we can already use the GPU to do page-table management, then maybe we should just force that for small-bar. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21	drm/xe/mtl: Map PPGTT as CPU:WC	Matt Roper
	On MTL and beyond, the GPU performs non-coherent accesses to the PPGTT page tables. These page tables should be mapped as CPU:WC. Removes CAT errors triggered by xe_exec_basic@once-basic on MTL: xe 0000:00:02.0: [drm:__xe_pt_bind_vma [xe]] Preparing bind, with range [1a0000...1a0fff) engine 0000000000000000. xe 0000:00:02.0: [drm:xe_vm_dbg_print_entries [xe]] 1 entries to update xe 0000:00:02.0: [drm:xe_vm_dbg_print_entries [xe]] 0: Update level 3 at (0 + 1) [0...8000000000) f:0 xe 0000:00:02.0: [drm] Engine memory cat error: guc_id=2 xe 0000:00:02.0: [drm] Engine memory cat error: guc_id=2 xe 0000:00:02.0: [drm] Timedout job: seqno=4294967169, guc_id=2, flags=0x4 v2: - Rename to XE_BO_PAGETABLE to make it more clear that this BO is the pagetable itself, rather than just being bound in the PPGTT. (Lucas) Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Acked-by: Nirmoy Das <nirmoy.das@intel.com> Link: https://lore.kernel.org/r/20230725003433.1992137-3-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21	drm/xe: add missing bulk_move reset	Matthew Auld
	It looks like bulk_move is set during object construction, but is only removed on object close, however in various places we might not yet have an actual fd to close, like on the error paths for the gem_create ioctl, and also one internal user for the evict_test_run_gt() selftest. Try to handle those cases by manually resetting the bulk_move. This should prevent triggering: WARNING: CPU: 7 PID: 8252 at drivers/gpu/drm/ttm/ttm_bo.c:327 ttm_bo_release+0x25e/0x2a0 [ttm] v2 (Nirmoy): - It should be safe to just unconditionally call __xe_bo_unset_bulk_move() in most places. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21	drm/xe: Port Xe to GPUVA	Matthew Brost
	Rather than open coding VM binds and VMA tracking, use the GPUVA library. GPUVA provides a common infrastructure for VM binds to use mmap / munmap semantics and support for VK sparse bindings. The concepts are: 1) xe_vm inherits from drm_gpuva_manager 2) xe_vma inherits from drm_gpuva 3) xe_vma_op inherits from drm_gpuva_op 4) VM bind operations (MAP, UNMAP, PREFETCH, UNMAP_ALL) call into the GPUVA code to generate an VMA operations list which is parsed, committed, and executed. v2 (CI): Add break after default in case statement. v3: Rebase v4: Fix some error handling v5: Use unlocked version VMA in error paths v6: Rebase, address some review feedback mainly Thomas H v7: Fix compile error in xe_vma_op_unwind, address checkpatch Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21	drm/xe: VM LRU bulk move	Matthew Brost
	Use the TTM LRU bulk move for BOs tied to a VM. Update the bulk moves LRU position on every exec. v2: Bulk move for compute VMs, use WARN rather than BUG Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21	drm/xe: NULL binding implementation	Matthew Brost
	Add uAPI and implementation for NULL bindings. A NULL binding is defined as writes dropped and read zero. A single bit in the uAPI has been added which results in a single bit in the PTEs being set. NULL bindings are intendedd to be used to implement VK sparse bindings, in particular residencyNonResidentStrict property. v2: Fix BUG_ON shown in VK testing, fix check patch warning, fix xe_pt_scan_64K, update __gen8_pte_encode to understand NULL bindings, remove else if vma_addr Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Suggested-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21	drm/xe: Move XE_PTE_FLAG_READ_ONLY to xe_vm_types.h	Matthew Brost
	XE_PTE_FLAG_READ_ONLY is specific to struct xe_vma, move it from xe_bo.h to xe_vm_types.h to reflect that. Reviewed-by: Francois Dugast <francois.dugast@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-19	drm/xe: s/XE_PTE_READ_ONLY/XE_PTE_FLAG_READ_ONLY	Matthew Brost
	This define is for internal PTE flags rather than fields in the hardware PTEs, rename as such. This will help in an upcoming patch to avoid further confusion. Reviewed-by: Francois Dugast <francois.dugast@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-19	drm/xe: Memory allocations are tile-based, not GT-based	Matt Roper
	Since memory and address spaces are a tile concept rather than a GT concept, we need to plumb tile-based handling through lots of memory-related code. Note that one remaining shortcoming here that will need to be addressed before media GT support can be re-enabled is that although the address space is shared between a tile's GTs, each GT caches the PTEs independently in their own TLB and thus TLB invalidation should be handled at the GT level. v2: - Fix kunit test build. Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20230601215244.678611-13-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-19	drm/xe: Move VRAM from GT to tile	Matt Roper
	On platforms with VRAM, the VRAM is associated with the tile, not the GT. v2: - Unsquash the GGTT handling back into its own patch. - Fix kunit test build v3: - Tweak the "FIXME" comment to clarify that this function will be completely gone by the end of the series. (Lucas) v4: - Move a few changes that were supposed to be part of the GGTT patch back to that commit. (Gustavo) v5: - Kerneldoc parameter name fix. Cc: Gustavo Sousa <gustavo.sousa@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Acked-by: Gustavo Sousa <gustavo.sousa@intel.com> Link: https://lore.kernel.org/r/20230601215244.678611-11-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-19	drm/xe: Rename GPU offset helper to reflect true usage	Michael J. Ruhl
	The _io_offset helper function is returning an offset into the GPU address space. Using the CPU address offset (io_) is not correct. Rename to reflect usage. Update to use GPU offset information. Update PT dma_offset to use the helper Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-19	drm/xe: Prevent evicting for page tables	Maarten Lankhorst
	When creating page tables from xe_exec_ioctl, we may end up freeing memory we just validated. To be certain this does not happen, do not allow the current reservation to be evicted from the ioctl. Callchain: [ 109.008522] xe_bo_move_notify+0x5c/0xf0 [xe] [ 109.008548] xe_bo_move+0x90/0x510 [xe] [ 109.008573] ttm_bo_handle_move_mem+0xb7/0x170 [ttm] [ 109.008581] ttm_bo_swapout+0x15e/0x360 [ttm] [ 109.008586] ttm_device_swapout+0xc2/0x110 [ttm] [ 109.008592] ttm_global_swapout+0x47/0xc0 [ttm] [ 109.008598] ttm_tt_populate+0x7a/0x130 [ttm] [ 109.008603] ttm_bo_handle_move_mem+0x160/0x170 [ttm] [ 109.008609] ttm_bo_validate+0xe5/0x1d0 [ttm] [ 109.008614] ttm_bo_init_reserved+0xac/0x190 [ttm] [ 109.008620] __xe_bo_create_locked+0x153/0x260 [xe] [ 109.008645] xe_bo_create_locked_range+0x77/0x360 [xe] [ 109.008671] xe_bo_create_pin_map_at+0x33/0x1f0 [xe] [ 109.008695] xe_bo_create_pin_map+0x11/0x20 [xe] [ 109.008721] xe_pt_create+0x69/0xf0 [xe] [ 109.008749] xe_pt_stage_bind_entry+0x208/0x430 [xe] [ 109.008776] xe_pt_walk_range+0xe9/0x2a0 [xe] [ 109.008802] xe_pt_walk_range+0x223/0x2a0 [xe] [ 109.008828] xe_pt_walk_range+0x223/0x2a0 [xe] [ 109.008853] __xe_pt_bind_vma+0x28d/0xbd0 [xe] [ 109.008878] xe_vm_bind_vma+0xc7/0x2f0 [xe] [ 109.008904] xe_vm_rebind+0x72/0x160 [xe] [ 109.008930] xe_exec_ioctl+0x22b/0xa70 [xe] [ 109.008955] drm_ioctl_kernel+0xb9/0x150 [drm] [ 109.008972] drm_ioctl+0x210/0x430 [drm] [ 109.008988] __x64_sys_ioctl+0x85/0xb0 [ 109.008990] do_syscall_64+0x38/0x90 [ 109.008991] entry_SYSCALL_64_after_hwframe+0x72/0xdc Original warning: [ 5613.149126] WARNING: CPU: 3 PID: 45883 at drivers/gpu/drm/xe/xe_vm.c:504 xe_vm_unlock_dma_resv+0x43/0x50 [xe] ... [ 5613.226398] RIP: 0010:xe_vm_unlock_dma_resv+0x43/0x50 [xe] [ 5613.316098] Call Trace: [ 5613.318595] <TASK> [ 5613.320743] xe_exec_ioctl+0x383/0x8a0 [xe] [ 5613.325278] ? __is_insn_slot_addr+0x8e/0x110 [ 5613.329719] ? __is_insn_slot_addr+0x8e/0x110 [ 5613.334116] ? kernel_text_address+0x75/0xf0 [ 5613.338429] ? __pfx_stack_trace_consume_entry+0x10/0x10 [ 5613.343778] ? __kernel_text_address+0x9/0x40 [ 5613.348181] ? unwind_get_return_address+0x1a/0x30 [ 5613.353013] ? __pfx_stack_trace_consume_entry+0x10/0x10 [ 5613.358362] ? arch_stack_walk+0x99/0xf0 [ 5613.362329] ? rcu_read_lock_sched_held+0xb/0x70 [ 5613.366996] ? lock_acquire+0x287/0x2f0 [ 5613.370873] ? rcu_read_lock_sched_held+0xb/0x70 [ 5613.375530] ? rcu_read_lock_sched_held+0xb/0x70 [ 5613.380181] ? lock_release+0x225/0x2e0 [ 5613.384059] ? __pfx_xe_exec_ioctl+0x10/0x10 [xe] [ 5613.389092] drm_ioctl_kernel+0xc0/0x170 [ 5613.393068] drm_ioctl+0x1b7/0x490 [ 5613.396519] ? __pfx_xe_exec_ioctl+0x10/0x10 [xe] [ 5613.401547] ? lock_release+0x225/0x2e0 [ 5613.405432] __x64_sys_ioctl+0x8a/0xb0 [ 5613.409232] do_syscall_64+0x37/0x90 Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/239 Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-19	drm/xe: Apply upper limit to sg element size	Niranjana Vishwanathapura
	The iommu_dma_map_sg() function ensures iova allocation doesn't cross dma segment boundary. It does so by padding some sg elements. This can cause overflow, ending up with sg->length being set to 0. Avoid this by halving the maximum segment size (rounded down to PAGE_SIZE). Specify maximum segment size for sg elements by using sg_alloc_table_from_pages_segment() to allocate sg_table. v2: Use correct max segment size in dma_set_max_seg_size() call Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Reviewed-by: Bruce Chang <yu.bruce.chang@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-19	drm/xe: Fix splat during error dump	Francois Dugast
	Allow xe_bo_addr without lock to print debug information, such as from xe_analyze_vm. Signed-off-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-19	drm/xe: Cleanup page-related defines	Lucas De Marchi
	Rename the following defines to lose the GEN* prefixes since they don't make sense for xe: GEN8_PTE_SHIFT -> XE_PTE_SHIFT GEN8_PAGE_SIZE -> XE_PAGE_SIZE GEN8_PTE_MASK -> XE_PTE_MASK GEN8_PDE_SHIFT -> XE_PDE_SHIFT GEN8_PDES -> XE_PDES GEN8_PDE_MASK -> XE_PDE_MASK GEN8_64K_PTE_SHIFT -> XE_64K_PTE_SHIFT GEN8_64K_PAGE_SIZE -> XE_64K_PAGE_SIZE GEN8_64K_PTE_MASK -> XE_64K_PTE_MASK GEN8_64K_PDE_MASK -> XE_64K_PDE_MASK GEN8_PDE_PS_2M -> XE_PDE_PS_2M GEN8_PDPE_PS_1G -> XE_PDPE_PS_1G GEN8_PDE_IPS_64K -> XE_PDE_IPS_64K GEN12_GGTT_PTE_LM -> XE_GGTT_PTE_LM GEN12_USM_PPGTT_PTE_AE -> XE_USM_PPGTT_PTE_AE GEN12_PPGTT_PTE_LM -> XE_PPGTT_PTE_LM GEN12_PDE_64K -> XE_PDE_64K GEN12_PTE_PS64 -> XE_PTE_PS64 GEN8_PAGE_PRESENT -> XE_PAGE_PRESENT GEN8_PAGE_RW -> XE_PAGE_RW PTE_READ_ONLY -> XE_PTE_READ_ONLY Keep an XE_ prefix to make sure we don't mix the defines for the CPU (e.g. PAGE_SIZE) with the ones fro the GPU). Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-19	drm/xe: fix suspend-resume for dgfx	Matthew Auld
	This stopped working now that TTM treats moving a pinned object through ttm_bo_validate() as an error, for the general case. Add some new routines to handle the new special casing needed for suspend-resume. Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/244 Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Tested-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-19	drm/xe: Use proper vram offset	Niranjana Vishwanathapura
	In xe_migrate functions, use proper vram io offset of the tiles while calculating addresses. Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-19	drm/xe: add XE_BO_CREATE_VRAM_MASK	Matthew Auld
	So we don't have to keep repeating VRAM0 \| VRAM1. Also if there are ever more instances, then we have less places to update. Suggested-by: José Roberto de Souza <jose.souza@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-19	drm/xe: Use BO's GT to determine dma_offset when programming PTEs	Matthew Brost
	Rather than using the passed in GT, use the BO's GT determine dma_offset when programming PTEs as these two GT's could differ (i.e. mapping a BO from a remote GT). The BO's GT is correct GT to use as this where BO resides, while the passed in GT is where the mapping is created. v2: (Thomas) - Kernel doc, extra new line (CI) - Rebase to tip Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-19	drm/xe: s/lmem/vram/	Matthew Auld
	This seems to be the preferred nomenclature in xe. Currently we are intermixing vram and lmem, which is confusing. v2 (Gwan-gyeong Mun & Lucas): - Rather apply to the entire driver Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Acked-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-12	drm/xe: Fix hidden gotcha regression with bo create	Maarten Lankhorst
	The bo_create ioctl relied on the internal ordering of memory regions to be the same, make sure we don't allocate stolen instead of VRAM0. Also remove a debug warning left in. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Philippe Lecluse <philippe.lecluse1@gmail.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-12	drm/xe: Implement stolen memory.	Maarten Lankhorst
	This adds support for stolen memory, with the same allocator as vram_mgr. This allows us to skip a whole lot of copy-paste, by re-using parts of xe_ttm_vram_mgr. The stolen memory may be bound using VM_BIND, so it performs like any other memory region. We should be able to map a stolen BO directly using the physical memory location instead of through GGTT even on old platforms, but I don't know what the effects are on coherency. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-12	drm/xe: Introduce a new DRM driver for Intel GPUs	Matthew Brost
	Xe, is a new driver for Intel GPUs that supports both integrated and discrete platforms starting with Tiger Lake (first Intel Xe Architecture). The code is at a stage where it is already functional and has experimental support for multiple platforms starting from Tiger Lake, with initial support implemented in Mesa (for Iris and Anv, our OpenGL and Vulkan drivers), as well as in NEO (for OpenCL and Level0). The new Xe driver leverages a lot from i915. As for display, the intent is to share the display code with the i915 driver so that there is maximum reuse there. But it is not added in this patch. This initial work is a collaboration of many people and unfortunately the big squashed patch won't fully honor the proper credits. But let's get some git quick stats so we can at least try to preserve some of the credits: Co-developed-by: Matthew Brost <matthew.brost@intel.com> Co-developed-by: Matthew Auld <matthew.auld@intel.com> Co-developed-by: Matt Roper <matthew.d.roper@intel.com> Co-developed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Co-developed-by: Francois Dugast <francois.dugast@intel.com> Co-developed-by: Lucas De Marchi <lucas.demarchi@intel.com> Co-developed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Co-developed-by: Philippe Lecluse <philippe.lecluse@intel.com> Co-developed-by: Nirmoy Das <nirmoy.das@intel.com> Co-developed-by: Jani Nikula <jani.nikula@intel.com> Co-developed-by: José Roberto de Souza <jose.souza@intel.com> Co-developed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Co-developed-by: Dave Airlie <airlied@redhat.com> Co-developed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Co-developed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Co-developed-by: Mauro Carvalho Chehab <mchehab@kernel.org> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com>