diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2021-09-01 11:26:46 -0700 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2021-09-01 11:26:46 -0700 |
commit | 477f70cd2a67904e04c2c2b9bd0fa2e95222f2f6 (patch) | |
tree | 1897dd1de49e1ea24897163533e2d8ead5dad0ad /drivers/gpu/drm/amd/amdkfd | |
parent | 835d31d319d9c8c4eb6cac074643360ba0ecab10 (diff) | |
parent | 8f0284f190e6a0aa09015090568c03f18288231a (diff) |
Merge tag 'drm-next-2021-08-31-1' of git://anongit.freedesktop.org/drm/drm
Pull drm updates from Dave Airlie:
"Highlights:
- i915 has seen a lot of refactoring and uAPI cleanups due to a
change in the upstream direction going forward
This has all been audited with known userspace, but there may be
some pitfalls that were missed.
- i915 now uses common TTM to enable discrete memory on DG1/2 GPUs
- i915 enables Jasper and Elkhart Lake by default and has preliminary
XeHP/DG2 support
- amdgpu adds support for Cyan Skillfish
- lots of implicit fencing rules documented and fixed up in drivers
- msm now uses the core scheduler
- the irq midlayer has been removed for non-legacy drivers
- the sysfb code now works on more than x86.
Otherwise the usual smattering of stuff everywhere, panels, bridges,
refactorings.
Detailed summary:
core:
- extract i915 eDP backlight into core
- DP aux bus support
- drm_device.irq_enabled removed
- port drivers to native irq interfaces
- export gem shadow plane handling for vgem
- print proper driver name in framebuffer registration
- driver fixes for implicit fencing rules
- ARM fixed rate compression modifier added
- updated fb damage handling
- rmfb ioctl logging/docs
- drop drm_gem_object_put_locked
- define DRM_FORMAT_MAX_PLANES
- add gem fb vmap/vunmap helpers
- add lockdep_assert(once) helpers
- mark drm irq midlayer as legacy
- use offset adjusted bo mapping conversion
vgaarb:
- cleanups
fbdev:
- extend efifb handling to all arches
- div by 0 fixes for multiple drivers
udmabuf:
- add hugepage mapping support
dma-buf:
- non-dynamic exporter fixups
- document implicit fencing rules
amdgpu:
- Initial Cyan Skillfish support
- switch virtual DCE over to vkms based atomic
- VCN/JPEG power down fixes
- NAVI PCIE link handling fixes
- AMD HDMI freesync fixes
- Yellow Carp + Beige Goby fixes
- Clockgating/S0ix/SMU/EEPROM fixes
- embed hw fence in job
- rework dma-resv handling
- ensure eviction to system ram
amdkfd:
- uapi: SVM address range query added
- sysfs leak fix
- GPUVM TLB optimizations
- vmfault/migration counters
i915:
- Enable JSL and EHL by default
- preliminary XeHP/DG2 support
- remove all CNL support (never shipped)
- move to TTM for discrete memory support
- allow mixed object mmap handling
- GEM uAPI spring cleaning
- add I915_MMAP_OBJECT_FIXED
- reinstate ADL-P mmap ioctls
- drop a bunch of unused by userspace features
- disable and remove GPU relocations
- revert some i915 misfeatures
- major refactoring of GuC for Gen11+
- execbuffer object locking separate step
- reject caching/set-domain on discrete
- Enable pipe DMC loading on XE-LPD and ADL-P
- add PSF GV point support
- Refactor and fix DDI buffer translations
- Clean up FBC CFB allocation code
- Finish INTEL_GEN() and friends macro conversions
nouveau:
- add eDP backlight support
- implicit fence fix
msm:
- a680/7c3 support
- drm/scheduler conversion
panfrost:
- rework GPU reset
virtio:
- fix fencing for planes
ast:
- add detect support
bochs:
- move to tiny GPU driver
vc4:
- use hotplug irqs
- HDMI codec support
vmwgfx:
- use internal vmware device headers
ingenic:
- demidlayering irq
rcar-du:
- shutdown fixes
- convert to bridge connector helpers
zynqmp-dsub:
- misc fixes
mgag200:
- convert PLL handling to atomic
mediatek:
- MT8133 AAL support
- gem mmap object support
- MT8167 support
etnaviv:
- NXP Layerscape LS1028A SoC support
- GEM mmap cleanups
tegra:
- new user API
exynos:
- missing unlock fix
- build warning fix
- use refcount_t"
* tag 'drm-next-2021-08-31-1' of git://anongit.freedesktop.org/drm/drm: (1318 commits)
drm/amd/display: Move AllowDRAMSelfRefreshOrDRAMClockChangeInVblank to bounding box
drm/amd/display: Remove duplicate dml init
drm/amd/display: Update bounding box states (v2)
drm/amd/display: Update number of DCN3 clock states
drm/amdgpu: disable GFX CGCG in aldebaran
drm/amdgpu: Clear RAS interrupt status on aldebaran
drm/amdgpu: Add support for RAS XGMI err query
drm/amdkfd: Account for SH/SE count when setting up cu masks.
drm/amdgpu: rename amdgpu_bo_get_preferred_pin_domain
drm/amdgpu: drop redundant cancel_delayed_work_sync call
drm/amdgpu: add missing cleanups for more ASICs on UVD/VCE suspend
drm/amdgpu: add missing cleanups for Polaris12 UVD/VCE on suspend
drm/amdkfd: map SVM range with correct access permission
drm/amdkfd: check access permisson to restore retry fault
drm/amdgpu: Update RAS XGMI Error Query
drm/amdgpu: Add driver infrastructure for MCA RAS
drm/amd/display: Add Logging for HDMI color depth information
drm/amd/amdgpu: consolidate PSP TA init shared buf functions
drm/amd/amdgpu: add name field back to ras_common_if
drm/amdgpu: Fix build with missing pm_suspend_target_state module export
...
Diffstat (limited to 'drivers/gpu/drm/amd/amdkfd')
-rw-r--r-- | drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 47 | ||||
-rw-r--r-- | drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 17 | ||||
-rw-r--r-- | drivers/gpu/drm/amd/amdkfd/kfd_device.c | 59 | ||||
-rw-r--r-- | drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 60 | ||||
-rw-r--r-- | drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h | 2 | ||||
-rw-r--r-- | drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c | 1 | ||||
-rw-r--r-- | drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c | 84 | ||||
-rw-r--r-- | drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h | 1 | ||||
-rw-r--r-- | drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c | 5 | ||||
-rw-r--r-- | drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 3 | ||||
-rw-r--r-- | drivers/gpu/drm/amd/amdkfd/kfd_process.c | 3 | ||||
-rw-r--r-- | drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c | 10 | ||||
-rw-r--r-- | drivers/gpu/drm/amd/amdkfd/kfd_smi_events.h | 2 | ||||
-rw-r--r-- | drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 217 | ||||
-rw-r--r-- | drivers/gpu/drm/amd/amdkfd/kfd_svm.h | 5 | ||||
-rw-r--r-- | drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 6 | ||||
-rw-r--r-- | drivers/gpu/drm/amd/amdkfd/kfd_topology.h | 1 |
17 files changed, 381 insertions, 142 deletions
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c index e48acdd03c1a..86afd37b098d 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c @@ -1393,6 +1393,7 @@ static int kfd_ioctl_map_memory_to_gpu(struct file *filep, long err = 0; int i; uint32_t *devices_arr = NULL; + bool table_freed = false; dev = kfd_device_by_id(GET_GPU_ID(args->handle)); if (!dev) @@ -1450,7 +1451,8 @@ static int kfd_ioctl_map_memory_to_gpu(struct file *filep, goto get_mem_obj_from_handle_failed; } err = amdgpu_amdkfd_gpuvm_map_memory_to_gpu( - peer->kgd, (struct kgd_mem *)mem, peer_pdd->drm_priv); + peer->kgd, (struct kgd_mem *)mem, + peer_pdd->drm_priv, &table_freed); if (err) { pr_err("Failed to map to gpu %d/%d\n", i, args->n_devices); @@ -1468,16 +1470,17 @@ static int kfd_ioctl_map_memory_to_gpu(struct file *filep, } /* Flush TLBs after waiting for the page table updates to complete */ - for (i = 0; i < args->n_devices; i++) { - peer = kfd_device_by_id(devices_arr[i]); - if (WARN_ON_ONCE(!peer)) - continue; - peer_pdd = kfd_get_process_device_data(peer, p); - if (WARN_ON_ONCE(!peer_pdd)) - continue; - kfd_flush_tlb(peer_pdd, TLB_FLUSH_LEGACY); + if (table_freed) { + for (i = 0; i < args->n_devices; i++) { + peer = kfd_device_by_id(devices_arr[i]); + if (WARN_ON_ONCE(!peer)) + continue; + peer_pdd = kfd_get_process_device_data(peer, p); + if (WARN_ON_ONCE(!peer_pdd)) + continue; + kfd_flush_tlb(peer_pdd, TLB_FLUSH_LEGACY); + } } - kfree(devices_arr); return err; @@ -1565,10 +1568,29 @@ static int kfd_ioctl_unmap_memory_from_gpu(struct file *filep, } args->n_success = i+1; } - kfree(devices_arr); - mutex_unlock(&p->mutex); + if (dev->device_info->asic_family == CHIP_ALDEBARAN) { + err = amdgpu_amdkfd_gpuvm_sync_memory(dev->kgd, + (struct kgd_mem *) mem, true); + if (err) { + pr_debug("Sync memory failed, wait interrupted by user signal\n"); + goto sync_memory_failed; + } + + /* Flush TLBs after waiting for the page table updates to complete */ + for (i = 0; i < args->n_devices; i++) { + peer = kfd_device_by_id(devices_arr[i]); + if (WARN_ON_ONCE(!peer)) + continue; + peer_pdd = kfd_get_process_device_data(peer, p); + if (WARN_ON_ONCE(!peer_pdd)) + continue; + kfd_flush_tlb(peer_pdd, TLB_FLUSH_HEAVYWEIGHT); + } + } + kfree(devices_arr); + return 0; bind_process_to_device_failed: @@ -1576,6 +1598,7 @@ get_mem_obj_from_handle_failed: unmap_memory_from_gpu_failed: mutex_unlock(&p->mutex); copy_from_user_failed: +sync_memory_failed: kfree(devices_arr); return err; } diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c index c6b02aee4993..cfedfb1e8596 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c @@ -1404,6 +1404,7 @@ static int kfd_fill_gpu_cache_info(struct kfd_dev *kdev, break; case CHIP_NAVI10: case CHIP_NAVI12: + case CHIP_CYAN_SKILLFISH: pcache_info = navi10_cache_info; num_of_cache_types = ARRAY_SIZE(navi10_cache_info); break; @@ -1989,8 +1990,19 @@ static int kfd_fill_gpu_direct_io_link_to_cpu(int *avail_size, sub_type_hdr->flags |= CRAT_IOLINK_FLAGS_BI_DIRECTIONAL; sub_type_hdr->io_interface_type = CRAT_IOLINK_TYPE_XGMI; sub_type_hdr->num_hops_xgmi = 1; + if (adev->asic_type == CHIP_ALDEBARAN) { + sub_type_hdr->minimum_bandwidth_mbs = + amdgpu_amdkfd_get_xgmi_bandwidth_mbytes( + kdev->kgd, NULL, true); + sub_type_hdr->maximum_bandwidth_mbs = + sub_type_hdr->minimum_bandwidth_mbs; + } } else { sub_type_hdr->io_interface_type = CRAT_IOLINK_TYPE_PCIEXPRESS; + sub_type_hdr->minimum_bandwidth_mbs = + amdgpu_amdkfd_get_pcie_bandwidth_mbytes(kdev->kgd, true); + sub_type_hdr->maximum_bandwidth_mbs = + amdgpu_amdkfd_get_pcie_bandwidth_mbytes(kdev->kgd, false); } sub_type_hdr->proximity_domain_from = proximity_domain; @@ -2033,6 +2045,11 @@ static int kfd_fill_gpu_xgmi_link_to_gpu(int *avail_size, sub_type_hdr->proximity_domain_to = proximity_domain_to; sub_type_hdr->num_hops_xgmi = amdgpu_amdkfd_get_xgmi_hops_count(kdev->kgd, peer_kdev->kgd); + sub_type_hdr->maximum_bandwidth_mbs = + amdgpu_amdkfd_get_xgmi_bandwidth_mbytes(kdev->kgd, peer_kdev->kgd, false); + sub_type_hdr->minimum_bandwidth_mbs = sub_type_hdr->maximum_bandwidth_mbs ? + amdgpu_amdkfd_get_xgmi_bandwidth_mbytes(kdev->kgd, NULL, true) : 0; + return 0; } diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c index 6b57dfd2cd2a..16a57b70cc1a 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c @@ -84,12 +84,14 @@ static const struct kfd2kgd_calls *kfd2kgd_funcs[] = { [CHIP_DIMGREY_CAVEFISH] = &gfx_v10_3_kfd2kgd, [CHIP_BEIGE_GOBY] = &gfx_v10_3_kfd2kgd, [CHIP_YELLOW_CARP] = &gfx_v10_3_kfd2kgd, + [CHIP_CYAN_SKILLFISH] = &gfx_v10_kfd2kgd, }; #ifdef KFD_SUPPORT_IOMMU_V2 static const struct kfd_device_info kaveri_device_info = { .asic_family = CHIP_KAVERI, .asic_name = "kaveri", + .gfx_target_version = 70000, .max_pasid_bits = 16, /* max num of queues for KV.TODO should be a dynamic value */ .max_no_of_hqd = 24, @@ -109,6 +111,7 @@ static const struct kfd_device_info kaveri_device_info = { static const struct kfd_device_info carrizo_device_info = { .asic_family = CHIP_CARRIZO, .asic_name = "carrizo", + .gfx_target_version = 80001, .max_pasid_bits = 16, /* max num of queues for CZ.TODO should be a dynamic value */ .max_no_of_hqd = 24, @@ -129,6 +132,7 @@ static const struct kfd_device_info carrizo_device_info = { static const struct kfd_device_info raven_device_info = { .asic_family = CHIP_RAVEN, .asic_name = "raven", + .gfx_target_version = 90002, .max_pasid_bits = 16, .max_no_of_hqd = 24, .doorbell_size = 8, @@ -147,6 +151,7 @@ static const struct kfd_device_info raven_device_info = { static const struct kfd_device_info hawaii_device_info = { .asic_family = CHIP_HAWAII, .asic_name = "hawaii", + .gfx_target_version = 70001, .max_pasid_bits = 16, /* max num of queues for KV.TODO should be a dynamic value */ .max_no_of_hqd = 24, @@ -166,6 +171,7 @@ static const struct kfd_device_info hawaii_device_info = { static const struct kfd_device_info tonga_device_info = { .asic_family = CHIP_TONGA, .asic_name = "tonga", + .gfx_target_version = 80002, .max_pasid_bits = 16, .max_no_of_hqd = 24, .doorbell_size = 4, @@ -184,6 +190,7 @@ static const struct kfd_device_info tonga_device_info = { static const struct kfd_device_info fiji_device_info = { .asic_family = CHIP_FIJI, .asic_name = "fiji", + .gfx_target_version = 80003, .max_pasid_bits = 16, .max_no_of_hqd = 24, .doorbell_size = 4, @@ -202,6 +209,7 @@ static const struct kfd_device_info fiji_device_info = { static const struct kfd_device_info fiji_vf_device_info = { .asic_family = CHIP_FIJI, .asic_name = "fiji", + .gfx_target_version = 80003, .max_pasid_bits = 16, .max_no_of_hqd = 24, .doorbell_size = 4, @@ -221,6 +229,7 @@ static const struct kfd_device_info fiji_vf_device_info = { static const struct kfd_device_info polaris10_device_info = { .asic_family = CHIP_POLARIS10, .asic_name = "polaris10", + .gfx_target_version = 80003, .max_pasid_bits = 16, .max_no_of_hqd = 24, .doorbell_size = 4, @@ -239,6 +248,7 @@ static const struct kfd_device_info polaris10_device_info = { static const struct kfd_device_info polaris10_vf_device_info = { .asic_family = CHIP_POLARIS10, .asic_name = "polaris10", + .gfx_target_version = 80003, .max_pasid_bits = 16, .max_no_of_hqd = 24, .doorbell_size = 4, @@ -257,6 +267,7 @@ static const struct kfd_device_info polaris10_vf_device_info = { static const struct kfd_device_info polaris11_device_info = { .asic_family = CHIP_POLARIS11, .asic_name = "polaris11", + .gfx_target_version = 80003, .max_pasid_bits = 16, .max_no_of_hqd = 24, .doorbell_size = 4, @@ -275,6 +286,7 @@ static const struct kfd_device_info polaris11_device_info = { static const struct kfd_device_info polaris12_device_info = { .asic_family = CHIP_POLARIS12, .asic_name = "polaris12", + .gfx_target_version = 80003, .max_pasid_bits = 16, .max_no_of_hqd = 24, .doorbell_size = 4, @@ -293,6 +305,7 @@ static const struct kfd_device_info polaris12_device_info = { static const struct kfd_device_info vegam_device_info = { .asic_family = CHIP_VEGAM, .asic_name = "vegam", + .gfx_target_version = 80003, .max_pasid_bits = 16, .max_no_of_hqd = 24, .doorbell_size = 4, @@ -311,6 +324,7 @@ static const struct kfd_device_info vegam_device_info = { static const struct kfd_device_info vega10_device_info = { .asic_family = CHIP_VEGA10, .asic_name = "vega10", + .gfx_target_version = 90000, .max_pasid_bits = 16, .max_no_of_hqd = 24, .doorbell_size = 8, @@ -329,6 +343,7 @@ static const struct kfd_device_info vega10_device_info = { static const struct kfd_device_info vega10_vf_device_info = { .asic_family = CHIP_VEGA10, .asic_name = "vega10", + .gfx_target_version = 90000, .max_pasid_bits = 16, .max_no_of_hqd = 24, .doorbell_size = 8, @@ -347,6 +362,7 @@ static const struct kfd_device_info vega10_vf_device_info = { static const struct kfd_device_info vega12_device_info = { .asic_family = CHIP_VEGA12, .asic_name = "vega12", + .gfx_target_version = 90004, .max_pasid_bits = 16, .max_no_of_hqd = 24, .doorbell_size = 8, @@ -365,6 +381,7 @@ static const struct kfd_device_info vega12_device_info = { static const struct kfd_device_info vega20_device_info = { .asic_family = CHIP_VEGA20, .asic_name = "vega20", + .gfx_target_version = 90006, .max_pasid_bits = 16, .max_no_of_hqd = 24, .doorbell_size = 8, @@ -383,6 +400,7 @@ static const struct kfd_device_info vega20_device_info = { static const struct kfd_device_info arcturus_device_info = { .asic_family = CHIP_ARCTURUS, .asic_name = "arcturus", + .gfx_target_version = 90008, .max_pasid_bits = 16, .max_no_of_hqd = 24, .doorbell_size = 8, @@ -401,6 +419,7 @@ static const struct kfd_device_info arcturus_device_info = { static const struct kfd_device_info aldebaran_device_info = { .asic_family = CHIP_ALDEBARAN, .asic_name = "aldebaran", + .gfx_target_version = 90010, .max_pasid_bits = 16, .max_no_of_hqd = 24, .doorbell_size = 8, @@ -419,6 +438,7 @@ static const struct kfd_device_info aldebaran_device_info = { static const struct kfd_device_info renoir_device_info = { .asic_family = CHIP_RENOIR, .asic_name = "renoir", + .gfx_target_version = 90002, .max_pasid_bits = 16, .max_no_of_hqd = 24, .doorbell_size = 8, @@ -437,6 +457,7 @@ static const struct kfd_device_info renoir_device_info = { static const struct kfd_device_info navi10_device_info = { .asic_family = CHIP_NAVI10, .asic_name = "navi10", + .gfx_target_version = 100100, .max_pasid_bits = 16, .max_no_of_hqd = 24, .doorbell_size = 8, @@ -455,6 +476,7 @@ static const struct kfd_device_info navi10_device_info = { static const struct kfd_device_info navi12_device_info = { .asic_family = CHIP_NAVI12, .asic_name = "navi12", + .gfx_target_version = 100101, .max_pasid_bits = 16, .max_no_of_hqd = 24, .doorbell_size = 8, @@ -473,6 +495,7 @@ static const struct kfd_device_info navi12_device_info = { static const struct kfd_device_info navi14_device_info = { .asic_family = CHIP_NAVI14, .asic_name = "navi14", + .gfx_target_version = 100102, .max_pasid_bits = 16, .max_no_of_hqd = 24, .doorbell_size = 8, @@ -491,6 +514,7 @@ static const struct kfd_device_info navi14_device_info = { static const struct kfd_device_info sienna_cichlid_device_info = { .asic_family = CHIP_SIENNA_CICHLID, .asic_name = "sienna_cichlid", + .gfx_target_version = 100300, .max_pasid_bits = 16, .max_no_of_hqd = 24, .doorbell_size = 8, @@ -509,6 +533,7 @@ static const struct kfd_device_info sienna_cichlid_device_info = { static const struct kfd_device_info navy_flounder_device_info = { .asic_family = CHIP_NAVY_FLOUNDER, .asic_name = "navy_flounder", + .gfx_target_version = 100301, .max_pasid_bits = 16, .max_no_of_hqd = 24, .doorbell_size = 8, @@ -527,6 +552,7 @@ static const struct kfd_device_info navy_flounder_device_info = { static const struct kfd_device_info vangogh_device_info = { .asic_family = CHIP_VANGOGH, .asic_name = "vangogh", + .gfx_target_version = 100303, .max_pasid_bits = 16, .max_no_of_hqd = 24, .doorbell_size = 8, @@ -545,6 +571,7 @@ static const struct kfd_device_info vangogh_device_info = { static const struct kfd_device_info dimgrey_cavefish_device_info = { .asic_family = CHIP_DIMGREY_CAVEFISH, .asic_name = "dimgrey_cavefish", + .gfx_target_version = 100302, .max_pasid_bits = 16, .max_no_of_hqd = 24, .doorbell_size = 8, @@ -563,6 +590,7 @@ static const struct kfd_device_info dimgrey_cavefish_device_info = { static const struct kfd_device_info beige_goby_device_info = { .asic_family = CHIP_BEIGE_GOBY, .asic_name = "beige_goby", + .gfx_target_version = 100304, .max_pasid_bits = 16, .max_no_of_hqd = 24, .doorbell_size = 8, @@ -581,6 +609,7 @@ static const struct kfd_device_info beige_goby_device_info = { static const struct kfd_device_info yellow_carp_device_info = { .asic_family = CHIP_YELLOW_CARP, .asic_name = "yellow_carp", + .gfx_target_version = 100305, .max_pasid_bits = 16, .max_no_of_hqd = 24, .doorbell_size = 8, @@ -596,6 +625,25 @@ static const struct kfd_device_info yellow_carp_device_info = { .num_sdma_queues_per_engine = 2, }; +static const struct kfd_device_info cyan_skillfish_device_info = { + .asic_family = CHIP_CYAN_SKILLFISH, + .asic_name = "cyan_skillfish", + .gfx_target_version = 100103, + .max_pasid_bits = 16, + .max_no_of_hqd = 24, + .doorbell_size = 8, + .ih_ring_entry_size = 8 * sizeof(uint32_t), + .event_interrupt_class = &event_interrupt_class_v9, + .num_of_watch_points = 4, + .mqd_size_aligned = MQD_SIZE_ALIGNED, + .needs_iommu_device = false, + .supports_cwsr = true, + .needs_pci_atomics = true, + .num_sdma_engines = 2, + .num_xgmi_sdma_engines = 0, + .num_sdma_queues_per_engine = 8, +}; + /* For each entry, [0] is regular and [1] is virtualisation device. */ static const struct kfd_device_info *kfd_supported_devices[][2] = { #ifdef KFD_SUPPORT_IOMMU_V2 @@ -625,6 +673,7 @@ static const struct kfd_device_info *kfd_supported_devices[][2] = { [CHIP_DIMGREY_CAVEFISH] = {&dimgrey_cavefish_device_info, &dimgrey_cavefish_device_info}, [CHIP_BEIGE_GOBY] = {&beige_goby_device_info, &beige_goby_device_info}, [CHIP_YELLOW_CARP] = {&yellow_carp_device_info, NULL}, + [CHIP_CYAN_SKILLFISH] = {&cyan_skillfish_device_info, NULL}, }; static int kfd_gtt_sa_init(struct kfd_dev *kfd, unsigned int buf_size, @@ -1369,7 +1418,7 @@ void kfd_dec_compute_active(struct kfd_dev *kfd) WARN_ONCE(count < 0, "Compute profile ref. count error"); } -void kgd2kfd_smi_event_throttle(struct kfd_dev *kfd, uint32_t throttle_bitmask) +void kgd2kfd_smi_event_throttle(struct kfd_dev *kfd, uint64_t throttle_bitmask) { if (kfd && kfd->init_complete) kfd_smi_event_update_thermal_throttling(kfd, throttle_bitmask); @@ -1382,18 +1431,12 @@ void kgd2kfd_smi_event_throttle(struct kfd_dev *kfd, uint32_t throttle_bitmask) */ int kfd_debugfs_hang_hws(struct kfd_dev *dev) { - int r = 0; - if (dev->dqm->sched_policy != KFD_SCHED_POLICY_HWS) { pr_err("HWS is not enabled"); return -EINVAL; } - r = pm_debugfs_hang_hws(&dev->dqm->packets); - if (!r) - r = dqm_debugfs_execute_queues(dev->dqm); - - return r; + return dqm_debugfs_hang_hws(dev->dqm); } #endif diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c index 16a1713808c2..f8fce9d05f50 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c @@ -211,6 +211,15 @@ static void deallocate_doorbell(struct qcm_process_device *qpd, WARN_ON(!old); } +static void program_trap_handler_settings(struct device_queue_manager *dqm, + struct qcm_process_device *qpd) +{ + if (dqm->dev->kfd2kgd->program_trap_handler_settings) + dqm->dev->kfd2kgd->program_trap_handler_settings( + dqm->dev->kgd, qpd->vmid, + qpd->tba_addr, qpd->tma_addr); +} + static int allocate_vmid(struct device_queue_manager *dqm, struct qcm_process_device *qpd, struct queue *q) @@ -241,6 +250,10 @@ static int allocate_vmid(struct device_queue_manager *dqm, program_sh_mem_settings(dqm, qpd); + if (dqm->dev->device_info->asic_family >= CHIP_VEGA10 && + dqm->dev->cwsr_enabled) + program_trap_handler_settings(dqm, qpd); + /* qpd->page_table_base is set earlier when register_process() * is called, i.e. when the first queue is created. */ @@ -260,7 +273,7 @@ static int allocate_vmid(struct device_queue_manager *dqm, static int flush_texture_cache_nocpsch(struct kfd_dev *kdev, struct qcm_process_device *qpd) { - const struct packet_manager_funcs *pmf = qpd->dqm->packets.pmf; + const struct packet_manager_funcs *pmf = qpd->dqm->packet_mgr.pmf; int ret; if (!qpd->ib_kaddr) @@ -582,7 +595,9 @@ static int update_queue(struct device_queue_manager *dqm, struct queue *q) } retval = mqd_mgr->destroy_mqd(mqd_mgr, q->mqd, - KFD_PREEMPT_TYPE_WAVEFRONT_DRAIN, + (dqm->dev->cwsr_enabled? + KFD_PREEMPT_TYPE_WAVEFRONT_SAVE: + KFD_PREEMPT_TYPE_WAVEFRONT_DRAIN), KFD_UNMAP_LATENCY_MS, q->pipe, q->queue); if (retval) { pr_err("destroy mqd failed\n"); @@ -675,7 +690,9 @@ static int evict_process_queues_nocpsch(struct device_queue_manager *dqm, continue; retval = mqd_mgr->destroy_mqd(mqd_mgr, q->mqd, - KFD_PREEMPT_TYPE_WAVEFRONT_DRAIN, + (dqm->dev->cwsr_enabled? + KFD_PREEMPT_TYPE_WAVEFRONT_SAVE: + KFD_PREEMPT_TYPE_WAVEFRONT_DRAIN), KFD_UNMAP_LATENCY_MS, q->pipe, q->queue); if (retval && !ret) /* Return the first error, but keep going to @@ -1000,7 +1017,7 @@ static int start_nocpsch(struct device_queue_manager *dqm) init_interrupts(dqm); if (dqm->dev->device_info->asic_family == CHIP_HAWAII) - return pm_init(&dqm->packets, dqm); + return pm_init(&dqm->packet_mgr, dqm); dqm->sched_running = true; return 0; @@ -1009,7 +1026,7 @@ static int start_nocpsch(struct device_queue_manager *dqm) static int stop_nocpsch(struct device_queue_manager *dqm) { if (dqm->dev->device_info->asic_family == CHIP_HAWAII) - pm_uninit(&dqm->packets, false); + pm_uninit(&dqm->packet_mgr, false); dqm->sched_running = false; return 0; @@ -1124,7 +1141,7 @@ static int set_sched_resources(struct device_queue_manager *dqm) "queue mask: 0x%8llX\n", res.vmid_mask, res.queue_mask); - return pm_send_set_resources(&dqm->packets, &res); + return pm_send_set_resources(&dqm->packet_mgr, &res); } static int initialize_cpsch(struct device_queue_manager *dqm) @@ -1164,7 +1181,8 @@ static int start_cpsch(struct device_queue_manager *dqm) retval = 0; - retval = pm_init(&dqm->packets, dqm); + dqm_lock(dqm); + retval = pm_init(&dqm->packet_mgr, dqm); if (retval) goto fail_packet_manager_init; @@ -1186,7 +1204,6 @@ static int start_cpsch(struct device_queue_manager *dqm) init_interrupts(dqm); - dqm_lock(dqm); /* clear hang status when driver try to start the hw scheduler */ dqm->is_hws_hang = false; dqm->is_resetting = false; @@ -1197,8 +1214,9 @@ static int start_cpsch(struct device_queue_manager *dqm) return 0; fail_allocate_vidmem: fail_set_sched_resources: - pm_uninit(&dqm->packets, false); + pm_uninit(&dqm->packet_mgr, false); fail_packet_manager_init: + dqm_unlock(dqm); return retval; } @@ -1211,12 +1229,12 @@ static int stop_cpsch(struct device_queue_manager *dqm) unmap_queues_cpsch(dqm, KFD_UNMAP_QUEUES_FILTER_ALL_QUEUES, 0); hanging = dqm->is_hws_hang || dqm->is_resetting; dqm->sched_running = false; - dqm_unlock(dqm); - pm_release_ib(&dqm->packets); + pm_release_ib(&dqm->packet_mgr); kfd_gtt_sa_free(dqm->dev, dqm->fence_mem); - pm_uninit(&dqm->packets, hanging); + pm_uninit(&dqm->packet_mgr, hanging); + dqm_unlock(dqm); return 0; } @@ -1390,7 +1408,7 @@ static int map_queues_cpsch(struct device_queue_manager *dqm) if (dqm->active_runlist) return 0; - retval = pm_send_runlist(&dqm->packets, &dqm->queues); + retval = pm_send_runlist(&dqm->packet_mgr, &dqm->queues); pr_debug("%s sent runlist\n", __func__); if (retval) { pr_err("failed to execute runlist\n"); @@ -1416,13 +1434,13 @@ static int unmap_queues_cpsch(struct device_queue_manager *dqm, if (!dqm->active_runlist) return retval; - retval = pm_send_unmap_queue(&dqm->packets, KFD_QUEUE_TYPE_COMPUTE, + retval = pm_send_unmap_queue(&dqm->packet_mgr, KFD_QUEUE_TYPE_COMPUTE, filter, filter_param, false, 0); if (retval) return retval; *dqm->fence_addr = KFD_FENCE_INIT; - pm_send_query_status(&dqm->packets, dqm->fence_gpu_addr, + pm_send_query_status(&dqm->packet_mgr, dqm->fence_gpu_addr, KFD_FENCE_COMPLETED); /* should be timed out */ retval = amdkfd_fence_wait_timeout(dqm->fence_addr, KFD_FENCE_COMPLETED, @@ -1448,14 +1466,14 @@ static int unmap_queues_cpsch(struct device_queue_manager *dqm, * check those fields */ mqd_mgr = dqm->mqd_mgrs[KFD_MQD_TYPE_HIQ]; - if (mqd_mgr->read_doorbell_id(dqm->packets.priv_queue->queue->mqd)) { + if (mqd_mgr->read_doorbell_id(dqm->packet_mgr.priv_queue->queue->mqd)) { pr_err("HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out\n"); while (halt_if_hws_hang) schedule(); return -ETIME; } - pm_release_ib(&dqm->packets); + pm_release_ib(&dqm->packet_mgr); dqm->active_runlist = false; return retval; @@ -1946,6 +1964,7 @@ struct device_queue_manager *device_queue_manager_init(struct kfd_dev *dev) case CHIP_DIMGREY_CAVEFISH: case CHIP_BEIGE_GOBY: case CHIP_YELLOW_CARP: + case CHIP_CYAN_SKILLFISH: device_queue_manager_init_v10_navi10(&dqm->asic_ops); break; default: @@ -2099,11 +2118,16 @@ int dqm_debugfs_hqds(struct seq_file *m, void *data) return r; } -int dqm_debugfs_execute_queues(struct device_queue_manager *dqm) +int dqm_debugfs_hang_hws(struct device_queue_manager *dqm) { int r = 0; dqm_lock(dqm); + r = pm_debugfs_hang_hws(&dqm->packet_mgr); + if (r) { + dqm_unlock(dqm); + return r; + } dqm->active_runlist = true; r = execute_queues_cpsch(dqm, KFD_UNMAP_QUEUES_FILTER_ALL_QUEUES, 0); dqm_unlock(dqm); diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h index 71e2fde56b2b..c8719682c4da 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h @@ -169,7 +169,7 @@ struct device_queue_manager { struct device_queue_manager_asic_ops asic_ops; struct mqd_manager *mqd_mgrs[KFD_MQD_TYPE_MAX]; - struct packet_manager packets; + struct packet_manager packet_mgr; struct kfd_dev *dev; struct mutex lock_hidden; /* use dqm_lock/unlock(dqm) */ struct list_head queues; diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c index a9b329f0f862..2e86692def19 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c @@ -422,6 +422,7 @@ int kfd_init_apertures(struct kfd_process *process) case CHIP_DIMGREY_CAVEFISH: case CHIP_BEIGE_GOBY: case CHIP_YELLOW_CARP: + case CHIP_CYAN_SKILLFISH: kfd_init_apertures_v9(pdd, id); break; default: diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c index 88813dad731f..c021519af810 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c @@ -98,36 +98,78 @@ void mqd_symmetrically_map_cu_mask(struct mqd_manager *mm, uint32_t *se_mask) { struct kfd_cu_info cu_info; - uint32_t cu_per_se[KFD_MAX_NUM_SE] = {0}; - int i, se, sh, cu = 0; - + uint32_t cu_per_sh[KFD_MAX_NUM_SE][KFD_MAX_NUM_SH_PER_SE] = {0}; + int i, se, sh, cu; amdgpu_amdkfd_get_cu_info(mm->dev->kgd, &cu_info); if (cu_mask_count > cu_info.cu_active_number) cu_mask_count = cu_info.cu_active_number; + /* Exceeding these bounds corrupts the stack and indicates a coding error. + * Returning with no CU's enabled will hang the queue, which should be + * attention grabbing. + */ + if (cu_info.num_shader_engines > KFD_MAX_NUM_SE) { + pr_err("Exceeded KFD_MAX_NUM_SE, chip reports %d\n", cu_info.num_shader_engines); + return; + } + if (cu_info.num_shader_arrays_per_engine > KFD_MAX_NUM_SH_PER_SE) { + pr_err("Exceeded KFD_MAX_NUM_SH, chip reports %d\n", + cu_info.num_shader_arrays_per_engine * cu_info.num_shader_engines); + return; + } + /* Count active CUs per SH. + * + * Some CUs in an SH may be disabled. HW expects disabled CUs to be + * represented in the high bits of each SH's enable mask (the upper and lower + * 16 bits of se_mask) and will take care of the actual distribution of + * disabled CUs within each SH automatically. + * Each half of se_mask must be filled only on bits 0-cu_per_sh[se][sh]-1. + * + * See note on Arcturus cu_bitmap layout in gfx_v9_0_get_cu_info. + */ for (se = 0; se < cu_info.num_shader_engines; se++) for (sh = 0; sh < cu_info.num_shader_arrays_per_engine; sh++) - cu_per_se[se] += hweight32(cu_info.cu_bitmap[se % 4][sh + (se / 4)]); - - /* Symmetrically map cu_mask to all SEs: - * cu_mask[0] bit0 -> se_mask[0] bit0; - * cu_mask[0] bit1 -> se_mask[1] bit0; - * ... (if # SE is 4) - * cu_mask[0] bit4 -> se_mask[0] bit1; + cu_per_sh[se][sh] = hweight32(cu_info.cu_bitmap[se % 4][sh + (se / 4)]); + + /* Symmetrically map cu_mask to all SEs & SHs: + * se_mask programs up to 2 SH in the upper and lower 16 bits. + * + * Examples + * Assuming 1 SH/SE, 4 SEs: + * cu_mask[0] bit0 -> se_mask[0] bit0 + * cu_mask[0] bit1 -> se_mask[1] bit0 + * ... + * cu_mask[0] bit4 -> se_mask[0] bit1 + * ... + * + * Assuming 2 SH/SE, 4 SEs + * cu_mask[0] bit0 -> se_mask[0] bit0 (SE0,SH0,CU0) + * cu_mask[0] bit1 -> se_mask[1] bit0 (SE1,SH0,CU0) + * ... + * cu_mask[0] bit4 -> se_mask[0] bit16 (SE0,SH1,CU0) + * cu_mask[0] bit5 -> se_mask[1] bit16 (SE1,SH1,CU0) + * ... + * cu_mask[0] bit8 -> se_mask[0] bit1 (SE0,SH0,CU1) * ... + * + * First ensure all CUs are disabled, then enable user specified CUs. */ - se = 0; - for (i = 0; i < cu_mask_count; i++) { - if (cu_mask[i / 32] & (1 << (i % 32))) - se_mask[se] |= 1 << cu; - - do { - se++; - if (se == cu_info.num_shader_engines) { - se = 0; - cu++; + for (i = 0; i < cu_info.num_shader_engines; i++) + se_mask[i] = 0; + + i = 0; + for (cu = 0; cu < 16; cu++) { + for (sh = 0; sh < cu_info.num_shader_arrays_per_engine; sh++) { + for (se = 0; se < cu_info.num_shader_engines; se++) { + if (cu_per_sh[se][sh] > cu) { + if (cu_mask[i / 32] & (1 << (i % 32))) + se_mask[se] |= 1 << (cu + sh * 16); + i++; + if (i == cu_mask_count) + return; + } } - } while (cu >= cu_per_se[se] && cu < 32); + } } } diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h index b5e2ea7550d4..6e6918ccedfd 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h @@ -27,6 +27,7 @@ #include "kfd_priv.h" #define KFD_MAX_NUM_SE 8 +#define KFD_MAX_NUM_SH_PER_SE 2 /** * struct mqd_manager diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c index d8e940f03102..e547f1f8c49f 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c @@ -251,6 +251,7 @@ int pm_init(struct packet_manager *pm, struct device_queue_manager *dqm) case CHIP_DIMGREY_CAVEFISH: case CHIP_BEIGE_GOBY: case CHIP_YELLOW_CARP: + case CHIP_CYAN_SKILLFISH: pm->pmf = &kfd_v9_pm_funcs; break; case CHIP_ALDEBARAN: @@ -278,6 +279,7 @@ void pm_uninit(struct packet_manager *pm, bool hanging) { mutex_destroy(&pm->lock); kernel_queue_uninit(pm->priv_queue, hanging); + pm->priv_queue = NULL; } int pm_send_set_resources(struct packet_manager *pm, @@ -447,6 +449,9 @@ int pm_debugfs_hang_hws(struct packet_manager *pm) uint32_t *buffer, size; int r = 0; + if (!pm->priv_queue) + return -EAGAIN; + size = pm->pmf->query_status_size; mutex_lock(&pm->lock); kq_acquire_packet_buffer(pm->priv_queue, diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h index 3426743ed228..ab83b0de6b22 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h @@ -196,6 +196,7 @@ struct kfd_event_interrupt_class { struct kfd_device_info { enum amd_asic_type asic_family; const char *asic_name; + uint32_t gfx_target_version; const struct kfd_event_interrupt_class *event_interrupt_class; unsigned int max_pasid_bits; unsigned int max_no_of_hqd; @@ -1194,7 +1195,7 @@ int pm_debugfs_runlist(struct seq_file *m, void *data); int kfd_debugfs_hang_hws(struct kfd_dev *dev); int pm_debugfs_hang_hws(struct packet_manager *pm); -int dqm_debugfs_execute_queues(struct device_queue_manager *dqm); +int dqm_debugfs_hang_hws(struct device_queue_manager *dqm); #else diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c index 8a2c6fc438c0..21ec8a18cad2 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c @@ -714,7 +714,8 @@ static int kfd_process_alloc_gpuvm(struct kfd_process_device *pdd, if (err) goto err_alloc_mem; - err = amdgpu_amdkfd_gpuvm_map_memory_to_gpu(kdev->kgd, mem, pdd->drm_priv); + err = amdgpu_amdkfd_gpuvm_map_memory_to_gpu(kdev->kgd, mem, + pdd->drm_priv, NULL); if (err) goto err_map_mem; diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c index 246522423559..ed4bc5f844ce 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c @@ -205,23 +205,23 @@ void kfd_smi_event_update_gpu_reset(struct kfd_dev *dev, bool post_reset) } void kfd_smi_event_update_thermal_throttling(struct kfd_dev *dev, - uint32_t throttle_bitmask) + uint64_t throttle_bitmask) { struct amdgpu_device *adev = (struct amdgpu_device *)dev->kgd; /* * ThermalThrottle msg = throttle_bitmask(8): * thermal_interrupt_count(16): - * 1 byte event + 1 byte space + 8 byte throttle_bitmask + + * 1 byte event + 1 byte space + 16 byte throttle_bitmask + * 1 byte : + 16 byte thermal_interupt_counter + 1 byte \n + - * 1 byte \0 = 29 + * 1 byte \0 = 37 */ - char fifo_in[29]; + char fifo_in[37]; int len; if (list_empty(&dev->smi_clients)) return; - len = snprintf(fifo_in, sizeof(fifo_in), "%x %x:%llx\n", + len = snprintf(fifo_in, sizeof(fifo_in), "%x %llx:%llx\n", KFD_SMI_EVENT_THERMAL_THROTTLE, throttle_bitmask, atomic64_read(&adev->smu.throttle_int_counter)); diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.h b/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.h index b9b0438202e2..bffd0c32b060 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.h @@ -26,7 +26,7 @@ int kfd_smi_event_open(struct kfd_dev *dev, uint32_t *fd); void kfd_smi_event_update_vmfault(struct kfd_dev *dev, uint16_t pasid); void kfd_smi_event_update_thermal_throttling(struct kfd_dev *dev, - uint32_t throttle_bitmask); + uint64_t throttle_bitmask); void kfd_smi_event_update_gpu_reset(struct kfd_dev *dev, bool post_reset); #endif diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c index e883731c3f8f..491373fcdb38 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c @@ -120,6 +120,7 @@ static void svm_range_remove_notifier(struct svm_range *prange) static int svm_range_dma_map_dev(struct amdgpu_device *adev, struct svm_range *prange, + unsigned long offset, unsigned long npages, unsigned long *hmm_pfns, uint32_t gpuidx) { enum dma_data_direction dir = DMA_BIDIRECTIONAL; @@ -136,7 +137,8 @@ svm_range_dma_map_dev(struct amdgpu_device *adev, struct svm_range *prange, prange->dma_addr[gpuidx] = addr; } - for (i = 0; i < prange->npages; i++) { + addr += offset; + for (i = 0; i < npages; i++) { if (WARN_ONCE(addr[i] && !dma_mapping_error(dev, addr[i]), "leaking dma mapping\n")) dma_unmap_page(dev, addr[i], PAGE_SIZE, dir); @@ -167,6 +169,7 @@ svm_range_dma_map_dev(struct amdgpu_device *adev, struct svm_range *prange, static int svm_range_dma_map(struct svm_range *prange, unsigned long *bitmap, + unsigned long offset, unsigned long npages, unsigned long *hmm_pfns) { struct kfd_process *p; @@ -187,7 +190,8 @@ svm_range_dma_map(struct svm_range *prange, unsigned long *bitmap, } adev = (struct amdgpu_device *)pdd->dev->kgd; - r = svm_range_dma_map_dev(adev, prange, hmm_pfns, gpuidx); + r = svm_range_dma_map_dev(adev, prange, offset, npages, + hmm_pfns, gpuidx); if (r) break; } @@ -1088,11 +1092,6 @@ svm_range_get_pte_flags(struct amdgpu_device *adev, struct svm_range *prange, pte_flags |= snoop ? AMDGPU_PTE_SNOOPED : 0; pte_flags |= amdgpu_gem_va_map_flags(adev, mapping_flags); - - pr_debug("svms 0x%p [0x%lx 0x%lx] vram %d PTE 0x%llx mapping 0x%x\n", - prange->svms, prange->start, prange->last, - (domain == SVM_RANGE_VRAM_DOMAIN) ? 1:0, pte_flags, mapping_flags); - return pte_flags; } @@ -1156,7 +1155,8 @@ svm_range_unmap_from_gpus(struct svm_range *prange, unsigned long start, static int svm_range_map_to_gpu(struct amdgpu_device *adev, struct amdgpu_vm *vm, - struct svm_range *prange, dma_addr_t *dma_addr, + struct svm_range *prange, unsigned long offset, + unsigned long npages, bool readonly, dma_addr_t *dma_addr, struct amdgpu_device *bo_adev, struct dma_fence **fence) { struct amdgpu_bo_va bo_va; @@ -1167,14 +1167,15 @@ svm_range_map_to_gpu(struct amdgpu_device *adev, struct amdgpu_vm *vm, int r = 0; int64_t i; - pr_debug("svms 0x%p [0x%lx 0x%lx]\n", prange->svms, prange->start, - prange->last); + last_start = prange->start + offset; + + pr_debug("svms 0x%p [0x%lx 0x%lx] readonly %d\n", prange->svms, + last_start, last_start + npages - 1, readonly); if (prange->svm_bo && prange->ttm_res) bo_va.is_xgmi = amdgpu_xgmi_same_hive(adev, bo_adev); - last_start = prange->start; - for (i = 0; i < prange->npages; i++) { + for (i = offset; i < offset + npages; i++) { last_domain = dma_addr[i] & SVM_RANGE_VRAM_DOMAIN; dma_addr[i] &= ~SVM_RANGE_VRAM_DOMAIN; if ((prange->start + i) < prange->last && @@ -1183,13 +1184,21 @@ svm_range_map_to_gpu(struct amdgpu_device *adev, struct amdgpu_vm *vm, pr_debug("Mapping range [0x%lx 0x%llx] on domain: %s\n", last_start, prange->start + i, last_domain ? "GPU" : "CPU"); + pte_flags = svm_range_get_pte_flags(adev, prange, last_domain); - r = amdgpu_vm_bo_update_mapping(adev, bo_adev, vm, false, false, NULL, - last_start, + if (readonly) + pte_flags &= ~AMDGPU_PTE_WRITEABLE; + + pr_debug("svms 0x%p map [0x%lx 0x%llx] vram %d PTE 0x%llx\n", + prange->svms, last_start, prange->start + i, + (last_domain == SVM_RANGE_VRAM_DOMAIN) ? 1 : 0, + pte_flags); + + r = amdgpu_vm_bo_update_mapping(adev, bo_adev, vm, false, false, + NULL, last_start, prange->start + i, pte_flags, last_start - prange->start, - NULL, - dma_addr, + NULL, dma_addr, &vm->last_update, &table_freed); if (r) { @@ -1220,8 +1229,10 @@ out: return r; } -static int svm_range_map_to_gpus(struct svm_range *prange, - unsigned long *bitmap, bool wait) +static int +svm_range_map_to_gpus(struct svm_range *prange, unsigned long offset, + unsigned long npages, bool readonly, + unsigned long *bitmap, bool wait) { struct kfd_process_device *pdd; struct amdgpu_device *bo_adev; @@ -1257,7 +1268,8 @@ static int svm_range_map_to_gpus(struct svm_range *prange, } r = svm_range_map_to_gpu(adev, drm_priv_to_vm(pdd->drm_priv), - prange, prange->dma_addr[gpuidx], + prange, offset, npages, readonly, + prange->dma_addr[gpuidx], bo_adev, wait ? &fence : NULL); if (r) break; @@ -1390,7 +1402,7 @@ static int svm_range_validate_and_map(struct mm_struct *mm, int32_t gpuidx, bool intr, bool wait) { struct svm_validate_context ctx; - struct hmm_range *hmm_range; + unsigned long start, end, addr; struct kfd_process *p; void *owner; int32_t idx; @@ -1448,40 +1460,66 @@ static int svm_range_validate_and_map(struct mm_struct *mm, break; } } - r = amdgpu_hmm_range_get_pages(&prange->notifier, mm, NULL, - prange->start << PAGE_SHIFT, - prange->npages, &hmm_range, - false, true, owner); - if (r) { - pr_debug("failed %d to get svm range pages\n", r); - goto unreserve_out; - } - r = svm_range_dma_map(prange, ctx.bitmap, - hmm_range->hmm_pfns); - if (r) { - pr_debug("failed %d to dma map range\n", r); - goto unreserve_out; - } + start = prange->start << PAGE_SHIFT; + end = (prange->last + 1) << PAGE_SHIFT; + for (addr = start; addr < end && !r; ) { + struct hmm_range *hmm_range; + struct vm_area_struct *vma; + unsigned long next; + unsigned long offset; + unsigned long npages; + bool readonly; - prange->validated_once = true; + vma = find_vma(mm, addr); + if (!vma || addr < vma->vm_start) { + r = -EFAULT; + goto unreserve_out; + } + readonly = !(vma->vm_flags & VM_WRITE); - svm_range_lock(prange); - if (amdgpu_hmm_range_get_pages_done(hmm_range)) { - pr_debug("hmm update the range, need validate again\n"); - r = -EAGAIN; - goto unlock_out; - } - if (!list_empty(&prange->child_list)) { - pr_debug("range split by unmap in parallel, validate again\n"); - r = -EAGAIN; - goto unlock_out; - } + next = min(vma->vm_end, end); + npages = (next - addr) >> PAGE_SHIFT; + r = amdgpu_hmm_range_get_pages(&prange->notifier, mm, NULL, + addr, npages, &hmm_range, + readonly, true, owner); + if (r) { + pr_debug("failed %d to get svm range pages\n", r); + goto unreserve_out; + } - r = svm_range_map_to_gpus(prange, ctx.bitmap, wait); + offset = (addr - start) >> PAGE_SHIFT; + r = svm_range_dma_map(prange, ctx.bitmap, offset, npages, + hmm_range->hmm_pfns); + if (r) { + pr_debug("failed %d to dma map range\n", r); + goto unreserve_out; + } + + svm_range_lock(prange); + if (amdgpu_hmm_range_get_pages_done(hmm_range)) { + pr_debug("hmm update the range, need validate again\n"); + r = -EAGAIN; + goto unlock_out; + } + if (!list_empty(&prange->child_list)) { + pr_debug("range split by unmap in parallel, validate again\n"); + r = -EAGAIN; + goto unlock_out; + } + + r = svm_range_map_to_gpus(prange, offset, npages, readonly, + ctx.bitmap, wait); unlock_out: - svm_range_unlock(prange); + svm_range_unlock(prange); + + addr = next; + } + + if (addr == end) + prange->validated_once = true; + unreserve_out: svm_range_unreserve_bos(&ctx); @@ -2400,9 +2438,29 @@ svm_range_count_fault(struct amdgpu_device *adev, struct kfd_process *p, WRITE_ONCE(pdd->faults, pdd->faults + 1); } +static bool +svm_fault_allowed(struct mm_struct *mm, uint64_t addr, bool write_fault) +{ + unsigned long requested = VM_READ; + struct vm_area_struct *vma; + + if (write_fault) + requested |= VM_WRITE; + + vma = find_vma(mm, addr << PAGE_SHIFT); + if (!vma || (addr << PAGE_SHIFT) < vma->vm_start) { + pr_debug("address 0x%llx VMA is removed\n", addr); + return true; + } + + pr_debug("requested 0x%lx, vma permission flags 0x%lx\n", requested, + vma->vm_flags); + return (vma->vm_flags & requested) == requested; +} + int svm_range_restore_pages(struct amdgpu_device *adev, unsigned int pasid, - uint64_t addr) + uint64_t addr, bool write_fault) { struct mm_struct *mm = NULL; struct svm_range_list *svms; @@ -2484,6 +2542,13 @@ retry_write_locked: goto out_unlock_range; } + if (!svm_fault_allowed(mm, addr, write_fault)) { + pr_debug("fault addr 0x%llx no %s permission\n", addr, + write_fault ? "write" : "read"); + r = -EPERM; + goto out_unlock_range; + } + best_loc = svm_range_best_restore_location(prange, adev, &gpuidx); if (best_loc == -1) { pr_debug("svms %p failed get best restore loc [0x%lx 0x%lx]\n", @@ -2675,22 +2740,26 @@ svm_range_add(struct kfd_process *p, uint64_t start, uint64_t size, return 0; } -/* svm_range_best_prefetch_location - decide the best prefetch location +/** + * svm_range_best_prefetch_location - decide the best prefetch location * @prange: svm range structure * * For xnack off: - * If range map to single GPU, the best acutal location is prefetch loc, which + * If range map to single GPU, the best prefetch location is prefetch_loc, which * can be CPU or GPU. * - * If range map to multiple GPUs, only if mGPU connection on xgmi same hive, - * the best actual location could be prefetch_loc GPU. If mGPU connection on - * PCIe, the best actual location is always CPU, because GPU cannot access vram - * of other GPUs, assuming PCIe small bar (large bar support is not upstream). + * If range is ACCESS or ACCESS_IN_PLACE by mGPUs, only if mGPU connection on + * XGMI same hive, the best prefetch location is prefetch_loc GPU, othervise + * the best prefetch location is always CPU, because GPU can not have coherent + * mapping VRAM of other GPUs even with large-BAR PCIe connection. * * For xnack on: - * The best actual location is prefetch location. If mGPU connection on xgmi - * same hive, range map to multiple GPUs. Otherwise, the range only map to - * actual location GPU. Other GPU access vm fault will trigger migration. + * If range is not ACCESS_IN_PLACE by mGPUs, the best prefetch location is + * prefetch_loc, other GPU access will generate vm fault and trigger migration. + * + * If range is ACCESS_IN_PLACE by mGPUs, only if mGPU connection on XGMI same + * hive, the best prefetch location is prefetch_loc GPU, otherwise the best + * prefetch location is always CPU. * * Context: Process context * @@ -2710,11 +2779,6 @@ svm_range_best_prefetch_location(struct svm_range *prange) p = container_of(prange->svms, struct kfd_process, svms); - /* xnack on */ - if (p->xnack_enabled) - goto out; - - /* xnack off */ if (!best_loc || best_loc == KFD_IOCTL_SVM_LOCATION_UNDEFINED) goto out; @@ -2724,8 +2788,12 @@ svm_range_best_prefetch_location(struct svm_range *prange) best_loc = 0; goto out; } - bitmap_or(bitmap, prange->bitmap_access, prange->bitmap_aip, - MAX_GPU_INSTANCE); + + if (p->xnack_enabled) + bitmap_copy(bitmap, prange->bitmap_aip, MAX_GPU_INSTANCE); + else + bitmap_or(bitmap, prange->bitmap_access, prange->bitmap_aip, + MAX_GPU_INSTANCE); for_each_set_bit(gpuidx, bitmap, MAX_GPU_INSTANCE) { pdd = kfd_process_device_from_gpuidx(p, gpuidx); @@ -3019,7 +3087,8 @@ svm_range_get_attr(struct kfd_process *p, uint64_t start, uint64_t size, struct svm_range *prange; uint32_t prefetch_loc = KFD_IOCTL_SVM_LOCATION_UNDEFINED; uint32_t location = KFD_IOCTL_SVM_LOCATION_UNDEFINED; - uint32_t flags = 0xffffffff; + uint32_t flags_and = 0xffffffff; + uint32_t flags_or = 0; int gpuidx; uint32_t i; @@ -3054,12 +3123,12 @@ svm_range_get_attr(struct kfd_process *p, uint64_t start, uint64_t size, get_accessible = true; break; case KFD_IOCTL_SVM_ATTR_SET_FLAGS: + case KFD_IOCTL_SVM_ATTR_CLR_FLAGS: get_flags = true; break; case KFD_IOCTL_SVM_ATTR_GRANULARITY: get_granularity = true; break; - case KFD_IOCTL_SVM_ATTR_CLR_FLAGS: case KFD_IOCTL_SVM_ATTR_ACCESS_IN_PLACE: case KFD_IOCTL_SVM_ATTR_NO_ACCESS: fallthrough; @@ -3077,7 +3146,8 @@ svm_range_get_attr(struct kfd_process *p, uint64_t start, uint64_t size, if (!node) { pr_debug("range attrs not found return default values\n"); svm_range_set_default_attributes(&location, &prefetch_loc, - &granularity, &flags); + &granularity, &flags_and); + flags_or = flags_and; if (p->xnack_enabled) bitmap_copy(bitmap_access, svms->bitmap_supported, MAX_GPU_INSTANCE); @@ -3123,8 +3193,10 @@ svm_range_get_attr(struct kfd_process *p, uint64_t start, uint64_t size, bitmap_and(bitmap_aip, bitmap_aip, prange->bitmap_aip, MAX_GPU_INSTANCE); } - if (get_flags) - flags &= prange->flags; + if (get_flags) { + flags_and &= prange->flags; + flags_or |= prange->flags; + } if (get_granularity && prange->granularity < granularity) granularity = prange->granularity; @@ -3158,7 +3230,10 @@ fill_values: attrs[i].type = KFD_IOCTL_SVM_ATTR_NO_ACCESS; break; case KFD_IOCTL_SVM_ATTR_SET_FLAGS: - attrs[i].value = flags; + attrs[i].value = flags_and; + break; + case KFD_IOCTL_SVM_ATTR_CLR_FLAGS: + attrs[i].value = ~flags_or; break; case KFD_IOCTL_SVM_ATTR_GRANULARITY: attrs[i].value = (uint32_t)granularity; diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.h b/drivers/gpu/drm/amd/amdkfd/kfd_svm.h index 3fc1fd8b4fbc..c6ec55354c7b 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.h @@ -175,7 +175,7 @@ int svm_range_split_by_granularity(struct kfd_process *p, struct mm_struct *mm, unsigned long addr, struct svm_range *parent, struct svm_range *prange); int svm_range_restore_pages(struct amdgpu_device *adev, - unsigned int pasid, uint64_t addr); + unsigned int pasid, uint64_t addr, bool write_fault); int svm_range_schedule_evict_svm_bo(struct amdgpu_amdkfd_fence *fence); void svm_range_add_list_work(struct svm_range_list *svms, struct svm_range *prange, struct mm_struct *mm, @@ -209,7 +209,8 @@ static inline void svm_range_list_fini(struct kfd_process *p) } static inline int svm_range_restore_pages(struct amdgpu_device *adev, - unsigned int pasid, uint64_t addr) + unsigned int pasid, uint64_t addr, + bool write_fault) { return -EFAULT; } diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c index b1ce072aa20b..98cca5f2b27f 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c @@ -478,6 +478,8 @@ static ssize_t node_show(struct kobject *kobj, struct attribute *attr, dev->node_props.simd_per_cu); sysfs_show_32bit_prop(buffer, offs, "max_slots_scratch_cu", dev->node_props.max_slots_scratch_cu); + sysfs_show_32bit_prop(buffer, offs, "gfx_target_version", + dev->node_props.gfx_target_version); sysfs_show_32bit_prop(buffer, offs, "vendor_id", dev->node_props.vendor_id); sysfs_show_32bit_prop(buffer, offs, "device_id", @@ -1360,6 +1362,7 @@ int kfd_topology_add_device(struct kfd_dev *gpu) dev->node_props.simd_arrays_per_engine = cu_info.num_shader_arrays_per_engine; + dev->node_props.gfx_target_version = gpu->device_info->gfx_target_version; dev->node_props.vendor_id = gpu->pdev->vendor; dev->node_props.device_id = gpu->pdev->device; dev->node_props.capability |= @@ -1424,6 +1427,7 @@ int kfd_topology_add_device(struct kfd_dev *gpu) case CHIP_DIMGREY_CAVEFISH: case CHIP_BEIGE_GOBY: case CHIP_YELLOW_CARP: + case CHIP_CYAN_SKILLFISH: dev->node_props.capability |= ((HSA_CAP_DOORBELL_TYPE_2_0 << HSA_CAP_DOORBELL_TYPE_TOTALBITS_SHIFT) & HSA_CAP_DOORBELL_TYPE_TOTALBITS_MASK); @@ -1630,7 +1634,7 @@ int kfd_debugfs_rls_by_device(struct seq_file *m, void *data) } seq_printf(m, "Node %u, gpu_id %x:\n", i++, dev->gpu->id); - r = pm_debugfs_runlist(m, &dev->gpu->dqm->packets); + r = pm_debugfs_runlist(m, &dev->gpu->dqm->packet_mgr); if (r) break; } diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h index 8b48c6692007..a8db017c9b8e 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h @@ -78,6 +78,7 @@ struct kfd_node_properties { uint32_t simd_per_cu; uint32_t max_slots_scratch_cu; uint32_t engine_id; + uint32_t gfx_target_version; uint32_t vendor_id; uint32_t device_id; uint32_t location_id; |