From be957c886d92aa9caf0f63aee2c77d1497217d93 Mon Sep 17 00:00:00 2001 From: Jason Gunthorpe Date: Fri, 1 May 2020 15:20:45 -0300 Subject: mm/hmm: make hmm_range_fault return 0 or -1 hmm_vma_walk->last is supposed to be updated after every write to the pfns, so that it can be returned by hmm_range_fault(). However, this is not done consistently. Fortunately nothing checks the return code of hmm_range_fault() for anything other than error. More importantly last must be set before returning -EBUSY as it is used to prevent reading an output pfn as an input flags when the loop restarts. For clarity and simplicity make hmm_range_fault() return 0 or -ERRNO. Only set last when returning -EBUSY. Link: https://lore.kernel.org/r/2-v2-b4e84f444c7d+24f57-hmm_no_flags_jgg@mellanox.com Acked-by: Felix Kuehling Tested-by: Ralph Campbell Reviewed-by: Christoph Hellwig Signed-off-by: Jason Gunthorpe --- Documentation/vm/hmm.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation/vm') diff --git a/Documentation/vm/hmm.rst b/Documentation/vm/hmm.rst index 4e3e9362afeb..9924f2caa018 100644 --- a/Documentation/vm/hmm.rst +++ b/Documentation/vm/hmm.rst @@ -161,7 +161,7 @@ device must complete the update before the driver callback returns. When the device driver wants to populate a range of virtual addresses, it can use:: - long hmm_range_fault(struct hmm_range *range); + int hmm_range_fault(struct hmm_range *range); It will trigger a page fault on missing or read-only entries if write access is requested (see below). Page faults use the generic mm page fault code path just -- cgit From 2733ea144dcce789de20988c1056e228a07b1bff Mon Sep 17 00:00:00 2001 From: Jason Gunthorpe Date: Fri, 1 May 2020 15:20:48 -0300 Subject: mm/hmm: remove the customizable pfn format from hmm_range_fault Presumably the intent here was that hmm_range_fault() could put the data into some HW specific format and thus avoid some work. However, nothing actually does that, and it isn't clear how anything actually could do that as hmm_range_fault() provides CPU addresses which must be DMA mapped. Perhaps there is some special HW that does not need DMA mapping, but we don't have any examples of this, and the theoretical performance win of avoiding an extra scan over the pfns array doesn't seem worth the complexity. Plus pfns needs to be scanned anyhow to sort out any DEVICE_PRIVATE pages. This version replaces the uint64_t with an usigned long containing a pfn and fixed flags. On input flags is filled with the HMM_PFN_REQ_* values, on successful output it is filled with HMM_PFN_* values, describing the state of the pages. amdgpu is simple to convert, it doesn't use snapshot and doesn't use per-page flags. nouveau uses only 16 hmm_pte entries at most (ie fits in a few cache lines), and it sweeps over its pfns array a couple of times anyhow. It also has a nasty call chain before it reaches the dma map and hardware suggesting performance isn't important: nouveau_svm_fault(): args.i.m.method = NVIF_VMM_V0_PFNMAP nouveau_range_fault() nvif_object_ioctl() client->driver->ioctl() struct nvif_driver nvif_driver_nvkm: .ioctl = nvkm_client_ioctl nvkm_ioctl() nvkm_ioctl_path() nvkm_ioctl_v0[type].func(..) nvkm_ioctl_mthd() nvkm_object_mthd() struct nvkm_object_func nvkm_uvmm: .mthd = nvkm_uvmm_mthd nvkm_uvmm_mthd() nvkm_uvmm_mthd_pfnmap() nvkm_vmm_pfn_map() nvkm_vmm_ptes_get_map() func == gp100_vmm_pgt_pfn struct nvkm_vmm_desc_func gp100_vmm_desc_spt: .pfn = gp100_vmm_pgt_pfn nvkm_vmm_iter() REF_PTES == func == gp100_vmm_pgt_pfn() dma_map_page() Link: https://lore.kernel.org/r/5-v2-b4e84f444c7d+24f57-hmm_no_flags_jgg@mellanox.com Acked-by: Felix Kuehling Tested-by: Ralph Campbell Signed-off-by: Christoph Hellwig Reviewed-by: Christoph Hellwig Signed-off-by: Jason Gunthorpe --- Documentation/vm/hmm.rst | 28 ++++++++++------------------ 1 file changed, 10 insertions(+), 18 deletions(-) (limited to 'Documentation/vm') diff --git a/Documentation/vm/hmm.rst b/Documentation/vm/hmm.rst index 9924f2caa018..561969754bc0 100644 --- a/Documentation/vm/hmm.rst +++ b/Documentation/vm/hmm.rst @@ -184,10 +184,7 @@ The usage pattern is:: range.notifier = &interval_sub; range.start = ...; range.end = ...; - range.pfns = ...; - range.flags = ...; - range.values = ...; - range.pfn_shift = ...; + range.hmm_pfns = ...; if (!mmget_not_zero(interval_sub->notifier.mm)) return -EFAULT; @@ -229,15 +226,10 @@ The hmm_range struct has 2 fields, default_flags and pfn_flags_mask, that specif fault or snapshot policy for the whole range instead of having to set them for each entry in the pfns array. -For instance, if the device flags for range.flags are:: +For instance if the device driver wants pages for a range with at least read +permission, it sets:: - range.flags[HMM_PFN_VALID] = (1 << 63); - range.flags[HMM_PFN_WRITE] = (1 << 62); - -and the device driver wants pages for a range with at least read permission, -it sets:: - - range->default_flags = (1 << 63); + range->default_flags = HMM_PFN_REQ_FAULT; range->pfn_flags_mask = 0; and calls hmm_range_fault() as described above. This will fill fault all pages @@ -246,18 +238,18 @@ in the range with at least read permission. Now let's say the driver wants to do the same except for one page in the range for which it wants to have write permission. Now driver set:: - range->default_flags = (1 << 63); - range->pfn_flags_mask = (1 << 62); - range->pfns[index_of_write] = (1 << 62); + range->default_flags = HMM_PFN_REQ_FAULT; + range->pfn_flags_mask = HMM_PFN_REQ_WRITE; + range->pfns[index_of_write] = HMM_PFN_REQ_WRITE; With this, HMM will fault in all pages with at least read (i.e., valid) and for the address == range->start + (index_of_write << PAGE_SHIFT) it will fault with write permission i.e., if the CPU pte does not have write permission set then HMM will call handle_mm_fault(). -Note that HMM will populate the pfns array with write permission for any page -that is mapped with CPU write permission no matter what values are set -in default_flags or pfn_flags_mask. +After hmm_range_fault completes the flag bits are set to the current state of +the page tables, ie HMM_PFN_VALID | HMM_PFN_WRITE will be set if the page is +writable. Represent and manage device memory from core kernel point of view -- cgit