summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-04-21selftest/vm: support xfail in mremap_testSidhartha Kumar
Use ksft_test_result_xfail for the tests which are expected to fail. Link: https://lkml.kernel.org/r/20220420215721.4868-3-sidhartha.kumar@oracle.com Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Reviewed-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-21selftest/vm: verify remap destination address in mremap_testSidhartha Kumar
Because mremap does not have a MAP_FIXED_NOREPLACE flag, it can destroy existing mappings. This causes a segfault when regions such as text are remapped and the permissions are changed. Verify the requested mremap destination address does not overlap any existing mappings by using mmap's MAP_FIXED_NOREPLACE flag. Keep incrementing the destination address until a valid mapping is found or fail the current test once the max address is reached. Link: https://lkml.kernel.org/r/20220420215721.4868-2-sidhartha.kumar@oracle.com Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Reviewed-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-21selftest/vm: verify mmap addr in mremap_testSidhartha Kumar
Avoid calling mmap with requested addresses that are less than the system's mmap_min_addr. When run as root, mmap returns EACCES when trying to map addresses < mmap_min_addr. This is not one of the error codes for the condition to retry the mmap in the test. Rather than arbitrarily retrying on EACCES, don't attempt an mmap until addr > vm.mmap_min_addr. Add a munmap call after an alignment check as the mappings are retained after the retry and can reach the vm.max_map_count sysctl. Link: https://lkml.kernel.org/r/20220420215721.4868-1-sidhartha.kumar@oracle.com Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Reviewed-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-21mm, hugetlb: allow for "high" userspace addressesChristophe Leroy
This is a fix for commit f6795053dac8 ("mm: mmap: Allow for "high" userspace addresses") for hugetlb. This patch adds support for "high" userspace addresses that are optionally supported on the system and have to be requested via a hint mechanism ("high" addr parameter to mmap). Architectures such as powerpc and x86 achieve this by making changes to their architectural versions of hugetlb_get_unmapped_area() function. However, arm64 uses the generic version of that function. So take into account arch_get_mmap_base() and arch_get_mmap_end() in hugetlb_get_unmapped_area(). To allow that, move those two macros out of mm/mmap.c into include/linux/sched/mm.h If these macros are not defined in architectural code then they default to (TASK_SIZE) and (base) so should not introduce any behavioural changes to architectures that do not define them. For the time being, only ARM64 is affected by this change. Catalin (ARM64) said "We should have fixed hugetlb_get_unmapped_area() as well when we added support for 52-bit VA. The reason for commit f6795053dac8 was to prevent normal mmap() from returning addresses above 48-bit by default as some user-space had hard assumptions about this. It's a slight ABI change if you do this for hugetlb_get_unmapped_area() but I doubt anyone would notice. It's more likely that the current behaviour would cause issues, so I'd rather have them consistent. Basically when arm64 gained support for 52-bit addresses we did not want user-space calling mmap() to suddenly get such high addresses, otherwise we could have inadvertently broken some programs (similar behaviour to x86 here). Hence we added commit f6795053dac8. But we missed hugetlbfs which could still get such high mmap() addresses. So in theory that's a potential regression that should have bee addressed at the same time as commit f6795053dac8 (and before arm64 enabled 52-bit addresses)" Link: https://lkml.kernel.org/r/ab847b6edb197bffdfe189e70fb4ac76bfe79e0d.1650033747.git.christophe.leroy@csgroup.eu Fixes: f6795053dac8 ("mm: mmap: Allow for "high" userspace addresses") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Steve Capper <steve.capper@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: <stable@vger.kernel.org> [5.0.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-21userfaultfd: mark uffd_wp regardless of VM_WRITE flagNadav Amit
When a PTE is set by UFFD operations such as UFFDIO_COPY, the PTE is currently only marked as write-protected if the VMA has VM_WRITE flag set. This seems incorrect or at least would be unexpected by the users. Consider the following sequence of operations that are being performed on a certain page: mprotect(PROT_READ) UFFDIO_COPY(UFFDIO_COPY_MODE_WP) mprotect(PROT_READ|PROT_WRITE) At this point the user would expect to still get UFFD notification when the page is accessed for write, but the user would not get one, since the PTE was not marked as UFFD_WP during UFFDIO_COPY. Fix it by always marking PTEs as UFFD_WP regardless on the write-permission in the VMA flags. Link: https://lkml.kernel.org/r/20220217211602.2769-1-namit@vmware.com Fixes: 292924b26024 ("userfaultfd: wp: apply _PAGE_UFFD_WP bit") Signed-off-by: Nadav Amit <namit@vmware.com> Acked-by: Peter Xu <peterx@redhat.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-21memcg: sync flush only if periodic flush is delayedShakeel Butt
Daniel Dao has reported [1] a regression on workloads that may trigger a lot of refaults (anon and file). The underlying issue is that flushing rstat is expensive. Although rstat flush are batched with (nr_cpus * MEMCG_BATCH) stat updates, it seems like there are workloads which genuinely do stat updates larger than batch value within short amount of time. Since the rstat flush can happen in the performance critical codepaths like page faults, such workload can suffer greatly. This patch fixes this regression by making the rstat flushing conditional in the performance critical codepaths. More specifically, the kernel relies on the async periodic rstat flusher to flush the stats and only if the periodic flusher is delayed by more than twice the amount of its normal time window then the kernel allows rstat flushing from the performance critical codepaths. Now the question: what are the side-effects of this change? The worst that can happen is the refault codepath will see 4sec old lruvec stats and may cause false (or missed) activations of the refaulted page which may under-or-overestimate the workingset size. Though that is not very concerning as the kernel can already miss or do false activations. There are two more codepaths whose flushing behavior is not changed by this patch and we may need to come to them in future. One is the writeback stats used by dirty throttling and second is the deactivation heuristic in the reclaim. For now keeping an eye on them and if there is report of regression due to these codepaths, we will reevaluate then. Link: https://lore.kernel.org/all/CA+wXwBSyO87ZX5PVwdHm-=dBjZYECGmfnydUicUyrQqndgX2MQ@mail.gmail.com [1] Link: https://lkml.kernel.org/r/20220304184040.1304781-1-shakeelb@google.com Fixes: 1f828223b799 ("memcg: flush lruvec stats in the refault") Signed-off-by: Shakeel Butt <shakeelb@google.com> Reported-by: Daniel Dao <dqminh@cloudflare.com> Tested-by: Ivan Babrou <ivan@cloudflare.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Koutný <mkoutny@suse.com> Cc: Frank Hofmann <fhofmann@cloudflare.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-21mm/memory-failure.c: skip huge_zero_page in memory_failure()Xu Yu
Kernel panic when injecting memory_failure for the global huge_zero_page, when CONFIG_DEBUG_VM is enabled, as follows. Injecting memory failure for pfn 0x109ff9 at process virtual address 0x20ff9000 page:00000000fb053fc3 refcount:2 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x109e00 head:00000000fb053fc3 order:9 compound_mapcount:0 compound_pincount:0 flags: 0x17fffc000010001(locked|head|node=0|zone=2|lastcpupid=0x1ffff) raw: 017fffc000010001 0000000000000000 dead000000000122 0000000000000000 raw: 0000000000000000 0000000000000000 00000002ffffffff 0000000000000000 page dumped because: VM_BUG_ON_PAGE(is_huge_zero_page(head)) ------------[ cut here ]------------ kernel BUG at mm/huge_memory.c:2499! invalid opcode: 0000 [#1] PREEMPT SMP PTI CPU: 6 PID: 553 Comm: split_bug Not tainted 5.18.0-rc1+ #11 Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 3288b3c 04/01/2014 RIP: 0010:split_huge_page_to_list+0x66a/0x880 Code: 84 9b fb ff ff 48 8b 7c 24 08 31 f6 e8 9f 5d 2a 00 b8 b8 02 00 00 e9 e8 fb ff ff 48 c7 c6 e8 47 3c 82 4c b RSP: 0018:ffffc90000dcbdf8 EFLAGS: 00010246 RAX: 000000000000003c RBX: 0000000000000001 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffffff823e4c4f RDI: 00000000ffffffff RBP: ffff88843fffdb40 R08: 0000000000000000 R09: 00000000fffeffff R10: ffffc90000dcbc48 R11: ffffffff82d68448 R12: ffffea0004278000 R13: ffffffff823c6203 R14: 0000000000109ff9 R15: ffffea000427fe40 FS: 00007fc375a26740(0000) GS:ffff88842fd80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fc3757c9290 CR3: 0000000102174006 CR4: 00000000003706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: try_to_split_thp_page+0x3a/0x130 memory_failure+0x128/0x800 madvise_inject_error.cold+0x8b/0xa1 __x64_sys_madvise+0x54/0x60 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7fc3754f8bf9 Code: 01 00 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 8 RSP: 002b:00007ffeda93a1d8 EFLAGS: 00000217 ORIG_RAX: 000000000000001c RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc3754f8bf9 RDX: 0000000000000064 RSI: 0000000000003000 RDI: 0000000020ff9000 RBP: 00007ffeda93a200 R08: 0000000000000000 R09: 0000000000000000 R10: 00000000ffffffff R11: 0000000000000217 R12: 0000000000400490 R13: 00007ffeda93a2e0 R14: 0000000000000000 R15: 0000000000000000 This makes huge_zero_page bail out explicitly before split in memory_failure(), thus the panic above won't happen again. Link: https://lkml.kernel.org/r/497d3835612610e370c74e697ea3c721d1d55b9c.1649775850.git.xuyu@linux.alibaba.com Fixes: 6a46079cf57a ("HWPOISON: The high level memory error handler in the VM v7") Signed-off-by: Xu Yu <xuyu@linux.alibaba.com> Reported-by: Abaci <abaci@linux.alibaba.com> Suggested-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-21mm/hwpoison: fix race between hugetlb free/demotion and memory_failure_hugetlb()Naoya Horiguchi
There is a race condition between memory_failure_hugetlb() and hugetlb free/demotion, which causes setting PageHWPoison flag on the wrong page. The one simple result is that wrong processes can be killed, but another (more serious) one is that the actual error is left unhandled, so no one prevents later access to it, and that might lead to more serious results like consuming corrupted data. Think about the below race window: CPU 1 CPU 2 memory_failure_hugetlb struct page *head = compound_head(p); hugetlb page might be freed to buddy, or even changed to another compound page. get_hwpoison_page -- page is not what we want now... The current code first does prechecks roughly and then reconfirms after taking refcount, but it's found that it makes code overly complicated, so move the prechecks in a single hugetlb_lock range. A newly introduced function, try_memory_failure_hugetlb(), always takes hugetlb_lock (even for non-hugetlb pages). That can be improved, but memory_failure() is rare in principle, so should not be a big problem. Link: https://lkml.kernel.org/r/20220408135323.1559401-2-naoya.horiguchi@linux.dev Fixes: 761ad8d7c7b5 ("mm: hwpoison: introduce memory_failure_hugetlb()") Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Reported-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Yang Shi <shy828301@gmail.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-21clk: qcom: clk-rcg2: fix gfx3d frequency calculationDmitry Baryshkov
Since the commit 948fb0969eae ("clk: Always clamp the rounded rate"), the clk_core_determine_round_nolock() would clamp the requested rate between min and max rates from the rate request. Normally these fields would be filled by clk_core_get_boundaries() called from clk_round_rate(). However clk_gfx3d_determine_rate() uses a manually crafted rate request, which did not have these fields filled. Thus the requested frequency would be clamped to 0, resulting in weird frequencies being requested from the hardware. Fix this by filling min_rate and max_rate to the values valid for the respective PLLs (0 and ULONG_MAX). Fixes: 948fb0969eae ("clk: Always clamp the rounded rate") Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Link: https://lore.kernel.org/r/20220419235447.1586192-1-dmitry.baryshkov@linaro.org Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Reported-by: Rob Clark <robdclark@chromium.org> Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2022-04-21clk: microchip: mpfs: don't reset disabled peripheralsConor Dooley
The current clock driver for PolarFire SoC puts the hardware behind "periph" clocks into reset if their clock is disabled. CONFIG_PM was recently added to the riscv defconfig and exposed issues caused by this behaviour, where the Cadence GEM was being put into reset between its bringup & the PHY bringup: https://lore.kernel.org/linux-riscv/9f4b057d-1985-5fd3-65c0-f944161c7792@microchip.com/ Fix this (for now) by removing the reset from mpfs_periph_clk_disable. Fixes: 635e5e73370e ("clk: microchip: Add driver for Microchip PolarFire SoC") Reviewed-by: Daire McNamara <daire.mcnamara@microchip.com> Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20220411072340.740981-1-conor.dooley@microchip.com Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2022-04-21f2fs: should not truncate blocks during roll-forward recoveryJaegeuk Kim
If the file preallocated blocks and fsync'ed, we should not truncate them during roll-forward recovery which will recover i_size correctly back. Fixes: d4dd19ec1ea0 ("f2fs: do not expose unwritten blocks to user by DIO") Cc: <stable@vger.kernel.org> # 5.17+ Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-04-22Merge tag 'drm-misc-next-2022-04-21' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-misc into drm-next drm-misc-next for 5.19-rc1 UAPI Changes: Cross-subsystem Changes: - of: Create a platform_device for offb Core Changes: - edid: block read refactoring - ttm: Add common debugfs code for resource managers Driver Changes: - bridges: - adv7611: Enable DRM_BRIDGE_OP_HPD if there's an interrupt - anx7625: Fill ELD if no monitor is connected - dw_hdmi: Add General Parallel Audio support - icn6211: Add data-lanes DT property - new driver: Lontium LT9211 - nouveau: make some structures static - tidss: Reset DISPC on startup - solomon: SPI Support and DT bindings improvements Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <maxime@cerno.tech> Link: https://patchwork.freedesktop.org/patch/msgid/20220421065948.2pyp3j7acxtl6pz5@houat
2022-04-22ata: pata_marvell: Check the 'bmdma_addr' beforing readingZheyu Ma
Before detecting the cable type on the dma bar, the driver should check whether the 'bmdma_addr' is zero, which means the adapter does not support DMA, otherwise we will get the following error: [ 5.146634] Bad IO access at port 0x1 (return inb(port)) [ 5.147206] WARNING: CPU: 2 PID: 303 at lib/iomap.c:44 ioread8+0x4a/0x60 [ 5.150856] RIP: 0010:ioread8+0x4a/0x60 [ 5.160238] Call Trace: [ 5.160470] <TASK> [ 5.160674] marvell_cable_detect+0x6e/0xc0 [pata_marvell] [ 5.161728] ata_eh_recover+0x3520/0x6cc0 [ 5.168075] ata_do_eh+0x49/0x3c0 Signed-off-by: Zheyu Ma <zheyuma97@gmail.com> Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
2022-04-22Merge tag 'drm-msm-fixes-2022-04-20' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/msm into drm-fixes Revert to fix iommu regression. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rob Clark <robdclark@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/CAF6AEGtvPo4xD2peAztDMPP2n4utb7d9WQboMFwsba9E8U2rCw@mail.gmail.com
2022-04-21Merge tag 'dmaengine-fix-5.18' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine Pull dmaengine fixes from Vinod Koul: "A bunch of driver fixes: - idxd device RO checks and device cleanup - dw-edma unaligned access and alignment - qcom: missing minItems in binding - mediatek pm usage fix - imx init script" * tag 'dmaengine-fix-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine: dt-bindings: dmaengine: qcom: gpi: Add minItems for interrupts dmaengine: idxd: skip clearing device context when device is read-only dmaengine: idxd: add RO check for wq max_transfer_size write dmaengine: idxd: add RO check for wq max_batch_size write dmaengine: idxd: fix retry value to be constant for duration of function call dmaengine: idxd: match type for retries var in idxd_enqcmds() dmaengine: dw-edma: Fix inconsistent indenting dmaengine: dw-edma: Fix unaligned 64bit access dmaengine: mediatek:Fix PM usage reference leak of mtk_uart_apdma_alloc_chan_resources dmaengine: imx-sdma: Fix error checking in sdma_event_remap dma: at_xdmac: fix a missing check on list iterator dmaengine: imx-sdma: fix init of uart scripts dmaengine: idxd: fix device cleanup on disable
2022-04-22drm/mediatek: dpi: Use mt8183 output formats for mt8192Nícolas F. R. A. Prado
The configuration for mt8192 was incorrectly using the output formats from mt8173. Since the output formats for mt8192 are instead the same ones as for mt8183, which require two bus samples per pixel, the pixelclock and DDR edge setting were misconfigured. This made external displays unable to show the image. Fix the issue by correcting the output format for mt8192 to be the same as for mt8183, fixing the usage of external displays for mt8192. Link: https://patchwork.kernel.org/project/linux-mediatek/patch/20220408013950.674477-1-nfraprado@collabora.com/ Fixes: be63f6e8601f ("drm/mediatek: dpi: Add output bus formats to driver data") Signed-off-by: Nícolas F. R. A. Prado <nfraprado@collabora.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by: Rex-BC Chen <rex-bc.chen@mediatek.com> Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
2022-04-22drm/mediatek: Add display support for MT8186Yongqiang Niu
Add mmsys driver data and compatible for MT8186 in mtk_drm_drv.c. Link: https://patchwork.kernel.org/project/linux-mediatek/patch/20220421051218.8652-2-rex-bc.chen@mediatek.com/ Signed-off-by: Yongqiang Niu <yongqiang.niu@mediatek.com> Signed-off-by: Rex-BC Chen <rex-bc.chen@mediatek.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
2022-04-21RISC-V: cpuidle: fix Kconfig select for RISCV_SBI_CPUIDLERandy Dunlap
There can be lots of build errors when building cpuidle-riscv-sbi.o. They are all caused by a kconfig problem with this warning: WARNING: unmet direct dependencies detected for RISCV_SBI_CPUIDLE Depends on [n]: CPU_IDLE [=y] && RISCV [=y] && RISCV_SBI [=n] Selected by [y]: - SOC_VIRT [=y] && CPU_IDLE [=y] so make the 'select' of RISCV_SBI_CPUIDLE also depend on RISCV_SBI. Fixes: c5179ef1ca0c ("RISC-V: Enable RISC-V SBI CPU Idle driver for QEMU virt machine") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Anup Patel <anup@brainfault.org> Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-04-21RISC-V: mm: Fix set_satp_mode() for platform not having Sv57Anup Patel
When Sv57 is not available the satp.MODE test in set_satp_mode() will fail and lead to pgdir re-programming for Sv48. The pgdir re-programming will fail as well due to pre-existing pgdir entry used for Sv57 and as a result kernel fails to boot on RISC-V platform not having Sv57. To fix above issue, we should clear the pgdir memory in set_satp_mode() before re-programming. Fixes: 011f09d12052 ("riscv: mm: Set sv57 on defaultly") Reported-by: Mayuresh Chitale <mchitale@ventanamicro.com> Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-04-21drm/msm: return the average load over the polling periodChia-I Wu
simple_ondemand interacts poorly with clamp_to_idle. It only looks at the load since the last get_dev_status call, while it should really look at the load over polling_ms. When clamp_to_idle true, it almost always picks the lowest frequency on active because the gpu is idle between msm_devfreq_idle/msm_devfreq_active. This logic could potentially be moved into devfreq core. Fixes: 7c0ffcd40b16 ("drm/msm/gpu: Respect PM QoS constraints") Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Cc: Rob Clark <robdclark@chromium.org> Link: https://lore.kernel.org/r/20220416003314.59211-3-olvaffe@gmail.com Signed-off-by: Rob Clark <robdclark@chromium.org>
2022-04-21drm/msm: simplify gpu_busy callbackChia-I Wu
Move tracking and busy time calculation to msm_devfreq_get_dev_status. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Cc: Rob Clark <robdclark@chromium.org> Link: https://lore.kernel.org/r/20220416003314.59211-2-olvaffe@gmail.com Signed-off-by: Rob Clark <robdclark@chromium.org>
2022-04-21drm/msm: remove explicit devfreq status resetChia-I Wu
It is redundant since commit 7c0ffcd40b16 ("drm/msm/gpu: Respect PM QoS constraints") because dev_pm_qos_update_request triggers get_dev_status. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Cc: Rob Clark <robdclark@chromium.org> Link: https://lore.kernel.org/r/20220416003314.59211-1-olvaffe@gmail.com Signed-off-by: Rob Clark <robdclark@chromium.org>
2022-04-21drm/msm: Add a way for userspace to allocate GPU iovaRob Clark
The motivation at this point is mainly native userspace mesa driver in a VM guest. The one remaining synchronous "hotpath" is buffer allocation, because guest needs to wait to know the bo's iova before it can start emitting cmdstream/state that references the new bo. By allocating the iova in the guest userspace, we no longer need to wait for a response from the host, but can just rely on the allocation request being processed before the cmdstream submission. Allocation failures (OoM, etc) would just be treated as context-lost (ie. GL_GUILTY_CONTEXT_RESET) or subsequent allocations (or readpix, etc) can raise GL_OUT_OF_MEMORY. v2: Fix inuse check v3: Change mismatched iova case to -EBUSY Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Reviewed-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Link: https://lore.kernel.org/r/20220411215849.297838-11-robdclark@gmail.com Signed-off-by: Rob Clark <robdclark@chromium.org>
2022-04-21drm/msm/gem: Add fenced vma unpinRob Clark
With userspace allocated iova (next patch), we can have a race condition where userspace observes the fence completion and deletes the vma before retire_submit() gets around to unpinning the vma. To handle this, add a fenced unpin which drops the refcount but tracks the fence, and update msm_gem_vma_inuse() to check any previously unsignaled fences. v2: Fix inuse underflow (duplicate unpin) v3: Fix msm_job_run() vs submit_cleanup() race condition Signed-off-by: Rob Clark <robdclark@chromium.org> Link: https://lore.kernel.org/r/20220411215849.297838-10-robdclark@gmail.com Signed-off-by: Rob Clark <robdclark@chromium.org>
2022-04-21drm/msm/gem: Split vma lookup and pinRob Clark
This way we only lookup vma once per object per submit, for both the submit and retire path. Signed-off-by: Rob Clark <robdclark@chromium.org> Link: https://lore.kernel.org/r/20220411215849.297838-9-robdclark@gmail.com Signed-off-by: Rob Clark <robdclark@chromium.org>
2022-04-21drm/msm/gem: Rework vma lookup and pinRob Clark
Combines duplicate vma lookup in the get_and_pin path. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Link: https://lore.kernel.org/r/20220411215849.297838-8-robdclark@gmail.com Signed-off-by: Rob Clark <robdclark@chromium.org>
2022-04-21drm/msm: Drop msm_gem_iova()Rob Clark
There was only a single user, which could just as easily stash the iova when pinning. v2: fix prepare->prepare->cleanup->cleanup sequences Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Link: https://lore.kernel.org/r/20220411215849.297838-7-robdclark@gmail.com Signed-off-by: Rob Clark <robdclark@chromium.org>
2022-04-21drm/msm/gem: Drop PAGE_SHIFT for address space mmRob Clark
Get rid of all the unnecessary conversion between address/size and page offsets. It just confuses things. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Link: https://lore.kernel.org/r/20220411215849.297838-6-robdclark@gmail.com Signed-off-by: Rob Clark <robdclark@chromium.org>
2022-04-21drm/msm/gem: Split out inuse helperRob Clark
Prep for a following patch, where it gets a bit more complicated. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Link: https://lore.kernel.org/r/20220411215849.297838-5-robdclark@gmail.com Signed-off-by: Rob Clark <robdclark@chromium.org>
2022-04-21drm/msm/gem: Convert some missed GEM_WARN_ON()sRob Clark
Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Link: https://lore.kernel.org/r/20220411215849.297838-4-robdclark@gmail.com Signed-off-by: Rob Clark <robdclark@chromium.org>
2022-04-21drm/msm/gpu: Drop duplicate fence counterRob Clark
The ring seqno counter duplicates the fence-context last_fence counter. They end up getting incremented in lock-step, on the same scheduler thread, but the split just makes things less obvious. Signed-off-by: Rob Clark <robdclark@chromium.org> Link: https://lore.kernel.org/r/20220411215849.297838-3-robdclark@gmail.com Signed-off-by: Rob Clark <robdclark@chromium.org>
2022-04-21drm/msm/gem: Move prototypesRob Clark
These belong more cleanly in the gem header. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Link: https://lore.kernel.org/r/20220411215849.297838-2-robdclark@gmail.com Signed-off-by: Rob Clark <robdclark@chromium.org>
2022-04-21drm/msm: Add a way to override processes comm/cmdlineRob Clark
In the cause of using the GPU via virtgpu, the host side process is really a sort of proxy, and not terribly interesting from the PoV of crash/fault logging. Add a way to override these per process so that we can see the guest process's name. v2: Handle kmalloc failure, add comment to explain kstrdup returns NULL if passed NULL [Dan Carpenter] Signed-off-by: Rob Clark <robdclark@chromium.org> Link: https://lore.kernel.org/r/20220317165144.222101-4-robdclark@gmail.com Signed-off-by: Rob Clark <robdclark@chromium.org>
2022-04-21drm/msm: Split out helper to get comm/cmdlineRob Clark
Deduplicate this from fault_worker and recover_worker. Signed-off-by: Rob Clark <robdclark@chromium.org> Link: https://lore.kernel.org/r/20220317165144.222101-3-robdclark@gmail.com Signed-off-by: Rob Clark <robdclark@chromium.org>
2022-04-21drm/msm: Add support for pointer paramsRob Clark
The 64b value field is already suffient to hold a pointer instead of immediate, but we also need a length field. Signed-off-by: Rob Clark <robdclark@chromium.org> Link: https://lore.kernel.org/r/20220317165144.222101-2-robdclark@gmail.com Signed-off-by: Rob Clark <robdclark@chromium.org>
2022-04-21drm/msm: Remove unused field in submitRob Clark
Noticed this was unused and never set. Signed-off-by: Rob Clark <robdclark@chromium.org> Link: https://lore.kernel.org/r/20220225202614.225197-2-robdclark@gmail.com Signed-off-by: Rob Clark <robdclark@chromium.org>
2022-04-21drm/fourcc: Add QCOM tiled modifiersRob Clark
These are mainly used internally in mesa, although I believe the display should be able to scan out the TILED3 format. Currently we define this modifier internally in mesa for use with modifier based allocation. But we can get rid of that hack if we define the modfiers properly. Signed-off-by: Rob Clark <robdclark@chromium.org> Link: https://lore.kernel.org/r/20210904170603.1739137-1-robdclark@gmail.com Signed-off-by: Rob Clark <robdclark@chromium.org>
2022-04-22Merge tag 'drm-intel-fixes-2022-04-20' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes - Unset enable_psr2_sel_fetch if PSR2 detection fails - Fix to detect when VRR is turned off from panel settings Signed-off-by: Dave Airlie <airlied@redhat.com> From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/YmAKuHwon7hGyIoC@jlahtine-mobl.ger.corp.intel.com
2022-04-21drm/amd/amdgpu: Update PF2VF headerBokun Zhang
- In the latest version of the header, there is a variable name change. This should not cause any backward compatibility since the variable is at the same offset in the struct. Signed-off-by: Bokun Zhang <Bokun.Zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-04-21drm/amd/amdgpu: Properly indent PF2VF headerBokun Zhang
- Clean up the identation in the header file Signed-off-by: Bokun Zhang <Bokun.Zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-04-21drm/amd/amdgpu: Update MIT license in SRIOV msg headerBokun Zhang
- Update MIT license header Signed-off-by: Bokun Zhang <Bokun.Zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-04-21drm/amdgpu/display: make hubp31_program_extended_blank staticAlex Deucher
It's not used outside of dcn31_hubp.c. Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-04-21drm/amd/display: Fix memory leak in dcn21_clock_source_createMiaoqian Lin
When dcn20_clk_src_construct() fails, we need to release clk_src. Fixes: 6f4e6361c3ff ("drm/amd/display: Add Renoir resource (v2)") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-04-21drm/amd/display: Remove useless codeHaowen Bai
aux_rep only memset but no use at all, so we drop it. Signed-off-by: Haowen Bai <baihaowen@meizu.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-04-21drm/amdgpu: don't runtime suspend if there are displays attached (v3)Alex Deucher
We normally runtime suspend when there are displays attached if they are in the DPMS off state, however, if something wakes the GPU we send a hotplug event on resume (in case any displays were connected while the GPU was in suspend) which can cause userspace to light up the displays again soon after they were turned off. Prior to commit 087451f372bf76 ("drm/amdgpu: use generic fb helpers instead of setting up AMD own's."), the driver took a runtime pm reference when the fbdev emulation was enabled because we didn't implement proper shadowing support for vram access when the device was off so the device never runtime suspended when there was a console bound. Once that commit landed, we now utilize the core fb helper implementation which properly handles the emulation, so runtime pm now suspends in cases where it did not before. Ultimately, we need to sort out why runtime suspend in not working in this case for some users, but this should restore similar behavior to before. v2: move check into runtime_suspend v3: wake ups -> wakeups in comment, retain pm_runtime behavior in runtime_idle callback Fixes: 087451f372bf76 ("drm/amdgpu: use generic fb helpers instead of setting up AMD own's.") Link: https://lore.kernel.org/r/20220403132322.51c90903@darkstar.example.org/ Tested-by: Michele Ballabio <ballabio.m@gmail.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-04-21Revert "drm/amdkfd: only allow heavy-weight TLB flush on some ASICs for SVM too"Lang Yu
This reverts commit 36bf93216ecbe399c40c5e0486f0f0e3a4afa69e. It causes SVM regressions on Vega10 with XNACK-ON. Just revert it at the moment. ./kfdtest --gtest_filter=KFDSVMRangeTest.MigratePolicyTest Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Philip Yang<Philip.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-04-21drm/amdgpu: Add debugfs TA load/unload/invoke supportCandice Li
v1: Add debugfs support to load/unload/invoke TA in runtime. v2: 1. Update some variables to static. 2. Use PAGE_ALIGN to calculate shared buf size directly. 3. Remove fp check. 4. Update debugfs from read to write. Signed-off-by: John Clements <john.clements@amd.com> Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-04-21drm/amdgpu: Use indirect buffer and save response status for TA load/invokeCandice Li
The upcoming TA debugfs interface needs to use indirect buffer when performing TA invoke and check psp response status for TA load and invoke. Signed-off-by: John Clements <john.clements@amd.com> Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-04-21kvm: selftests: introduce and use more page size-related constantsPaolo Bonzini
Clean up code that was hardcoding masks for various fields, now that the masks are included in processor.h. For more cleanup, define PAGE_SIZE and PAGE_MASK just like in Linux. PAGE_SIZE in particular was defined by several tests. Suggested-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-21kvm: selftests: do not use bitfields larger than 32-bits for PTEsPaolo Bonzini
Red Hat's QE team reported test failure on access_tracking_perf_test: Testing guest mode: PA-bits:ANY, VA-bits:48, 4K pages guest physical test memory offset: 0x3fffbffff000 Populating memory : 0.684014577s Writing to populated memory : 0.006230175s Reading from populated memory : 0.004557805s ==== Test Assertion Failure ==== lib/kvm_util.c:1411: false pid=125806 tid=125809 errno=4 - Interrupted system call 1 0x0000000000402f7c: addr_gpa2hva at kvm_util.c:1411 2 (inlined by) addr_gpa2hva at kvm_util.c:1405 3 0x0000000000401f52: lookup_pfn at access_tracking_perf_test.c:98 4 (inlined by) mark_vcpu_memory_idle at access_tracking_perf_test.c:152 5 (inlined by) vcpu_thread_main at access_tracking_perf_test.c:232 6 0x00007fefe9ff81ce: ?? ??:0 7 0x00007fefe9c64d82: ?? ??:0 No vm physical memory at 0xffbffff000 I can easily reproduce it with a Intel(R) Xeon(R) CPU E5-2630 with 46 bits PA. It turns out that the address translation for clearing idle page tracking returned a wrong result; addr_gva2gpa()'s last step, which is based on "pte[index[0]].pfn", did the calculation with 40 bits length and the high 12 bits got truncated. In above case the GPA address to be returned should be 0x3fffbffff000 for GVA 0xc0000000, but it got truncated into 0xffbffff000 and the subsequent gpa2hva lookup failed. The width of operations on bit fields greater than 32-bit is implementation defined, and differs between GCC (which uses the bitfield precision) and clang (which uses 64-bit arithmetic), so this is a potential minefield. Remove the bit fields and using manual masking instead. Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2075036 Reported-by: Nana Liu <nanliu@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Tested-by: Peter Xu <peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>