Age | Commit message | Author
2023-04-18 | mm: vmscan: move set_task_reclaim_state() near flush_reclaim_state() | Yosry Ahmed
Move set_task_reclaim_state() near flush_reclaim_state() so that all helpers manipulating reclaim_state are in close proximity. Link: https://lkml.kernel.org/r/20230413104034.1086717-3-yosryahmed@google.com Signed-off-by: Yosry Ahmed <yosryahmed@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christoph Lameter <cl@linux.com> Cc: Darrick J. Wong <djwong@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: NeilBrown <neilb@suse.de> Cc: Peter Xu <peterx@redhat.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeelb@google.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18 | mm: vmscan: ignore non-LRU-based reclaim in memcg reclaim | Yosry Ahmed
Patch series "Ignore non-LRU-based reclaim in memcg reclaim", v6. Upon running some proactive reclaim tests using memory.reclaim, we noticed some tests flaking where writing to memory.reclaim would be successful even though we did not reclaim the requested amount fully. Looking further into it, I discovered that *sometimes* we overestimate the number of reclaimed pages in memcg reclaim. Pages reclaimed through means other than LRU-based reclaim are tracked through reclaim_state in struct scan_control, which is stashed in current task_struct. These pages are added to the number of reclaimed pages through LRUs. For memcg reclaim, these pages generally cannot be linked to the memcg under reclaim and can cause an overestimated count of reclaimed pages. This short series tries to address that. Patch 1 ignores pages reclaimed outside of LRU reclaim in memcg reclaim. The pages are uncharged anyway, so even if we end up under-reporting reclaimed pages we will still succeed in making progress during charging. Patches 2-3 are just refactoring. Patch 2 moves the set_task_reclaim_state() helper next to flush_reclaim_state(). Patch 3 adds a helper that wraps updating current->reclaim_state, and renames reclaim_state->reclaimed_slab to reclaim_state->reclaimed.
This patch (of 3): We keep track of different types of reclaimed pages through reclaim_state->reclaimed_slab, and we add them to the reported number of reclaimed pages. For non-memcg reclaim, this makes sense. For memcg reclaim, we have no clue if those pages are charged to the memcg under reclaim. Slab pages are shared by different memcgs, so a freed slab page may have only been partially charged to the memcg under reclaim. The same goes for clean file pages from pruned inodes (on highmem systems) or xfs buffer pages: there is currently no simple way to link them to the memcg under reclaim. Stop reporting those freed pages as reclaimed pages during memcg reclaim. This should make the return value of writing to memory.reclaim more accurate, and may help reduce unnecessary reclaim retries during memcg charging. Writing to memory.reclaim on the root memcg is considered as cgroup_reclaim(), but for this case we want to include any freed pages, so use the global_reclaim() check instead of !cgroup_reclaim(). Generally, this should make the return value of try_to_free_mem_cgroup_pages() more accurate. In some limited cases (e.g. freed a slab page that was mostly charged to the memcg under reclaim), the return value of try_to_free_mem_cgroup_pages() can be underestimated, but this should be fine. The freed pages will be uncharged anyway, and we can charge the memcg the next time around as we usually do memcg reclaim in a retry loop.
Link: https://lkml.kernel.org/r/20230413104034.1086717-1-yosryahmed@google.com Link: https://lkml.kernel.org/r/20230413104034.1086717-2-yosryahmed@google.com Fixes: f2fe7b09a52b ("mm: memcg/slab: charge individual slab objects instead of pages") Signed-off-by: Yosry Ahmed <yosryahmed@google.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christoph Lameter <cl@linux.com> Cc: Darrick J. Wong <djwong@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Cc: David Rientjes <rientjes@google.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: NeilBrown <neilb@suse.de> Cc: Peter Xu <peterx@redhat.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeelb@google.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
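The accounting rule described in this commit message can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: the structs are simplified stand-ins, the fields memcg_target and target_is_root are hypothetical, and the real kernel helpers operate on current->reclaim_state rather than on an explicit argument.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures. The name reclaimed_slab
 * follows the commit message; the other fields are hypothetical. */
struct reclaim_state {
	unsigned long reclaimed_slab;	/* pages freed outside LRU reclaim */
};

struct scan_control {
	bool memcg_target;		/* hypothetical: reclaim targets a memcg */
	bool target_is_root;		/* hypothetical: that memcg is the root */
	unsigned long nr_reclaimed;	/* pages reported as reclaimed */
};

/* Modelled after the check named in the commit message: reclaim with no
 * memcg target, or reclaim that targets the root memcg. */
static bool global_reclaim(const struct scan_control *sc)
{
	return !sc->memcg_target || sc->target_is_root;
}

/* Fold non-LRU reclaimed pages into the total only for global reclaim;
 * for memcg reclaim they are left out of the reported count. */
static void flush_reclaim_state(struct scan_control *sc, struct reclaim_state *rs)
{
	if (rs && global_reclaim(sc)) {
		sc->nr_reclaimed += rs->reclaimed_slab;
		rs->reclaimed_slab = 0;
	}
}
```

With this model, a memcg-targeted scan that freed 5 slab pages still reports only its LRU-reclaimed pages, while a global scan reports both.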
2023-04-18 | mm: apply __must_check to vmap_pages_range_noflush() | Alexander Potapenko
To prevent errors when vmap_pages_range_noflush() or __vmap_pages_range_noflush() silently fail (see the link below for an example), annotate them with __must_check so that the callers do not unconditionally assume the mapping succeeded. Link: https://lkml.kernel.org/r/20230413131223.4135168-4-glider@google.com Signed-off-by: Alexander Potapenko <glider@google.com> Reported-by: Dipanjan Das <mail.dipanjan.das@gmail.com> Link: https://lore.kernel.org/linux-mm/CANX2M5ZRrRA64k0hOif02TjmY9kbbO2aCBPyq79es34RXZ=cAw@mail.gmail.com/ Reviewed-by: Marco Elver <elver@google.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
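As an illustration of what the annotation buys, here is a small userspace sketch. In the kernel, __must_check maps to the compiler's warn_unused_result attribute; it is redefined locally so the example builds outside the kernel tree. The functions map_pages_demo() and setup_mapping_demo() are hypothetical and are not the kernel's vmap helpers.

```c
#include <assert.h>
#include <errno.h>

/* Local definition for userspace builds; the kernel provides its own. */
#ifndef __must_check
#define __must_check __attribute__((__warn_unused_result__))
#endif

/* Hypothetical stand-in for a mapping helper that can fail. */
static __must_check int map_pages_demo(int nr_pages, int nr_available)
{
	if (nr_pages > nr_available)
		return -ENOMEM;
	return 0;
}

/* A caller that propagates the error instead of assuming the mapping
 * succeeded. Writing just "map_pages_demo(a, b);" here would trigger a
 * compiler warning because the result is discarded. */
static int setup_mapping_demo(int nr_pages, int nr_available)
{
	int err = map_pages_demo(nr_pages, nr_available);

	if (err)
		return err;
	return 0;
}
```

The point of the annotation is that the unchecked call becomes visible at build time rather than as a silent failure at run time.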
2023-04-18 | mm: kmsan: apply __must_check to non-void functions | Alexander Potapenko
Non-void KMSAN hooks may return error codes that indicate that KMSAN failed to reflect the changed memory state in the metadata (e.g. it could not create the necessary memory mappings). In such cases the callers should handle the errors to prevent the tool from using the inconsistent metadata in the future. We mark non-void hooks with __must_check so that error handling is not skipped. Link: https://lkml.kernel.org/r/20230413131223.4135168-3-glider@google.com Signed-off-by: Alexander Potapenko <glider@google.com> Reviewed-by: Marco Elver <elver@google.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dipanjan Das <mail.dipanjan.das@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18 | mm: hwpoison: support recovery from HugePage copy-on-write faults | Liu Shixin
Copy-on-write of hugetlb user pages with uncorrectable errors will result in a kernel crash. This is because the copy is performed in kernel mode and in general we cannot handle accessing memory with such errors while in kernel mode. Commit a873dfe1032a ("mm, hwpoison: try to recover from copy-on write faults") introduced the routine copy_mc_user_highpage() to gracefully handle copying of user pages with uncorrectable errors. However, the separate hugetlb copy-on-write code paths were not modified as part of commit a873dfe1032a. Modify hugetlb copy-on-write code paths to use copy_mc_user_highpage() so that they can also gracefully handle uncorrectable errors in user pages. This involves changing the hugetlb specific routine copy_user_large_folio() from type void to int so that it can return an error. Modify the hugetlb userfaultfd code in the same way so that it can return -EHWPOISON if it encounters an uncorrectable error. Link: https://lkml.kernel.org/r/20230413131349.2524210-1-liushixin2@huawei.com Signed-off-by: Liu Shixin <liushixin2@huawei.com> Acked-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
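The void-to-int change can be pictured with a toy model. This is a sketch only: struct demo_page, DEMO_PAGE_SIZE and both demo_* functions are hypothetical, and the "poisoned" flag stands in for a real uncorrectable memory error, which userspace cannot simulate directly.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#ifndef EHWPOISON
#define EHWPOISON 133	/* Linux value; defined here only if the libc lacks it */
#endif

#define DEMO_PAGE_SIZE 64	/* hypothetical, tiny on purpose */

/* Hypothetical model of one subpage: data plus a "hardware error" flag. */
struct demo_page {
	unsigned char data[DEMO_PAGE_SIZE];
	bool poisoned;
};

/* Models a machine-check-safe copy: reports failure instead of crashing. */
static int demo_copy_mc_page(struct demo_page *dst, const struct demo_page *src)
{
	if (src->poisoned)
		return -EHWPOISON;
	memcpy(dst->data, src->data, DEMO_PAGE_SIZE);
	return 0;
}

/* Copies a multi-page region page by page and stops at the first error,
 * so the caller can fail the fault gracefully. */
static int demo_copy_large(struct demo_page *dst, const struct demo_page *src,
			   size_t nr)
{
	for (size_t i = 0; i < nr; i++) {
		int err = demo_copy_mc_page(&dst[i], &src[i]);

		if (err)
			return err;
	}
	return 0;
}
```

Because the large-copy routine now returns a status, every caller has a place to stop and report the error rather than continuing with a partially copied page.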
2023-04-18 | memcg: page_cgroup_ino() get memcg from the page's folio | Yosry Ahmed
In a kernel with added WARN_ON_ONCE(PageTail) in page_memcg_check(), we observed a warning from page_cgroup_ino() when reading /proc/kpagecgroup. This warning was added to catch fragile reads of a page memcg. Make page_cgroup_ino() get memcg from the page's folio using folio_memcg_check(): that gives it the correct memcg for each page of a folio, so is the right fix. Note that page_folio() is racy, the page's folio can change from under us, but the entire function is racy and documented as such. I dithered between the right fix and the safer "fix": it's unlikely but conceivable that some userspace has learnt that /proc/kpagecgroup gives no memcg on tail pages, and compensates for that in some (racy) way: so continuing to give no memcg on tails, without warning, might be safer. But hwpoison_filter_task(), the only other user of page_cgroup_ino(), persuaded me. It looks as if it currently leaves out tail pages of the selected memcg, by mistake: whereas hwpoison_inject() uses compound_head() and expects the tails to be included. So hwpoison testing coverage has probably been restricted by the wrong output from page_cgroup_ino() (if that memcg filter is used at all): in the short term, it might be safer not to enable wider coverage there, but long term we would regret that. This is based on a patch originally written by Hugh Dickins and retains most of the original commit log [1] The patch was changed to use folio_memcg_check(page_folio(page)) instead of page_memcg_check(compound_head(page)) based on discussions with Matthew Wilcox; where he stated that callers of page_memcg_check() should stop using it due to the ambiguity around tail pages -- instead they should use folio_memcg_check() and handle tail pages themselves. 
Link: https://lkml.kernel.org/r/20230412003451.4018887-1-yosryahmed@google.com Link: https://lore.kernel.org/linux-mm/20230313083452.1319968-1-yosryahmed@google.com/ [1] Signed-off-by: Yosry Ahmed <yosryahmed@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeelb@google.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18 | mm/hugetlb_vmemmap: rename ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP | Aneesh Kumar K.V
Now we use the ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP config option to indicate devdax and hugetlb vmemmap optimization support. Hence rename it to the generic ARCH_WANT_OPTIMIZE_VMEMMAP. Link: https://lkml.kernel.org/r/20230412050025.84346-2-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Cc: Joao Martins <joao.m.martins@oracle.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Tarun Sahu <tsahu@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18 | mm/vmemmap/devdax: fix kernel crash when probing devdax devices | Aneesh Kumar K.V
Commit 4917f55b4ef9 ("mm/sparse-vmemmap: improve memory savings for compound devmaps") added support for using optimized vmemmap for devdax devices. But how vmemmap mappings are created is architecture specific. For example, powerpc with hash translation doesn't have vmemmap mappings in the init_mm page table; instead they are bolted table entries in the hardware page table. vmemmap_populate_compound_pages(), used by the vmemmap optimization code, is not aware of these architecture-specific mappings. Hence allow architectures to opt in to this feature. I selected architectures supporting the HUGETLB_PAGE_OPTIMIZE_VMEMMAP option as also supporting this feature. This patch fixes the below crash on ppc64.
BUG: Unable to handle kernel data access on write at 0xc00c000100400038
Faulting instruction address: 0xc000000001269d90
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
Modules linked in:
CPU: 7 PID: 1 Comm: swapper/0 Not tainted 6.3.0-rc5-150500.34-default+ #2 5c90a668b6bbd142599890245c2fb5de19d7d28a
Hardware name: IBM,9009-42G POWER9 (raw) 0x4e0202 0xf000005 of:IBM,FW950.40 (VL950_099) hv:phyp pSeries
NIP: c000000001269d90 LR: c0000000004c57d4 CTR: 0000000000000000
REGS: c000000003632c30 TRAP: 0300 Not tainted (6.3.0-rc5-150500.34-default+)
MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 24842228 XER: 00000000
CFAR: c0000000004c57d0 DAR: c00c000100400038 DSISR: 42000000 IRQMASK: 0
....
NIP [c000000001269d90] __init_single_page.isra.74+0x14/0x4c
LR [c0000000004c57d4] __init_zone_device_page+0x44/0xd0
Call Trace:
[c000000003632ed0] [c000000003632f60] 0xc000000003632f60 (unreliable)
[c000000003632f10] [c0000000004c5ca0] memmap_init_zone_device+0x170/0x250
[c000000003632fe0] [c0000000005575f8] memremap_pages+0x2c8/0x7f0
[c0000000036330c0] [c000000000557b5c] devm_memremap_pages+0x3c/0xa0
[c000000003633100] [c000000000d458a8] dev_dax_probe+0x108/0x3e0
[c0000000036331a0] [c000000000d41430] dax_bus_probe+0xb0/0x140
[c0000000036331d0] [c000000000cef27c] really_probe+0x19c/0x520
[c000000003633260] [c000000000cef6b4] __driver_probe_device+0xb4/0x230
[c0000000036332e0] [c000000000cef888] driver_probe_device+0x58/0x120
[c000000003633320] [c000000000cefa6c] __device_attach_driver+0x11c/0x1e0
[c0000000036333a0] [c000000000cebc58] bus_for_each_drv+0xa8/0x130
[c000000003633400] [c000000000ceefcc] __device_attach+0x15c/0x250
[c0000000036334a0] [c000000000ced458] bus_probe_device+0x108/0x110
[c0000000036334f0] [c000000000ce92dc] device_add+0x7fc/0xa10
[c0000000036335b0] [c000000000d447c8] devm_create_dev_dax+0x1d8/0x530
[c000000003633640] [c000000000d46b60] __dax_pmem_probe+0x200/0x270
[c0000000036337b0] [c000000000d46bf0] dax_pmem_probe+0x20/0x70
[c0000000036337d0] [c000000000d2279c] nvdimm_bus_probe+0xac/0x2b0
[c000000003633860] [c000000000cef27c] really_probe+0x19c/0x520
[c0000000036338f0] [c000000000cef6b4] __driver_probe_device+0xb4/0x230
[c000000003633970] [c000000000cef888] driver_probe_device+0x58/0x120
[c0000000036339b0] [c000000000cefd08] __driver_attach+0x1d8/0x240
[c000000003633a30] [c000000000cebb04] bus_for_each_dev+0xb4/0x130
[c000000003633a90] [c000000000cee564] driver_attach+0x34/0x50
[c000000003633ab0] [c000000000ced878] bus_add_driver+0x218/0x300
[c000000003633b40] [c000000000cf1144] driver_register+0xa4/0x1b0
[c000000003633bb0] [c000000000d21a0c] __nd_driver_register+0x5c/0x100
[c000000003633c10] [c00000000206a2e8] dax_pmem_init+0x34/0x48
[c000000003633c30] [c0000000000132d0] do_one_initcall+0x60/0x320
[c000000003633d00] [c0000000020051b0] kernel_init_freeable+0x360/0x400
[c000000003633de0] [c000000000013764] kernel_init+0x34/0x1d0
[c000000003633e50] [c00000000000de14] ret_from_kernel_thread+0x5c/0x64
Link: https://lkml.kernel.org/r/20230411142214.64464-1-aneesh.kumar@linux.ibm.com Fixes: 4917f55b4ef9 ("mm/sparse-vmemmap: improve memory savings for compound devmaps") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Reported-by: Tarun Sahu <tsahu@linux.ibm.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18 | selftests/mm: add uffdio register ioctls test | Peter Xu
This new test checks the ioctls returned from UFFDIO_REGISTER, which are put into uffdio_register.ioctls. It also tests the expected failure cases of UFFDIO_REGISTER, namely:
- Register with empty mode should fail with -EINVAL
- Register minor without page cache (anon) should fail with -EINVAL
Link: https://lkml.kernel.org/r/20230412164548.329376-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Zach O'Keefe <zokeefe@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
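A check on the returned ioctls mask might look roughly like the following. This is a sketch and not the selftest's code: the helper names are hypothetical, and the expected set for a missing-mode registration is an assumption made for illustration. Only the _UFFDIO_* ioctl numbers come from the <linux/userfaultfd.h> UAPI header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <linux/userfaultfd.h>

/* Returns true when every bit in 'expected' (built from 1 << _UFFDIO_*)
 * is also set in the mask the kernel reported in uffdio_register.ioctls.
 * Extra bits in 'reported' are tolerated. */
static bool demo_ioctls_present(uint64_t reported, uint64_t expected)
{
	return (reported & expected) == expected;
}

/* Assumed expectation for a range registered in missing mode on memory
 * that supports UFFDIO_ZEROPAGE. */
static uint64_t demo_expected_missing_ioctls(void)
{
	return (1ULL << _UFFDIO_WAKE) | (1ULL << _UFFDIO_COPY) |
	       (1ULL << _UFFDIO_ZEROPAGE);
}
```

In a real test the 'reported' value would be read from the ioctls field of struct uffdio_register after a successful UFFDIO_REGISTER call.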
2023-04-18 | selftests/mm: add shmem-private test to uffd-stress | Peter Xu
The userfaultfd stress test never tested private shmem, which I think was overlooked for a long time. Add it so that it matches the uffd unit test and covers all memory supported with the three memory types. Meanwhile, rename the memory types a bit. Considering shared mem is the major use case for both shmem / hugetlbfs, change from: anon, hugetlb, hugetlb_shared, shmem To (with shmem-private added): anon, hugetlb, hugetlb-private, shmem, shmem-private Add shmem-private to run_vmtests.sh too. Link: https://lkml.kernel.org/r/20230412164546.329355-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Zach O'Keefe <zokeefe@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18 | selftests/mm: drop sys/dev test in uffd-stress test | Peter Xu
With the new uffd unit test covering the /dev/userfaultfd path and syscall path of uffd initializations, we can safely drop the devnode test in the old stress test. One benefit is avoiding running the stress test twice, which is overkill just to test the /dev/ interface in run_vmtests.sh. The other benefit is that now all uffd tests (that use userfaultfd_open) can run automatically as long as any type of interface is enabled (either syscall or dev), so they are more likely to succeed rather than fail due to lack of privilege. Once this patch lands, we can drop all the "mem_type:XXX" handling too. Link: https://lkml.kernel.org/r/20230412164525.329176-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Zach O'Keefe <zokeefe@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18 | selftests/mm: allow uffd test to skip properly with no privilege | Peter Xu
Allow a unit test to be skipped properly when it lacks privilege (e.g. sigbus and events tests). [colin.i.king@gmail.com: fix spelling mistake "priviledge" -> "privilege"] Link: https://lkml.kernel.org/r/20230414081506.1678998-1-colin.i.king@gmail.com Link: https://lkml.kernel.org/r/20230412164520.329163-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Zach O'Keefe <zokeefe@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18 | selftests/mm: workaround no way to detect uffd-minor + wp | Peter Xu
Userfaultfd minor+wp mode was very recently added. The test will fail on old kernels at ioctl(UFFDIO_CONTINUE), which is mysterious. Unfortunately there's no feature bit to detect this support. Add a hack to leverage WP_UNPOPULATED to detect whether that feature exists, since WP_UNPOPULATED was merged right after minor+wp. Link: https://lkml.kernel.org/r/20230412164517.329152-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Zach O'Keefe <zokeefe@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18 | selftests/mm: move zeropage test into uffd unit tests | Peter Xu
Simplify it a bit along the way, e.g., drop the never-used offset field (which was always the 1st page so offset=0). Introduce uffd_register_with_ioctls() out of uffd_register() so that the returned uffdio_register.ioctls can be inspected. Check that automatically when testing UFFDIO_ZEROPAGE on different types of memory (and kernel). Link: https://lkml.kernel.org/r/20230412164404.328815-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Zach O'Keefe <zokeefe@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18 | selftests/mm: move uffd sig/events tests into uffd unit tests | Peter Xu
Move the two tests into the unit test, and convert them into 20 standalone tests:
- events test on all 5 mem types, with wp on/off
- signal test on all 5 mem types, with wp on/off
Testing sigbus on anon... done
Testing sigbus on shmem... done
Testing sigbus on shmem-private... done
Testing sigbus on hugetlb... done
Testing sigbus on hugetlb-private... done
Testing sigbus-wp on anon... done
Testing sigbus-wp on shmem... done
Testing sigbus-wp on shmem-private... done
Testing sigbus-wp on hugetlb... done
Testing sigbus-wp on hugetlb-private... done
Testing events on anon... done
Testing events on shmem... done
Testing events on shmem-private... done
Testing events on hugetlb... done
Testing events on hugetlb-private... done
Testing events-wp on anon... done
Testing events-wp on shmem... done
Testing events-wp on shmem-private... done
Testing events-wp on hugetlb... done
Testing events-wp on hugetlb-private... done
It'll also remove a lot of global references along the way, e.g. test_uffdio_wp will be replaced with the wp value passed over.
Link: https://lkml.kernel.org/r/20230412164400.328798-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Zach O'Keefe <zokeefe@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
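The count of 20 follows from 2 tests x 2 wp settings x 5 memory types. The sketch below enumerates the combinations in the same order as the output quoted in the commit message. The function and array names are hypothetical and the selftest itself is organised differently; this only illustrates the arithmetic.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static const char *const demo_mem_types[] = {
	"anon", "shmem", "shmem-private", "hugetlb", "hugetlb-private",
};
static const char *const demo_tests[] = { "sigbus", "events" };

#define DEMO_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Writes the name of the idx-th combination into buf (if idx is in range)
 * and returns the total number of combinations.
 * Order: test, then wp off/on, then memory type. */
static size_t demo_test_name(size_t idx, char *buf, size_t len)
{
	size_t n_mem = DEMO_ARRAY_SIZE(demo_mem_types);
	size_t total = DEMO_ARRAY_SIZE(demo_tests) * 2 * n_mem;

	if (idx < total) {
		const char *test = demo_tests[idx / (2 * n_mem)];
		int wp = (int)((idx / n_mem) % 2);

		snprintf(buf, len, "%s%s on %s", test, wp ? "-wp" : "",
			 demo_mem_types[idx % n_mem]);
	}
	return total;
}
```

Passing the wp setting as part of each combination, rather than reading a global, is the same idea the commit message describes for replacing test_uffdio_wp.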
2023-04-18 | selftests/mm: move uffd minor test to unit test | Peter Xu
This moves the minor test to the new unit test. Rewrite the content check with char* operations to avoid fiddling with my_bcmp(). Drop global vars test_uffdio_minor and test_collapse, just assume we always test them in common code for now. OTOH make this single test into five tests:
- minor test on [shmem, hugetlb] with wp=false
- minor test on [shmem, hugetlb] with wp=true
- minor test + collapse on shmem only
One thing to mention is that we used to test COLLAPSE+WP but that doesn't sound right at all. It's possible it's silently broken but unnoticed because COLLAPSE is not part of the default test suite. Make the MADV_COLLAPSE test fail-able (by skipping it when it fails), because it's not guaranteed to succeed anyway. Drop a bunch of useless code after the move, because the unit test always uses an aligned number of pages and has nothing to do with n_cpus.
Link: https://lkml.kernel.org/r/20230412164357.328779-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Cc: Zach O'Keefe <zokeefe@google.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18 | selftests/mm: move uffd pagemap test to unit test | Peter Xu
Move it over and make it split into two tests, one for pagemap and one for the new WP_UNPOPULATED (to be a separate one). The thp pagemap test wasn't really working (with MADV_HUGEPAGE). Let's just drop it (since it never really worked anyway..) and leave that for later. Link: https://lkml.kernel.org/r/20230412164352.328733-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Zach O'Keefe <zokeefe@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18 | selftests/mm: add framework for uffd-unit-test | Peter Xu
Add a framework in preparation for moving unit tests from uffd-stress.c into uffd-unit-tests.c. The goal is to allow detection of uffd features for each test, and also to loop over the specified types of memory that a test supports. Link: https://lkml.kernel.org/r/20230412164348.328710-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Zach O'Keefe <zokeefe@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
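One way to picture such a framework is a table of test cases, each declaring the memory types it supports and the feature bits it needs, with a runner that skips or loops accordingly. The sketch below is an assumption about the general shape only: every name in it (demo_test_case, DEMO_MEM_*, demo_run_all and so on) is hypothetical and does not mirror the definitions in uffd-unit-tests.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical memory-type bits. */
enum {
	DEMO_MEM_ANON    = 1 << 0,
	DEMO_MEM_SHMEM   = 1 << 1,
	DEMO_MEM_HUGETLB = 1 << 2,
};

struct demo_test_case {
	const char *name;
	unsigned int mem_targets;	/* memory types the test supports */
	uint64_t required_features;	/* feature bits the kernel must report */
	int (*fn)(unsigned int mem_type);	/* returns 0 on success */
};

struct demo_result {
	int ran;	/* test x memory-type combinations executed */
	int skipped;	/* tests skipped because a feature is missing */
	int failed;	/* combinations that returned non-zero */
};

/* Trivial test bodies used for demonstration. */
static int demo_always_pass(unsigned int mem_type)
{
	(void)mem_type;
	return 0;
}

static int demo_always_fail(unsigned int mem_type)
{
	(void)mem_type;
	return 1;
}

/* Skip tests whose required features are absent, then run the rest once
 * per supported memory type. */
static struct demo_result demo_run_all(const struct demo_test_case *tests,
				       size_t nr, uint64_t kernel_features)
{
	static const unsigned int all_mem[] = {
		DEMO_MEM_ANON, DEMO_MEM_SHMEM, DEMO_MEM_HUGETLB,
	};
	struct demo_result res = { 0, 0, 0 };

	for (size_t i = 0; i < nr; i++) {
		const struct demo_test_case *t = &tests[i];

		if ((t->required_features & kernel_features) != t->required_features) {
			res.skipped++;
			continue;
		}
		for (size_t m = 0; m < sizeof(all_mem) / sizeof(all_mem[0]); m++) {
			if (!(t->mem_targets & all_mem[m]))
				continue;
			res.ran++;
			if (t->fn(all_mem[m]))
				res.failed++;
		}
	}
	return res;
}
```

Declaring requirements as data keeps each test body free of feature probing and of global memory-type state.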
2023-04-18 | selftests/mm: allow allocate_area() to fail properly | Peter Xu
Mostly to detect hugetlb allocation errors and skip hugetlb tests when pages are not allocated. Link: https://lkml.kernel.org/r/20230412164345.328659-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Zach O'Keefe <zokeefe@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18 | selftests/mm: let uffd_handle_page_fault() take wp parameter | Peter Xu
Make the handler optionally apply WP bit when resolving page faults for either missing or minor page faults. This moves towards removing global test_uffdio_wp outside of the common code. Link: https://lkml.kernel.org/r/20230412164341.328618-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Zach O'Keefe <zokeefe@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18 | selftests/mm: rename uffd_stats to uffd_args | Peter Xu
Prepare for adding more fields into the struct. Link: https://lkml.kernel.org/r/20230412164337.328607-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Suggested-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Zach O'Keefe <zokeefe@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18 | selftests/mm: drop global hpage_size in uffd tests | Peter Xu
hpage_size was wrongly used. Sometimes it means hugetlb default size, sometimes it was used as thp size. Remove the global variable and use the right one at each place. Link: https://lkml.kernel.org/r/20230412164333.328596-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Zach O'Keefe <zokeefe@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18 | selftests/mm: drop global mem_fd in uffd tests | Peter Xu
Drop it by creating the memfd dynamically in the tests. Link: https://lkml.kernel.org/r/20230412164331.328584-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Zach O'Keefe <zokeefe@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18 | selftests/mm: UFFDIO_API test | Peter Xu
Add one simple test for UFFDIO_API. With that, I also added a bunch of small but handy helpers along the way. Link: https://lkml.kernel.org/r/20230412164257.328375-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Zach O'Keefe <zokeefe@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18 | selftests/mm: uffd_open_{dev|sys}() | Peter Xu
Provide two helpers to open an uffd handle. Drop the error checks around SKIPs because it's inside an errexit() anyway, which IMHO doesn't really help much if the test will not continue. Link: https://lkml.kernel.org/r/20230412164254.328335-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org> Reviewed-by: Axel Rasmussen <axelrasmussen@google.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Zach O'Keefe <zokeefe@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18 | selftests/mm: uffd_[un]register() | Peter Xu
Add two helpers to register/unregister to an uffd. Use them to drop duplicate code. This patch also drops assert_expected_ioctls_present() and get_expected_ioctls(). Reasons:
- It'll need a lot of effort to pass test_type==HUGETLB into it from the upper layers, so it's the simplest way to get rid of another global var
- The ioctls returned in UFFDIO_REGISTER are hardly useful at all, because any app can already detect kernel support for any ioctl via its corresponding UFFD_FEATURE_*. The check here is mostly for sanity, and it's probably destined that no user app will ever use it.
- It's not friendly to one future goal of uffd to run on old kernels. The problem is that get_expected_ioctls() compiles against UFFD_API_RANGE_IOCTLS, which is a value that can change depending on where the test is compiled, rather than reflecting what the kernel underneath has. It means it'll report false negatives on old kernels, so it's against our will.
So let's make our lives easier. [peterx@redhat.com; tools/testing/selftests/mm/hugepage-mremap.c: add headers] Link: https://lkml.kernel.org/r/ZDxrvZh/cw357D8P@x1n Link: https://lkml.kernel.org/r/20230412164247.328293-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Zach O'Keefe <zokeefe@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18 | selftests/mm: split uffd tests into uffd-stress and uffd-unit-tests | Peter Xu
In many ways it's weird and unwanted to keep all the tests in the same userfaultfd.c, at least in the current way. For example, it doesn't make much sense to run the stress test for each method by which we can create a userfaultfd handle (either via syscall or /dev/ node). It's a waste of time running the whole stress test twice, as the stress paths are the same and only the open path is different. It's also just weird to need to manually specify different types of memory to run all unit tests for the userfaultfd interface. We should be able to just run a single program and that should go through all functional uffd tests without running the stress test at all. The stress test was more for torturing and finding race conditions. We don't want to wait for stress to finish just to regression-test a functional test. When we start to pile up more things on top of the same file and same functions, things start to get a bit chaotic and the code is just harder to maintain too with tons of global variables. This patch creates a new test uffd-unit-tests to keep userfaultfd unit tests in the future, currently empty. Meanwhile rename the old userfaultfd.c test to uffd-stress.c. Link: https://lkml.kernel.org/r/20230412164244.328270-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org> Reviewed-by: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Zach O'Keefe <zokeefe@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18selftests/mm: create uffd-common.[ch]Peter Xu
Move common utility functions into uffd-common.[ch] files from the original userfaultfd.c. This prepares for a split of userfaultfd.c into two tests: one to only cover the old but powerful stress test, the other one covers all the functional tests. This movement is kind of a brute-force effort for now, with light touch-ups but nothing should really change. There are chances to optimize more, but let's leave that for later. Link: https://lkml.kernel.org/r/20230412164241.328259-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org> Reviewed-by: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Zach O'Keefe <zokeefe@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18selftests/mm: drop test_uffdio_zeropage_eexistPeter Xu
The idea was to flip this var in the alarm handler from time to time to test -EEXIST of UFFDIO_ZEROPAGE. But firstly, it's only used in the zeropage test, so it's probably only used once; meanwhile, we passed "retry==false", so it never got tested anyway. Drop both sides so we always test UFFDIO_ZEROPAGE retries if has_zeropage is set (!hugetlb). One more thing to do is to issue UFFDIO_REGISTER for the alias buffer too, because otherwise the test won't even pass! We were just lucky that this test never really got run at all. Link: https://lkml.kernel.org/r/20230412164238.328238-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Zach O'Keefe <zokeefe@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18selftests/mm: test UFFDIO_ZEROPAGE only when !hugetlbPeter Xu
Make the check as simple as "test_type == TEST_HUGETLB" because that's the only mem that doesn't support ZEROPAGE. Link: https://lkml.kernel.org/r/20230412164234.328168-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Zach O'Keefe <zokeefe@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18selftests/mm: reuse pagemap_get_entry() in vm_util.hPeter Xu
Meanwhile drop pagemap_read_vaddr(). Link: https://lkml.kernel.org/r/20230412164231.328157-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Axel Rasmussen <axelrasmussen@google.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Zach O'Keefe <zokeefe@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18selftests/mm: use PM_* macros in vm_utils.hPeter Xu
We've got the macros in uffd-stress.c, move it over and use it in vm_util.h. Link: https://lkml.kernel.org/r/20230412164227.328145-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Axel Rasmussen <axelrasmussen@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Zach O'Keefe <zokeefe@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
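For readers unfamiliar with the pagemap interface these macros describe, the following is a minimal sketch of reading one entry from /proc/self/pagemap and testing its flag bits. The bit positions follow the kernel's pagemap documentation; the function name sketch_pagemap_entry() is hypothetical and is not the selftest's pagemap_get_entry().

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Flag bits of a 64-bit pagemap entry, per the kernel pagemap docs. */
#define PM_SOFT_DIRTY     (1ULL << 55)
#define PM_MMAP_EXCLUSIVE (1ULL << 56)
#define PM_UFFD_WP        (1ULL << 57)
#define PM_FILE           (1ULL << 61)
#define PM_SWAP           (1ULL << 62)
#define PM_PRESENT        (1ULL << 63)

/*
 * Hypothetical helper: fetch the pagemap entry covering vaddr.
 * Returns 0 and stores the entry on success, -1 on any failure.
 */
static int sketch_pagemap_entry(const void *vaddr, uint64_t *entry)
{
	long psize = sysconf(_SC_PAGESIZE);
	off_t off;
	ssize_t n;
	int fd;

	assert(psize > 0);

	/* One 8-byte entry per virtual page, indexed by page number. */
	off = (off_t)((uintptr_t)vaddr / (uintptr_t)psize * sizeof(uint64_t));

	fd = open("/proc/self/pagemap", O_RDONLY);
	if (fd < 0)
		return -1;

	if (lseek(fd, off, SEEK_SET) == (off_t)-1) {
		close(fd);
		return -1;
	}

	n = read(fd, entry, sizeof(*entry));
	close(fd);

	return n == (ssize_t)sizeof(*entry) ? 0 : -1;
}
```

Callers then test individual bits, for example `entry & PM_UFFD_WP` to see whether a page is uffd write-protected. The sketch assumes a 64-bit off_t; a 32-bit build would need large-file support for the offsets involved.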
2023-04-18selftests/mm: merge default_huge_page_size() into onePeter Xu
There are already three identical definitions of this function. Merge them into one in vm_util.[ch]. Link: https://lkml.kernel.org/r/20230412164223.328134-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Axel Rasmussen <axelrasmussen@google.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Zach O'Keefe <zokeefe@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
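One common way to obtain the default huge page size from userspace is to parse the "Hugepagesize:" line of /proc/meminfo. The sketch below shows that approach under the assumption that this is the behavior wanted; the function names are hypothetical and this is not claimed to be the body of default_huge_page_size() in vm_util.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: parse one /proc/meminfo line.
 * Returns the huge page size in bytes, or 0 if the line is not
 * the "Hugepagesize:" line.
 */
static size_t sketch_parse_hugepagesize(const char *line)
{
	unsigned long kb = 0;

	assert(line != NULL);
	if (sscanf(line, "Hugepagesize: %lu kB", &kb) != 1)
		return 0;
	return (size_t)kb << 10;
}

/* Hypothetical helper: returns 0 if the size cannot be determined. */
static size_t sketch_default_huge_page_size(void)
{
	char line[128];
	size_t hps = 0;
	FILE *f = fopen("/proc/meminfo", "r");

	if (!f)
		return 0;

	while (fgets(line, sizeof(line), f)) {
		hps = sketch_parse_hugepagesize(line);
		if (hps)
			break;
	}

	fclose(f);
	return hps;
}
```

Returning 0 rather than aborting lets each test decide whether a missing hugetlb configuration means "skip" or "fail".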
2023-04-18selftests/mm: link vm_util.c alwaysPeter Xu
We do have plenty of files that want to link against vm_util.c. Just make it simple by linking it always. Link: https://lkml.kernel.org/r/20230412164220.328123-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Zach O'Keefe <zokeefe@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18selftests/mm: use TEST_GEN_PROGS where properPeter Xu
TEST_GEN_PROGS and TEST_GEN_FILES are used randomly in the mm/Makefile to specify programs that need to build. Logically all these binaries should all fall into TEST_GEN_PROGS. Replace those TEST_GEN_FILES with TEST_GEN_PROGS, so that we can reference all the tests easily later. [peterx@redhat.com: tools/testing/selftests/mm/Makefile: don't wipe out TEST_GEN_PROGS] Link: https://lkml.kernel.org/r/ZDxrvZh/cw357D8P@x1n Link: https://lkml.kernel.org/r/20230412164218.328104-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Zach O'Keefe <zokeefe@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18selftests/mm: merge util.h into vm_util.hPeter Xu
There are two util headers under the mm/ kselftest. Merge one into the other. It turns out util.h is the easy one to move. When merging, drop PAGE_SIZE / PAGE_SHIFT because they're unnecessary wrappers around page_size() / page_shift(); meanwhile rename the latter to psize() and pshift() so as not to conflict with existing definitions in some test files that include vm_util.h. Link: https://lkml.kernel.org/r/20230412164120.327731-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Axel Rasmussen <axelrasmussen@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Zach O'Keefe <zokeefe@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
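To make the role of such accessors concrete, here is a minimal sketch of a cached page-size helper and a matching shift helper. The names carry a sketch_ prefix because they are illustrative only; the actual psize() and pshift() in vm_util.h may be implemented differently.

```c
#include <assert.h>
#include <unistd.h>

/* Cached page size; 0 means "not queried yet". */
static unsigned int sketch_page_size;

/* Hypothetical helper: runtime page size in bytes. */
static unsigned int sketch_psize(void)
{
	if (!sketch_page_size) {
		long sz = sysconf(_SC_PAGESIZE);

		assert(sz > 0);
		sketch_page_size = (unsigned int)sz;
	}
	return sketch_page_size;
}

/* Hypothetical helper: log2 of the page size. */
static unsigned int sketch_pshift(void)
{
	unsigned int size = sketch_psize();
	unsigned int shift = 0;

	/* Page sizes are powers of two, so halve until 1 remains. */
	while ((size >> shift) > 1)
		shift++;
	return shift;
}
```

Querying the size at runtime, instead of relying on a compile-time PAGE_SIZE macro, keeps a test binary correct on kernels configured with a different page size than the build host.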
2023-04-18selftests/mm: dump a summary in run_vmtests.shPeter Xu
Dump a summary after running whatever test specified. Useful for human runners to identify any kind of failures (besides exit code). Link: https://lkml.kernel.org/r/20230412164117.327720-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Axel Rasmussen <axelrasmussen@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Zach O'Keefe <zokeefe@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18selftests/mm: update .gitignore with two missing testsPeter Xu
Patch series "selftests/mm: Split / Refactor userfault test", v2. This patchset splits userfaultfd.c into two tests: - uffd-stress: the "vanilla", old and powerful stress test - uffd-unit-tests: all the unit tests will be moved here This has been on my todo list for a long time but I never did it for real. The uffd test is growing into a small and cute monster. I've started to notice it's getting harder to maintain such a test and make it useful. A few issues I found when looking at the userfaultfd test: - We have a bunch of unit tests in userfaultfd.c, but they can only be run after a stress run. No way to not do it. - We can only run a unit test for one memory type at a time; if we want to do a quick smoke test to check regressions, there's no good way. The best we have currently is "bash ./run_vmtests.sh -t userfaultfd" thanks to the most recent changes to run_vmtests.sh on tagging. Still, that always needs to run the stress tests, and it's hard to see what's wrong. - It's hard to add a new unit test to userfaultfd.c; we don't really know what's happening until we've read most of the file. - We did a bunch of useless tests, e.g. we run the whole suite of stress tests twice just to verify both syscall and /dev/userfaultfd. They're all using userfaultfd_new() to create the handle, so everything should really be the same underneath. One simple unit test should cover that! - We have tens of global variables in one file, shared with all the tests. Some of them are not suitable to be a global var from a maintenance pov. It forces every unit test to consider how these vars affect the stress test and vice versa, but that's logically not necessary. - The userfaultfd test is not friendly to old kernels. Mostly it only works on the latest kernel tree. It's preferable for it to run on all kernels and properly report what's missing. I'll stop here, though I feel like I could still list some more. This patchset should resolve all issues above, and actually we can do even more on top. 
I stopped doing that until I found I already got 29 patches and 2000+ LOC changes. That's already a patchset terrible enough so we should move in small steps. After the whole set applied, "./run_vmtests.sh -t userfaultfd" looks like this: ===8<=== vm.nr_hugepages = 1024 ------------------------- running ./uffd-unit-tests ------------------------- Testing UFFDIO_API (with syscall)... done Testing UFFDIO_API (with /dev/userfaultfd)... done Testing register-ioctls on anon... done Testing register-ioctls on shmem... done Testing register-ioctls on shmem-private... done Testing register-ioctls on hugetlb... done Testing register-ioctls on hugetlb-private... done Testing zeropage on anon... done Testing zeropage on shmem... done Testing zeropage on shmem-private... done Testing zeropage on hugetlb... done Testing zeropage on hugetlb-private... done Testing pagemap on anon... done Testing wp-unpopulated on anon... done Testing minor on shmem... done Testing minor on hugetlb... done Testing minor-wp on shmem... done Testing minor-wp on hugetlb... done Testing minor-collapse on shmem... done Testing sigbus on anon... done Testing sigbus on shmem... done Testing sigbus on shmem-private... done Testing sigbus on hugetlb... done Testing sigbus on hugetlb-private... done Testing sigbus-wp on anon... done Testing sigbus-wp on shmem... done Testing sigbus-wp on shmem-private... done Testing sigbus-wp on hugetlb... done Testing sigbus-wp on hugetlb-private... done Testing events on anon... done Testing events on shmem... done Testing events on shmem-private... done Testing events on hugetlb... done Testing events on hugetlb-private... done Testing events-wp on anon... done Testing events-wp on shmem... done Testing events-wp on shmem-private... done Testing events-wp on hugetlb... done Testing events-wp on hugetlb-private... 
done Userfaults unit tests: pass=39, skip=0, fail=0 (total=39) [PASS] -------------------------------- running ./uffd-stress anon 20 16 -------------------------------- nr_pages: 5120, nr_pages_per_cpu: 640 bounces: 15, mode: rnd racing ver poll, userfaults: 345 missing (26+48+61+102+30+12+59+7) 1596 wp (120+139+317+346+215+67+306+86) [...] [PASS] ------------------------------------ running ./uffd-stress hugetlb 128 32 ------------------------------------ nr_pages: 64, nr_pages_per_cpu: 8 bounces: 31, mode: rnd racing ver poll, userfaults: 29 missing (6+6+6+5+4+2+0+0) 104 wp (20+19+22+18+7+12+5+1) [...] [PASS] -------------------------------------------- running ./uffd-stress hugetlb-private 128 32 -------------------------------------------- nr_pages: 64, nr_pages_per_cpu: 8 bounces: 31, mode: rnd racing ver poll, userfaults: 33 missing (12+9+7+0+5+0+0+0) 111 wp (24+25+14+14+11+17+5+1) [...] [PASS] --------------------------------- running ./uffd-stress shmem 20 16 --------------------------------- nr_pages: 5120, nr_pages_per_cpu: 640 bounces: 15, mode: rnd racing ver poll, userfaults: 247 missing (15+17+34+60+81+37+3+0) 2038 wp (180+114+276+400+381+318+165+204) [...] [PASS] ----------------------------------------- running ./uffd-stress shmem-private 20 16 ----------------------------------------- nr_pages: 5120, nr_pages_per_cpu: 640 bounces: 15, mode: rnd racing ver poll, userfaults: 235 missing (52+29+55+56+13+9+16+5) 2849 wp (218+406+461+531+328+284+430+191) [...] [PASS] SUMMARY: PASS=6 SKIP=0 FAIL=0 ===8<=== The output may be different if we miss some features (e.g., hugetlb not allocated, old kernel, less privilege of uffd handle), but they should show up with good reasons. E.g., I tried to run the unit test on my Fedora kernel and it gives me: ===8<=== UFFDIO_API (with syscall)... failed [reason: UFFDIO_API should fail with wrong api but didn't] UFFDIO_API (with /dev/userfaultfd)... skipped [reason: cannot open userfaultfd handle] zeropage on anon... 
done zeropage on shmem... done zeropage on shmem-private... done zeropage-hugetlb on hugetlb... done zeropage-hugetlb on hugetlb-private... done pagemap on anon... pagemap on anon... pagemap on anon... done wp-unpopulated on anon... skipped [reason: feature missing] minor on shmem... done minor on hugetlb... done minor-wp on shmem... skipped [reason: feature missing] minor-wp on hugetlb... skipped [reason: feature missing] minor-collapse on shmem... done sigbus on anon... skipped [reason: possible lack of priviledge] sigbus on shmem... skipped [reason: possible lack of priviledge] sigbus on shmem-private... skipped [reason: possible lack of priviledge] sigbus on hugetlb... skipped [reason: possible lack of priviledge] sigbus on hugetlb-private... skipped [reason: possible lack of priviledge] sigbus-wp on anon... skipped [reason: possible lack of priviledge] sigbus-wp on shmem... skipped [reason: possible lack of priviledge] sigbus-wp on shmem-private... skipped [reason: possible lack of priviledge] sigbus-wp on hugetlb... skipped [reason: possible lack of priviledge] sigbus-wp on hugetlb-private... skipped [reason: possible lack of priviledge] events on anon... skipped [reason: possible lack of priviledge] events on shmem... skipped [reason: possible lack of priviledge] events on shmem-private... skipped [reason: possible lack of priviledge] events on hugetlb... skipped [reason: possible lack of priviledge] events on hugetlb-private... skipped [reason: possible lack of priviledge] events-wp on anon... skipped [reason: possible lack of priviledge] events-wp on shmem... skipped [reason: possible lack of priviledge] events-wp on shmem-private... skipped [reason: possible lack of priviledge] events-wp on hugetlb... skipped [reason: possible lack of priviledge] events-wp on hugetlb-private... 
skipped [reason: possible lack of priviledge] Userfaults unit tests: pass=9, skip=24, fail=1 (total=34) ===8<=== Patch layout: - Revert "userfaultfd: don't fail on unrecognized features" Something I found when I wrote the UFFDIO_API test below. Axel, I still propose to revert it as a whole, but feel free to continue the discussion from the original patch thread. - selftests/mm: Update .gitignore with two missing tests - selftests/mm: Dump a summary in run_vmtests.sh - selftests/mm: Merge util.h into vm_util.h - selftests/mm: Use TEST_GEN_PROGS where proper - selftests/mm: Link vm_util.c always - selftests/mm: Merge default_huge_page_size() into one - selftests/mm: Use PM_* macros in vm_utils.h - selftests/mm: Reuse pagemap_get_entry() in vm_util.h - selftests/mm: Test UFFDIO_ZEROPAGE only when !hugetlb - selftests/mm: Drop test_uffdio_zeropage_eexist Up to here, it's all cleanups here and there. I wanted to keep going, but I found that maybe it'll take a few more days to split the test. Hence I did a split starting from the next one, so we have a working thing first. - selftests/mm: Create uffd-common.[ch] - selftests/mm: Split uffd tests into uffd-stress and uffd-unit-tests This does the major brute-force split of common code into uffd-common.[ch]. That'll be the common base, so far, for the stress and unit tests. Then a new unit test is created. - selftests/mm: uffd_[un]register() - selftests/mm: uffd_open_{dev|sys}() - selftests/mm: UFFDIO_API test This patch sits here to start writing the first unit test, covering UFFDIO_API as well as detection of userfaultfd privileges. - selftests/mm: Drop global mem_fd in uffd tests - selftests/mm: Drop global hpage_size in uffd tests - selftests/mm: Rename uffd_stats to uffd_args - selftests/mm: Let uffd_handle_page_fault() takes wp parameter - selftests/mm: Allow allocate_area() to fail properly Some further cleanups without which I found it hard to move the tests. 
- selftests/mm: Add framework for uffd-unit-test The major patch that provides the framework for most of the remaining unit tests. - selftests/mm: Move uffd pagemap test to unit test - selftests/mm: Move uffd minor test to unit test - selftests/mm: Move uffd sig/events tests into uffd unit tests - selftests/mm: Move zeropage test into uffd unit tests Move unit tests and fit them into the new file. - selftests/mm: Workaround no way to detect uffd-minor + wp - selftests/mm: Allow uffd test to skip properly with no privilege - selftests/mm: Drop sys/dev test in uffd-stress test - selftests/mm: Add shmem-private test to uffd-stress A bunch of changes to improve error reporting, and to add shmem-private to the stress test, which was long missing. - selftests/mm: Add uffdio register ioctls test One more patch to test uffdio_register.ioctls. This patch (of 30): Update .gitignore with two missing tests. Link: https://lkml.kernel.org/r/20230412163922.327282-1-peterx@redhat.com Link: https://lkml.kernel.org/r/20230412164114.327709-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Zach O'Keefe <zokeefe@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18mm/vmscan: simplify shrink_node()Haifeng Xu
The difference between sc->nr_reclaimed and nr_reclaimed is computed three times. Introduce a new variable to record the value, so it only needs to be computed once. Link: https://lkml.kernel.org/r/20230411061757.12041-1-haifeng.xu@shopee.com Signed-off-by: Haifeng Xu <haifeng.xu@shopee.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18mpage: use folios in bio end_io handlerPankaj Raghav
Use folios in the bio end_io handler. This conversion does the appropriate handling on the folios in the respective end_io callback and removes the call to page_endio(), which is soon to be removed. Link: https://lkml.kernel.org/r/20230411122920.30134-4-p.raghav@samsung.com Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Martin Brandenburg <martin@omnibond.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mike Marshall <hubcap@omnibond.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18mpage: split submit_bio and bio end_io handler for reads and writesPankaj Raghav
Split the submit_bio() and bio end_io handler for reads and writes similar to other aops. This is a prep patch before we convert end_io handlers to use folios. Link: https://lkml.kernel.org/r/20230411122920.30134-3-p.raghav@samsung.com Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> Suggested-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Martin Brandenburg <martin@omnibond.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mike Marshall <hubcap@omnibond.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18orangefs: use folios in orangefs_readaheadPankaj Raghav
Patch series "remove page_endio()", v3. It was decided to remove page_endio() as per the previous RFC discussion[1] of this series and move that functionality into the callers themselves. One side benefit of doing that is that the callers have been modified to work directly on folios, as page_endio() already worked on folios. As Christoph is doing ZRAM cleanups[4] which will get rid of the remaining page_endio() usage, I removed the final patch that removes page_endio()[5]. I will send it separately after rc-1 once the zram cleanups are merged. mpage changes were tested with a simple boot test and by running a fio workload on an ext2 filesystem. orangefs was tested by Mike Marshall (no code changes since he tested). This patch (of 3): Convert orangefs_readahead() from using struct page to struct folio. This conversion removes the call to page_endio(), which is soon to be removed, and simplifies the final page handling. The page error flag does not need to be set in the error case, as orangefs doesn't depend on it. 
Link: https://lkml.kernel.org/r/20230411122920.30134-1-p.raghav@samsung.com Link: https://lkml.kernel.org/r/20230411122920.30134-2-p.raghav@samsung.com Link: https://lore.kernel.org/linux-mm/ZBHcl8Pz2ULb4RGD@infradead.org/ [1] Link: https://lore.kernel.org/linux-mm/20230322135013.197076-1-p.raghav@samsung.com/ [2] Link: https://lore.kernel.org/linux-mm/8adb0770-6124-e11f-2551-6582db27ed32@samsung.com/ [3] Link: https://lore.kernel.org/linux-block/20230404150536.2142108-1-hch@lst.de/T/#t [4] Link: https://lore.kernel.org/lkml/20230403132221.94921-6-p.raghav@samsung.com/ [5] Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Mike Marshall <hubcap@omnibond.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Martin Brandenburg <martin@omnibond.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18mm/huge_memory: conditionally call maybe_mkwrite() and drop pte_wrprotect() ↵David Hildenbrand
in __split_huge_pmd_locked() No need to call maybe_mkwrite() to then wrprotect if the source PMD was not writable. It's worth noting that this now allows for PTEs to be writable even if the source PMD was not writable: if vma->vm_page_prot includes write permissions. As documented in commit 931298e103c2 ("mm/userfaultfd: rely on vma->vm_page_prot in uffd_wp_range()"), any mechanism that intends to have pages wrprotected (COW, writenotify, mprotect, uffd-wp, softdirty, ...) has to properly adjust vma->vm_page_prot upfront, to not include write permissions. If vma->vm_page_prot includes write permissions, the PTE/PMD can be writable as default. This now mimics the handling in mm/migrate.c:remove_migration_pte() and in mm/huge_memory.c:remove_migration_pmd(), which has been in place for a long time (except that 96a9c287e25d ("mm/migrate: fix wrongly apply write bit after mkdirty on sparc64") temporarily changed it). Link: https://lkml.kernel.org/r/20230411142512.438404-7-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: David S. Miller <davem@davemloft.net> Cc: Hugh Dickins <hughd@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18mm/huge_memory: revert "Partly revert "mm/thp: carry over dirty bit when thp ↵David Hildenbrand
splits on pmd"" This reverts commit 624a2c94f5b7 ("Partly revert "mm/thp: carry over dirty bit when thp splits on pmd"") and the fixup in commit e833bc503405 ("mm/thp: re-apply mkdirty for small pages after split"). Now that sparc64 mkdirty handling is fixed and no longer sets a PTE/PMD writable that shouldn't be writable, let's revert the temporary fix and remove the stale comment. The mkdirty mm selftest still passes with this change on sparc64. Note that loongarch handling was fixed in commit bf2f34a506e6 ("LoongArch: Set _PAGE_DIRTY only if _PAGE_WRITE is set in {pmd,pte}_mkdirty()") Link: https://lkml.kernel.org/r/20230411142512.438404-6-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: David S. Miller <davem@davemloft.net> Cc: Hugh Dickins <hughd@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18mm/migrate: revert "mm/migrate: fix wrongly apply write bit after mkdirty on ↵David Hildenbrand
sparc64" This reverts commit 96a9c287e25d ("mm/migrate: fix wrongly apply write bit after mkdirty on sparc64"). Now that sparc64 mkdirty handling is fixed and no longer sets a PTE/PMD writable that shouldn't be writable, let's revert the temporary fix. The mkdirty mm selftest still passes with this change on sparc64. Note that loongarch handling was fixed in commit bf2f34a506e6 ("LoongArch: Set _PAGE_DIRTY only if _PAGE_WRITE is set in {pmd,pte}_mkdirty()"). Link: https://lkml.kernel.org/r/20230411142512.438404-5-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: David S. Miller <davem@davemloft.net> Cc: Hugh Dickins <hughd@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18sparc/mm: don't unconditionally set HW writable bit when setting PTE dirty ↵David Hildenbrand
on 64bit On sparc64, there is no HW modified bit; therefore, SW tracks via a SW bit whether the PTE is dirty, set via pte_mkdirty(). However, pte_mkdirty() currently also unconditionally sets the HW writable bit, which is wrong. pte_mkdirty() is not supposed to make a PTE actually writable, unless the SW writable bit -- pte_write() -- indicates that the PTE is not write-protected. Fortunately, sparc64 also defines a SW writable bit. For example, this already turned into a problem in the context of THP splitting as documented in commit 624a2c94f5b7 ("Partly revert "mm/thp: carry over dirty bit when thp splits on pmd""), and for page migration, as documented in commit 96a9c287e25d ("mm/migrate: fix wrongly apply write bit after mkdirty on sparc64"). Also, we might want to use the dirty PTE bit in the context of KSM with shared zeropage [1], whereby setting the page writable would be problematic. More generally, any code that might end up setting a PTE/PMD dirty inside a VMA without write permissions is possibly broken. Before this commit (sun4u in QEMU): root@debian:~/linux/tools/testing/selftests/mm# ./mkdirty # [INFO] detected THP size: 8192 KiB TAP version 13 1..6 # [INFO] PTRACE write access not ok 1 SIGSEGV generated, page not modified # [INFO] PTRACE write access to THP not ok 2 SIGSEGV generated, page not modified # [INFO] Page migration ok 3 SIGSEGV generated, page not modified # [INFO] Page migration of THP ok 4 SIGSEGV generated, page not modified # [INFO] PTE-mapping a THP ok 5 SIGSEGV generated, page not modified # [INFO] UFFDIO_COPY not ok 6 SIGSEGV generated, page not modified Bail out! 3 out of 6 tests failed # Totals: pass:3 fail:3 xfail:0 xpass:0 skip:0 error:0 Tests #3, #4 and #5 have passed ever since we added some MM workarounds, but the underlying issue remains. Let's fix the remaining issues and prepare for reverting the workarounds by setting the HW writable bit only if both the SW dirty bit and the SW writable bit are set. 
We have to move pte_dirty() and pte_write() up. The code patching mechanism and handling constants > 22bit is a bit special on sparc64. The ASM logic in pte_mkdirty() and pte_mkwrite() match the logic in pte_mkold() to create the mask depending on the machine type. The ASM logic in __pte_mkhwwrite() matches the logic in pte_present(), just using an "or" instead of an "and" instruction. With this commit (sun4u in QEMU): root@debian:~/linux/tools/testing/selftests/mm# ./mkdirty # [INFO] detected THP size: 8192 KiB TAP version 13 1..6 # [INFO] PTRACE write access ok 1 SIGSEGV generated, page not modified # [INFO] PTRACE write access to THP ok 2 SIGSEGV generated, page not modified # [INFO] Page migration ok 3 SIGSEGV generated, page not modified # [INFO] Page migration of THP ok 4 SIGSEGV generated, page not modified # [INFO] PTE-mapping a THP ok 5 SIGSEGV generated, page not modified # [INFO] UFFDIO_COPY ok 6 SIGSEGV generated, page not modified # Totals: pass:6 fail:0 xfail:0 xpass:0 skip:0 error:0 This handling seems to have been in place forever. [1] https://lkml.kernel.org/r/533a7c3d-3a48-b16b-b421-6e8386e0b142@redhat.com Link: https://lkml.kernel.org/r/20230411142512.438404-4-david@redhat.com Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: David S. Miller <davem@davemloft.net> Cc: Hugh Dickins <hughd@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
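The rule described in this commit (set the HW writable bit only when both the SW dirty bit and the SW writable bit are set) can be modeled in plain C as below. This is a userspace model with made-up bit positions and sk_-prefixed names; the real sparc64 code uses machine-dependent masks and patched ASM, and nothing here is taken from it.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit positions only; not the real sparc64 layout. */
#define SK_SW_DIRTY (1ULL << 0)
#define SK_SW_WRITE (1ULL << 1)
#define SK_HW_WRITE (1ULL << 2)

typedef uint64_t sk_pte_t;

/* Grant HW write access only if the PTE is both SW-dirty and SW-writable. */
static sk_pte_t sk_mkhwwrite_if_allowed(sk_pte_t pte)
{
	if ((pte & SK_SW_DIRTY) && (pte & SK_SW_WRITE))
		pte |= SK_HW_WRITE;
	return pte;
}

static sk_pte_t sk_pte_mkdirty(sk_pte_t pte)
{
	return sk_mkhwwrite_if_allowed(pte | SK_SW_DIRTY);
}

static sk_pte_t sk_pte_mkwrite(sk_pte_t pte)
{
	return sk_mkhwwrite_if_allowed(pte | SK_SW_WRITE);
}

static sk_pte_t sk_pte_wrprotect(sk_pte_t pte)
{
	return pte & ~(SK_SW_WRITE | SK_HW_WRITE);
}

/* Small self-check showing the intended behavior of the model. */
static void sk_selfcheck(void)
{
	/* Dirtying a write-protected PTE must not make it HW-writable. */
	assert(!(sk_pte_mkdirty(0) & SK_HW_WRITE));
	/* A clean writable PTE stays HW read-only so the first write faults. */
	assert(!(sk_pte_mkwrite(0) & SK_HW_WRITE));
	/* Dirty plus writable, in either order, grants HW write access. */
	assert(sk_pte_mkwrite(sk_pte_mkdirty(0)) & SK_HW_WRITE);
	assert(sk_pte_mkdirty(sk_pte_mkwrite(0)) & SK_HW_WRITE);
	/* Write-protecting removes HW write access but keeps the dirty bit. */
	assert(sk_pte_wrprotect(sk_pte_mkdirty(sk_pte_mkwrite(0))) == SK_SW_DIRTY);
}
```

In this model the old, buggy behavior would correspond to sk_pte_mkdirty() OR-ing in SK_HW_WRITE unconditionally, which is what the first self-check rules out.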
2023-04-18selftests/mm: mkdirty: test behavior of (pte|pmd)_mkdirty on VMAs without ↵David Hildenbrand
write permissions Let's add some tests that trigger (pte|pmd)_mkdirty on VMAs without write permissions. If an architecture implementation is wrong, we might accidentally set the PTE/PMD writable and allow for write access in a VMA without write permissions. The tests include reproducers for the two issues recently discovered and worked-around in core-MM for now: (1) commit 624a2c94f5b7 ("Partly revert "mm/thp: carry over dirty bit when thp splits on pmd"") (2) commit 96a9c287e25d ("mm/migrate: fix wrongly apply write bit after mkdirty on sparc64") In addition, some other tests that reveal further issues. All tests pass under x86_64: ./mkdirty # [INFO] detected THP size: 2048 KiB TAP version 13 1..6 # [INFO] PTRACE write access ok 1 SIGSEGV generated, page not modified # [INFO] PTRACE write access to THP ok 2 SIGSEGV generated, page not modified # [INFO] Page migration ok 3 SIGSEGV generated, page not modified # [INFO] Page migration of THP ok 4 SIGSEGV generated, page not modified # [INFO] PTE-mapping a THP ok 5 SIGSEGV generated, page not modified # [INFO] UFFDIO_COPY ok 6 SIGSEGV generated, page not modified # Totals: pass:6 fail:0 xfail:0 xpass:0 skip:0 error:0 But some fail on sparc64: ./mkdirty # [INFO] detected THP size: 8192 KiB TAP version 13 1..6 # [INFO] PTRACE write access not ok 1 SIGSEGV generated, page not modified # [INFO] PTRACE write access to THP not ok 2 SIGSEGV generated, page not modified # [INFO] Page migration ok 3 SIGSEGV generated, page not modified # [INFO] Page migration of THP ok 4 SIGSEGV generated, page not modified # [INFO] PTE-mapping a THP ok 5 SIGSEGV generated, page not modified # [INFO] UFFDIO_COPY not ok 6 SIGSEGV generated, page not modified Bail out! 
3 out of 6 tests failed # Totals: pass:3 fail:3 xfail:0 xpass:0 skip:0 error:0 Reverting both above commits makes all tests fail on sparc64: ./mkdirty # [INFO] detected THP size: 8192 KiB TAP version 13 1..6 # [INFO] PTRACE write access not ok 1 SIGSEGV generated, page not modified # [INFO] PTRACE write access to THP not ok 2 SIGSEGV generated, page not modified # [INFO] Page migration not ok 3 SIGSEGV generated, page not modified # [INFO] Page migration of THP not ok 4 SIGSEGV generated, page not modified # [INFO] PTE-mapping a THP not ok 5 SIGSEGV generated, page not modified # [INFO] UFFDIO_COPY not ok 6 SIGSEGV generated, page not modified Bail out! 6 out of 6 tests failed # Totals: pass:0 fail:6 xfail:0 xpass:0 skip:0 error:0 The tests are useful to detect other problematic archs, to verify new arch fixes, and to stop such issues from reappearing in the future. For now, we don't add any hugetlb tests. Link: https://lkml.kernel.org/r/20230411142512.438404-3-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: David S. Miller <davem@davemloft.net> Cc: Hugh Dickins <hughd@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
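Each test result above reads "SIGSEGV generated, page not modified". A minimal userspace sketch of that check, assuming a plain mprotect() read-only mapping, is shown below. It does not reproduce the ptrace, migration, or UFFDIO_COPY triggers that the selftest uses to provoke (pte|pmd)_mkdirty; the function name is hypothetical.

```c
#define _GNU_SOURCE
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf segv_env;

/* Jump back to the checker instead of retrying the faulting store. */
static void segv_handler(int sig)
{
	(void)sig;
	siglongjmp(segv_env, 1);
}

/*
 * Returns 1 if writing to a read-only page raised SIGSEGV and left the
 * page unmodified, 0 if the write went through or changed the page,
 * and -1 if the setup failed.
 */
static int write_to_readonly_page_faults(void)
{
	size_t pagesize = (size_t)sysconf(_SC_PAGESIZE);
	struct sigaction sa, old_sa;
	volatile char *mem;
	int ret;

	mem = mmap(NULL, pagesize, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (mem == MAP_FAILED)
		return -1;

	mem[0] = 0; /* populate the page while it is still writable */
	if (mprotect((void *)mem, pagesize, PROT_READ)) {
		munmap((void *)mem, pagesize);
		return -1;
	}

	sa.sa_handler = segv_handler;
	sa.sa_flags = 0;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGSEGV, &sa, &old_sa)) {
		munmap((void *)mem, pagesize);
		return -1;
	}

	if (sigsetjmp(segv_env, 1) == 0) {
		mem[0] = 1; /* must fault on a read-only mapping */
		ret = 0;    /* reached only if the write was allowed */
	} else {
		ret = (mem[0] == 0) ? 1 : 0; /* faulted; verify content */
	}

	sigaction(SIGSEGV, &old_sa, NULL);
	munmap((void *)mem, pagesize);
	return ret;
}
```

A return value of 1 corresponds to an "ok" line in the output above; 0 corresponds to "not ok", where the write was wrongly allowed.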
2023-04-18selftests/mm: reuse read_pmd_pagesize() in COW selftestDavid Hildenbrand
Patch series "mm: (pte|pmd)_mkdirty() should not unconditionally allow for write access". This is the follow-up on [1], adding selftests (testing for known issues we added workarounds for and other issues that haven't been fixed yet), fixing sparc64, reverting the workarounds, and performing one cleanup. The patch from [1] was modified slightly (updated/extended patch description, dropped one unnecessary NOP instruction from the ASM in __pte_mkhwwrite()). Retested on x86_64 and sparc64 (sun4u in QEMU). I scanned most architectures to make sure their (pte|pmd)_mkdirty() handling is correct. To be sure, we can run the selftests and find out if other architectures are still affected (loongarch was fixed recently as well). Based on master for now. I don't expect surprises regarding mm-trees, but I can rebase if there are any problems. This patch (of 6): The COW selftest can deal with THP not being configured. So move error handling of read_pmd_pagesize() into the callers such that we can reuse it in the COW selftest. Link: https://lkml.kernel.org/r/20230411142512.438404-1-david@redhat.com Link: https://lkml.kernel.org/r/20221212130213.136267-1-david@redhat.com [1] Link: https://lkml.kernel.org/r/20230411142512.438404-2-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: David S. Miller <davem@davemloft.net> Cc: Hugh Dickins <hughd@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
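Moving the error handling into the callers can be sketched as a reader that reports "unavailable" with a return value of 0 and leaves the reaction to the caller. This is an illustration under assumptions, not the selftest's code: read_size_file() is a hypothetical name, and the path is passed in as a parameter for the sake of the example.

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: read a decimal size from a sysfs-style file.
 * Returns 0 if the file cannot be opened or read, so that each caller
 * decides whether this is fatal or merely means "feature not configured".
 */
static uint64_t read_size_file(const char *path)
{
	char buf[32];
	ssize_t n;
	int fd;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return 0;

	n = read(fd, buf, sizeof(buf) - 1);
	close(fd);
	if (n < 1)
		return 0;

	buf[n] = '\0';
	return strtoull(buf, NULL, 10);
}
```

A caller that requires THP could treat 0 as a hard error, while a caller like the COW selftest could skip its THP cases and continue. On systems with THP configured, the PMD size is exposed at /sys/kernel/mm/transparent_hugepage/hpage_pmd_size.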
2023-04-18zram: return errors from read_from_bdev_syncChristoph Hellwig
Propagate read errors to the caller instead of dropping them on the floor, and stop returning the somewhat dangerous 1 on success from read_from_bdev*. Link: https://lkml.kernel.org/r/20230411171459.567614-18-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
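Returning 1 on success is risky because callers that follow the usual `if (ret)` convention would take the success path for a failure, or pass the 1 further up as if it were a result. A toy sketch of the 0-or-negative-errno convention follows; all toy_* names are hypothetical and this is not the zram code.

```c
#include <errno.h>

/*
 * Hypothetical stand-in for a backing-device read: returns 0 on success
 * and a negative errno on failure. A nonzero 'fail' simulates an I/O error.
 */
static int toy_read_backing(int fail)
{
	return fail ? -EIO : 0;
}

/* Caller: any nonzero return is an error and is passed up unchanged. */
static int toy_read_page(int fail)
{
	int ret;

	ret = toy_read_backing(fail);
	if (ret)
		return ret; /* propagate instead of dropping the error */

	/* ... the data would be consumed here ... */
	return 0;
}
```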
2023-04-18zram: fix synchronous readsChristoph Hellwig
Currently nothing waits for the synchronous reads before accessing the data. Switch them to an on-stack bio and submit_bio_wait to make sure the I/O has actually completed when the work item has been flushed. This also removes the call to page_endio that would unlock a page that has never been locked. Drop the partial_io/sync flag, as chaining only makes sense for the asynchronous reads of the entire page. Link: https://lkml.kernel.org/r/20230411171459.567614-17-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>