summaryrefslogtreecommitdiff
path: root/tools/include/uapi/asm-generic/mman-common.h
AgeCommit message (Collapse)Author
2022-09-11mm/madvise: introduce MADV_COLLAPSE sync hugepage collapseZach O'Keefe
This idea was introduced by David Rientjes[1]. Introduce a new madvise mode, MADV_COLLAPSE, that allows users to request a synchronous collapse of memory at their own expense. The benefits of this approach are: * CPU is charged to the process that wants to spend the cycles for the THP * Avoid unpredictable timing of khugepaged collapse Semantics This call is independent of the system-wide THP sysfs settings, but will fail for memory marked VM_NOHUGEPAGE. If the ranges provided span multiple VMAs, the semantics of the collapse over each VMA is independent from the others. This implies a hugepage cannot cross a VMA boundary. If collapse of a given hugepage-aligned/sized region fails, the operation may continue to attempt collapsing the remainder of memory specified. The memory ranges provided must be page-aligned, but are not required to be hugepage-aligned. If the memory ranges are not hugepage-aligned, the start/end of the range will be clamped to the first/last hugepage-aligned address covered by said range. The memory ranges must span at least one hugepage-sized region. All non-resident pages covered by the range will first be swapped/faulted-in, before being internally copied onto a freshly allocated hugepage. Unmapped pages will have their data directly initialized to 0 in the new hugepage. However, for every eligible hugepage aligned/sized region to-be collapsed, at least one page must currently be backed by memory (a PMD covering the address range must already exist). Allocation for the new hugepage may enter direct reclaim and/or compaction, regardless of VMA flags. When the system has multiple NUMA nodes, the hugepage will be allocated from the node providing the most native pages. This operation operates on the current state of the specified process and makes no persistent changes or guarantees on how pages will be mapped, constructed, or faulted in the future Return Value If all hugepage-sized/aligned regions covered by the provided range were either successfully collapsed, or were already PMD-mapped THPs, this operation will be deemed successful. On success, process_madvise(2) returns the number of bytes advised, and madvise(2) returns 0. Else, -1 is returned and errno is set to indicate the error for the most-recently attempted hugepage collapse. Note that many failures might have occurred, since the operation may continue to collapse in the event a single hugepage-sized/aligned region fails. ENOMEM Memory allocation failed or VMA not found EBUSY Memcg charging failed EAGAIN Required resource temporarily unavailable. Try again might succeed. EINVAL Other error: No PMD found, subpage doesn't have Present bit set, "Special" page no backed by struct page, VMA incorrectly sized, address not page-aligned, ... Most notable here is ENOMEM and EBUSY (new to madvise) which are intended to provide the caller with actionable feedback so they may take an appropriate fallback measure. Use Cases An immediate user of this new functionality are malloc() implementations that manage memory in hugepage-sized chunks, but sometimes subrelease memory back to the system in native-sized chunks via MADV_DONTNEED; zapping the pmd. Later, when the memory is hot, the implementation could madvise(MADV_COLLAPSE) to re-back the memory by THPs to regain hugepage coverage and dTLB performance. TCMalloc is such an implementation that could benefit from this[2]. Only privately-mapped anon memory is supported for now, but additional support for file, shmem, and HugeTLB high-granularity mappings[2] is expected. File and tmpfs/shmem support would permit: * Backing executable text by THPs. Current support provided by CONFIG_READ_ONLY_THP_FOR_FS may take a long time on a large system which might impair services from serving at their full rated load after (re)starting. Tricks like mremap(2)'ing text onto anonymous memory to immediately realize iTLB performance prevents page sharing and demand paging, both of which increase steady state memory footprint. With MADV_COLLAPSE, we get the best of both worlds: Peak upfront performance and lower RAM footprints. * Backing guest memory by hugapages after the memory contents have been migrated in native-page-sized chunks to a new host, in a userfaultfd-based live-migration stack. [1] https://lore.kernel.org/linux-mm/d098c392-273a-36a4-1a29-59731cdf5d3d@google.com/ [2] https://github.com/google/tcmalloc/tree/master/tcmalloc [jrdr.linux@gmail.com: avoid possible memory leak in failure path] Link: https://lkml.kernel.org/r/20220713024109.62810-1-jrdr.linux@gmail.com [zokeefe@google.com add missing kfree() to madvise_collapse()] Link: https://lore.kernel.org/linux-mm/20220713024109.62810-1-jrdr.linux@gmail.com/ Link: https://lkml.kernel.org/r/20220713161851.1879439-1-zokeefe@google.com [zokeefe@google.com: delay computation of hpage boundaries until use]] Link: https://lkml.kernel.org/r/20220720140603.1958773-4-zokeefe@google.com Link: https://lkml.kernel.org/r/20220706235936.2197195-10-zokeefe@google.com Signed-off-by: Zach O'Keefe <zokeefe@google.com> Signed-off-by: "Souptick Joarder (HPE)" <jrdr.linux@gmail.com> Suggested-by: David Rientjes <rientjes@google.com> Cc: Alex Shi <alex.shi@linux.alibaba.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Chris Kennelly <ckennelly@google.com> Cc: Chris Zankel <chris@zankel.net> Cc: David Hildenbrand <david@redhat.com> Cc: Helge Deller <deller@gmx.de> Cc: Hugh Dickins <hughd@google.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Pavel Begunkov <asml.silence@gmail.com> Cc: Peter Xu <peterx@redhat.com> Cc: Rongwei Wang <rongwei.wang@linux.alibaba.com> Cc: SeongJae Park <sj@kernel.org> Cc: Song Liu <songliubraving@fb.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yang Shi <shy828301@gmail.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-01tools headers UAPI: Sync asm-generic/mman-common.h with the kernelArnaldo Carvalho de Melo
To pick the changes from: 9457056ac426e5ed ("mm: madvise: MADV_DONTNEED_LOCKED") That result in these changes in the tools: $ diff -u tools/include/uapi/asm-generic/mman-common.h include/uapi/asm-generic/mman-common.h --- tools/include/uapi/asm-generic/mman-common.h 2022-03-29 16:17:50.461694991 -0300 +++ include/uapi/asm-generic/mman-common.h 2022-03-27 19:12:48.923250468 -0300 @@ -75,6 +75,8 @@ #define MADV_POPULATE_READ 22 /* populate (prefault) page tables readable */ #define MADV_POPULATE_WRITE 23 /* populate (prefault) page tables writable */ +#define MADV_DONTNEED_LOCKED 24 /* like DONTNEED, but drop locked pages too */ + /* compatibility flags */ #define MAP_FILE 0 $ tools/perf/trace/beauty/madvise_behavior.sh > before $ cp include/uapi/asm-generic/mman-common.h tools/include/uapi/asm-generic/mman-common.h $ tools/perf/trace/beauty/madvise_behavior.sh > after $ diff -u before after --- before 2022-03-29 16:18:04.091044244 -0300 +++ after 2022-03-29 16:18:11.692238906 -0300 @@ -20,6 +20,7 @@ [21] = "PAGEOUT", [22] = "POPULATE_READ", [23] = "POPULATE_WRITE", + [24] = "DONTNEED_LOCKED", [100] = "HWPOISON", [101] = "SOFT_OFFLINE", }; $ I.e. now when madvise gets those behaviours as args, 'perf trace' will be able to translate from the number to a human readable string and to use the strings in tracepoint filter expressions. This addresses the following perf build warning: Warning: Kernel ABI header at 'tools/include/uapi/asm-generic/mman-common.h' differs from latest version at 'include/uapi/asm-generic/mman-common.h' diff -u tools/include/uapi/asm-generic/mman-common.h include/uapi/asm-generic/mman-common.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/lkml/YkNcUfeh795yqGMV@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-07-05tools headers UAPI: Sync asm-generic/mman-common.h with the kernelArnaldo Carvalho de Melo
To pick the changes from: Fixes: 4ca9b3859dac14bb ("mm/madvise: introduce MADV_POPULATE_(READ|WRITE) to prefault page tables") That result in these changes in the tools: $ tools/perf/trace/beauty/madvise_behavior.sh > before $ cp include/uapi/asm-generic/mman-common.h tools/include/uapi/asm-generic/mman-common.h $ tools/perf/trace/beauty/madvise_behavior.sh > after $ diff -u before after --- before 2021-07-05 14:30:15.167212621 -0300 +++ after 2021-07-05 14:30:26.638462594 -0300 @@ -18,6 +18,8 @@ [19] = "KEEPONFORK", [20] = "COLD", [21] = "PAGEOUT", + [22] = "POPULATE_READ", + [23] = "POPULATE_WRITE", [100] = "HWPOISON", [101] = "SOFT_OFFLINE", }; $ I.e. now when madvise gets those behaviours as args, it will be able to translate from the number to a human readable string. This addresses the following perf build warning: Warning: Kernel ABI header at 'tools/include/uapi/asm-generic/mman-common.h' differs from latest version at 'include/uapi/asm-generic/mman-common.h' diff -u tools/include/uapi/asm-generic/mman-common.h include/uapi/asm-generic/mman-common.h Cc: David Hildenbrand <david@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-02-12tools headers UAPI: Sync asm-generic/mman-common.h with the kernelArnaldo Carvalho de Melo
To pick the changes from: d41938d2cbee ("mm: Reserve asm-generic prot flags 0x10 and 0x20 for arch use") No changes in tooling, just a rebuild as files needed got touched. This addresses the following perf build warning: Warning: Kernel ABI header at 'tools/include/uapi/asm-generic/mman-common.h' differs from latest version at 'include/uapi/asm-generic/mman-common.h' diff -u tools/include/uapi/asm-generic/mman-common.h include/uapi/asm-generic/mman-common.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Dave Martin <Dave.Martin@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-09-30tools headers uapi: Sync asm-generic/mman-common.h with the kernelArnaldo Carvalho de Melo
To pick the changes from: 1a4e58cce84e ("mm: introduce MADV_PAGEOUT") 9c276cc65a58 ("mm: introduce MADV_COLD") That result in these changes in the tools: $ tools/perf/trace/beauty/madvise_behavior.sh > before $ cp include/uapi/asm-generic/mman-common.h tools/include/uapi/asm-generic/mman-common.h $ git diff diff --git a/tools/include/uapi/asm-generic/mman-common.h b/tools/include/uapi/asm-generic/mman-common.h index 63b1f506ea67..c160a5354eb6 100644 --- a/tools/include/uapi/asm-generic/mman-common.h +++ b/tools/include/uapi/asm-generic/mman-common.h @@ -67,6 +67,9 @@ #define MADV_WIPEONFORK 18 /* Zero memory on fork, child only */ #define MADV_KEEPONFORK 19 /* Undo MADV_WIPEONFORK */ +#define MADV_COLD 20 /* deactivate these pages */ +#define MADV_PAGEOUT 21 /* reclaim these pages */ + /* compatibility flags */ #define MAP_FILE 0 $ tools/perf/trace/beauty/madvise_behavior.sh > after $ diff -u before after --- before 2019-09-27 11:29:43.346320100 -0300 +++ after 2019-09-27 11:30:03.838570439 -0300 @@ -16,6 +16,8 @@ [17] = "DODUMP", [18] = "WIPEONFORK", [19] = "KEEPONFORK", + [20] = "COLD", + [21] = "PAGEOUT", [100] = "HWPOISON", [101] = "SOFT_OFFLINE", }; $ I.e. now when madvise gets those behaviours as args, it will be able to translate from the number to a human readable string. This addresses the following perf build warning: Warning: Kernel ABI header at 'tools/include/uapi/asm-generic/mman-common.h' differs from latest version at 'include/uapi/asm-generic/mman-common.h' diff -u tools/include/uapi/asm-generic/mman-common.h include/uapi/asm-generic/mman-common.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Brendan Gregg <brendan.d.gregg@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-n40y6c4sa49p29q6sl8w3ufx@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-07-29tools headers UAPI: Update tools's copy of mman.h headersArnaldo Carvalho de Melo
To pick up the changes from: 8aa3c927ec10 ("mm/mmap: move common defines to mman-common.h") 22fcea6f85f2 ("mm: move MAP_SYNC to asm-generic/mman-common.h") 0bf5f9492389 ("mm: fix the MAP_UNINITIALIZED flag") To address the following perf build warnings: Warning: Kernel ABI header at 'tools/include/uapi/asm-generic/mman-common.h' differs from latest version at 'include/uapi/asm-generic/mman-common.h' diff -u tools/include/uapi/asm-generic/mman-common.h include/uapi/asm-generic/mman-common.h Warning: Kernel ABI header at 'tools/include/uapi/asm-generic/mman.h' differs from latest version at 'include/uapi/asm-generic/mman.h' diff -u tools/include/uapi/asm-generic/mman.h include/uapi/asm-generic/mman.h That ends up just moving a bit the auto-generated code->string tables: $ tools/perf/trace/beauty/mmap_flags.sh > before $ cp include/uapi/asm-generic/mman.h tools/include/uapi/asm-generic/mman.h $ cp include/uapi/asm-generic/mman-common.h tools/include/uapi/asm-generic/mman-common.h $ tools/perf/trace/beauty/mmap_flags.sh > after $ diff -u before after --- before 2019-07-26 12:45:02.948335904 -0300 +++ after 2019-07-26 12:48:05.342893539 -0300 @@ -4,15 +4,15 @@ [ilog2(0x02) + 1] = "PRIVATE", [ilog2(0x10) + 1] = "FIXED", [ilog2(0x20) + 1] = "ANONYMOUS", + [ilog2(0x008000) + 1] = "POPULATE", + [ilog2(0x010000) + 1] = "NONBLOCK", + [ilog2(0x020000) + 1] = "STACK", + [ilog2(0x040000) + 1] = "HUGETLB", + [ilog2(0x080000) + 1] = "SYNC", [ilog2(0x100000) + 1] = "FIXED_NOREPLACE", [ilog2(0x0100) + 1] = "GROWSDOWN", [ilog2(0x0800) + 1] = "DENYWRITE", [ilog2(0x1000) + 1] = "EXECUTABLE", [ilog2(0x2000) + 1] = "LOCKED", [ilog2(0x4000) + 1] = "NORESERVE", - [ilog2(0x8000) + 1] = "POPULATE", - [ilog2(0x10000) + 1] = "NONBLOCK", - [ilog2(0x20000) + 1] = "STACK", - [ilog2(0x40000) + 1] = "HUGETLB", - [ilog2(0x80000) + 1] = "SYNC", }; $ Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-fzqvzni9megaurmsp0k4vy27@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-03-28tools headers uapi: Sync asm-generic/mman-common.h and linux/mman.hArnaldo Carvalho de Melo
To deal with the move of some defines from asm-generic/mmap-common.h to linux/mman.h done in: 746c9398f5ac ("arch: move common mmap flags to linux/mman.h") The generated mmap_flags array stays the same: $ tools/perf/trace/beauty/mmap_flags.sh static const char *mmap_flags[] = { [ilog2(0x40) + 1] = "32BIT", [ilog2(0x01) + 1] = "SHARED", [ilog2(0x02) + 1] = "PRIVATE", [ilog2(0x10) + 1] = "FIXED", [ilog2(0x20) + 1] = "ANONYMOUS", [ilog2(0x100000) + 1] = "FIXED_NOREPLACE", [ilog2(0x0100) + 1] = "GROWSDOWN", [ilog2(0x0800) + 1] = "DENYWRITE", [ilog2(0x1000) + 1] = "EXECUTABLE", [ilog2(0x2000) + 1] = "LOCKED", [ilog2(0x4000) + 1] = "NORESERVE", [ilog2(0x8000) + 1] = "POPULATE", [ilog2(0x10000) + 1] = "NONBLOCK", [ilog2(0x20000) + 1] = "STACK", [ilog2(0x40000) + 1] = "HUGETLB", [ilog2(0x80000) + 1] = "SYNC", }; $ And to have the system's sys/mman.h find the definition of MAP_SHARED and MAP_PRIVATE, make sure they are defined in the tools/ mman-common.h in a way that keeps it the same as the kernel's, need for keeping the Android's NDK cross build working. This silences these perf build warnings: Warning: Kernel ABI header at 'tools/include/uapi/asm-generic/mman-common.h' differs from latest version at 'include/uapi/asm-generic/mman-common.h' diff -u tools/include/uapi/asm-generic/mman-common.h include/uapi/asm-generic/mman-common.h Warning: Kernel ABI header at 'tools/include/uapi/linux/mman.h' differs from latest version at 'include/uapi/linux/mman.h' diff -u tools/include/uapi/linux/mman.h include/uapi/linux/mman.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-h80ycpc6pedg9s5z2rwpy6ws@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-04-17tools/headers: Synchronize kernel ABI headers, v4.17-rc1Ingo Molnar
Sync the following tooling headers with the latest kernel version: tools/arch/arm/include/uapi/asm/kvm.h - New ABI: KVM_REG_ARM_* tools/arch/x86/include/asm/required-features.h - Removal of NEED_LA57 dependency tools/arch/x86/include/uapi/asm/kvm.h - New KVM ABI: KVM_SYNC_X86_* tools/include/uapi/asm-generic/mman-common.h - New ABI: MAP_FIXED_NOREPLACE flag tools/include/uapi/linux/bpf.h - New ABI: BPF_F_SEQ_NUMBER functions tools/include/uapi/linux/if_link.h - New ABI: IFLA tun and rmnet support tools/include/uapi/linux/kvm.h - New ABI: hyperv eventfd and CONN_ID_MASK support plus header cleanups tools/include/uapi/sound/asound.h - New ABI: SNDRV_PCM_FORMAT_FIRST PCM format specifier tools/perf/arch/x86/entry/syscalls/syscall_64.tbl - The x86 system call table description changed due to the ptregs changes and the renames, in: d5a00528b58c: syscalls/core, syscalls/x86: Rename struct pt_regs-based sys_*() to __x64_sys_*() 5ac9efa3c50d: syscalls/core, syscalls/x86: Clean up compat syscall stub naming convention ebeb8c82ffaf: syscalls/x86: Use 'struct pt_regs' based syscall calling for IA32_EMULATION and x32 Also fix the x86 syscall table warning: -Warning: Kernel ABI header at 'tools/arch/x86/entry/syscalls/syscall_64.tbl' differs from latest version at 'arch/x86/entry/syscalls/syscall_64.tbl' +Warning: Kernel ABI header at 'tools/perf/arch/x86/entry/syscalls/syscall_64.tbl' differs from latest version at 'arch/x86/entry/syscalls/syscall_64.tbl' None of these changes impact existing tooling code, so we only have to copy the kernel version. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Potapenko <glider@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Brian Robbins <brianrob@microsoft.com> Cc: Clark Williams <williams@redhat.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: David Ahern <dsahern@gmail.com> Cc: Dmitriy Vyukov <dvyukov@google.com> <dvyukov@google.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kim Phillips <kim.phillips@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Li Zhijian <lizhijian@cn.fujitsu.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin Liška <mliska@suse.cz> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Matthias Kaehlcke <mka@chromium.org> Cc: Miguel Bernal Marin <miguel.bernal.marin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Cc: Sandipan Das <sandipan@linux.vnet.ibm.com> Cc: Stephane Eranian <eranian@google.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Takuya Yamamoto <tkydevel@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Wang Nan <wangnan0@huawei.com> Cc: William Cohen <wcohen@redhat.com> Cc: Yonghong Song <yhs@fb.com> Link: http://lkml.kernel.org/r/20180416064024.ofjtrz5yuu3ykhvl@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-11-17Merge tag 'libnvdimm-for-4.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm and dax updates from Dan Williams: "Save for a few late fixes, all of these commits have shipped in -next releases since before the merge window opened, and 0day has given a build success notification. The ext4 touches came from Jan, and the xfs touches have Darrick's reviewed-by. An xfstest for the MAP_SYNC feature has been through a few round of reviews and is on track to be merged. - Introduce MAP_SYNC and MAP_SHARED_VALIDATE, a mechanism to enable 'userspace flush' of persistent memory updates via filesystem-dax mappings. It arranges for any filesystem metadata updates that may be required to satisfy a write fault to also be flushed ("on disk") before the kernel returns to userspace from the fault handler. Effectively every write-fault that dirties metadata completes an fsync() before returning from the fault handler. The new MAP_SHARED_VALIDATE mapping type guarantees that the MAP_SYNC flag is validated as supported by the filesystem's ->mmap() file operation. - Add support for the standard ACPI 6.2 label access methods that replace the NVDIMM_FAMILY_INTEL (vendor specific) label methods. This enables interoperability with environments that only implement the standardized methods. - Add support for the ACPI 6.2 NVDIMM media error injection methods. - Add support for the NVDIMM_FAMILY_INTEL v1.6 DIMM commands for latch last shutdown status, firmware update, SMART error injection, and SMART alarm threshold control. - Cleanup physical address information disclosures to be root-only. - Fix revalidation of the DIMM "locked label area" status to support dynamic unlock of the label area. - Expand unit test infrastructure to mock the ACPI 6.2 Translate SPA (system-physical-address) command and error injection commands. Acknowledgements that came after the commits were pushed to -next: - 957ac8c421ad ("dax: fix PMD faults on zero-length files"): Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> - a39e596baa07 ("xfs: support for synchronous DAX faults") and 7b565c9f965b ("xfs: Implement xfs_filemap_pfn_mkwrite() using __xfs_filemap_fault()") Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>" * tag 'libnvdimm-for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (49 commits) acpi, nfit: add 'Enable Latch System Shutdown Status' command support dax: fix general protection fault in dax_alloc_inode dax: fix PMD faults on zero-length files dax: stop requiring a live device for dax_flush() brd: remove dax support dax: quiet bdev_dax_supported() fs, dax: unify IOMAP_F_DIRTY read vs write handling policy in the dax core tools/testing/nvdimm: unit test clear-error commands acpi, nfit: validate commands against the device type tools/testing/nvdimm: stricter bounds checking for error injection commands xfs: support for synchronous DAX faults xfs: Implement xfs_filemap_pfn_mkwrite() using __xfs_filemap_fault() ext4: Support for synchronous DAX faults ext4: Simplify error handling in ext4_dax_huge_fault() dax: Implement dax_finish_sync_fault() dax, iomap: Add support for synchronous faults mm: Define MAP_SYNC and VM_SYNC flags dax: Allow tuning whether dax_insert_mapping_entry() dirties entry dax: Allow dax_iomap_fault() to return pfn dax: Fix comment describing dax_iomap_fault() ...
2017-11-04tools/headers: Synchronize kernel ABI headersIngo Molnar
After the SPDX license tags were added a number of tooling headers got out of sync with their kernel variants, generating lots of build warnings. Sync them: - tools/arch/x86/include/asm/disabled-features.h, tools/arch/x86/include/asm/required-features.h, tools/include/linux/hash.h: Remove the SPDX tag where the kernel version does not have it. - tools/include/asm-generic/bitops/__fls.h, tools/include/asm-generic/bitops/arch_hweight.h, tools/include/asm-generic/bitops/const_hweight.h, tools/include/asm-generic/bitops/fls.h, tools/include/asm-generic/bitops/fls64.h, tools/include/uapi/asm-generic/ioctls.h, tools/include/uapi/asm-generic/mman-common.h, tools/include/uapi/sound/asound.h, tools/include/uapi/linux/kvm.h, tools/include/uapi/linux/perf_event.h, tools/include/uapi/linux/sched.h, tools/include/uapi/linux/vhost.h, tools/include/uapi/sound/asound.h: Add the SPDX tag of the respective kernel header. - tools/include/uapi/linux/bpf_common.h, tools/include/uapi/linux/fcntl.h, tools/include/uapi/linux/hw_breakpoint.h, tools/include/uapi/linux/mman.h, tools/include/uapi/linux/stat.h, Change the tag to the kernel header version: -/* SPDX-License-Identifier: GPL-2.0 */ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ Also sync other header details: - include/uapi/sound/asound.h: Fix pointless end of line whitespace noise the header grew in this cycle. - tools/arch/x86/lib/memcpy_64.S: Sync the code and add tools/include/asm/export.h with dummy wrappers to support building the kernel side code in a tooling header environment. - tools/include/uapi/asm-generic/mman.h, tools/include/uapi/linux/bpf.h: Sync other details that don't impact tooling's use of the ABIs. Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-03mm: introduce MAP_SHARED_VALIDATE, a mechanism to safely define new mmap flagsDan Williams
The mmap(2) syscall suffers from the ABI anti-pattern of not validating unknown flags. However, proposals like MAP_SYNC need a mechanism to define new behavior that is known to fail on older kernels without the support. Define a new MAP_SHARED_VALIDATE flag pattern that is guaranteed to fail on all legacy mmap implementations. It is worth noting that the original proposal was for a standalone MAP_VALIDATE flag. However, when that could not be supported by all archs Linus observed: I see why you *think* you want a bitmap. You think you want a bitmap because you want to make MAP_VALIDATE be part of MAP_SYNC etc, so that people can do ret = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_SYNC, fd, 0); and "know" that MAP_SYNC actually takes. And I'm saying that whole wish is bogus. You're fundamentally depending on special semantics, just make it explicit. It's already not portable, so don't try to make it so. Rename that MAP_VALIDATE as MAP_SHARED_VALIDATE, make it have a value of 0x3, and make people do ret = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0); and then the kernel side is easier too (none of that random garbage playing games with looking at the "MAP_VALIDATE bit", but just another case statement in that map type thing. Boom. Done. Similar to ->fallocate() we also want the ability to validate the support for new flags on a per ->mmap() 'struct file_operations' instance basis. Towards that end arrange for flags to be generically validated against a mmap_supported_flags exported by 'struct file_operations'. By default all existing flags are implicitly supported, but new flags require MAP_SHARED_VALIDATE and per-instance-opt-in. Cc: Jan Kara <jack@suse.cz> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Suggested-by: Christoph Hellwig <hch@lst.de> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-09-25tools include: Sync kernel ABI headers with tooling headersIngo Molnar
Time for a sync with ABI/uapi headers with the upcoming v4.14 kernel. None of the ABI changes require any source code level changes to our existing in-kernel tooling code: - tools/arch/s390/include/uapi/asm/kvm.h: New KVM_S390_VM_TOD_EXT ABI, not used by in-kernel tooling. - tools/arch/x86/include/asm/cpufeatures.h: tools/arch/x86/include/asm/disabled-features.h: New PCID, SME and VGIF x86 CPU feature bits defined. - tools/include/asm-generic/hugetlb_encode.h: tools/include/uapi/asm-generic/mman-common.h: tools/include/uapi/linux/mman.h: Two new madvise() flags, plus a hugetlb system call mmap flags restructuring/extension changes. - tools/include/uapi/drm/drm.h: tools/include/uapi/drm/i915_drm.h: New drm_syncobj_create flags definitions, new drm_syncobj_wait and drm_syncobj_array ABIs. DRM_I915_PERF_* calls and a new I915_PARAM_HAS_EXEC_FENCE_ARRAY ABI for the Intel driver. - tools/include/uapi/linux/bpf.h: New bpf_sock fields (::mark and ::priority), new XDP_REDIRECT action, new kvm_ppc_smmu_info fields (::data_keys, instr_keys) Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Milian Wolff <milian.wolff@kdab.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Taeung Song <treeze.taeung@gmail.com> Cc: Wang Nan <wangnan0@huawei.com> Cc: Yao Jin <yao.jin@linux.intel.com> Link: http://lkml.kernel.org/r/20170913073823.lxmi4c7ejqlfabjx@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-10-28tools: Update asm-generic/mman-common.h copy from the kernelArnaldo Carvalho de Melo
To get the defines introduced in the commit e8c24d3a23a4 ("x86/pkeys: Allocation/free syscalls") Silencing this perf build warning: Warning: tools/include/uapi/asm-generic/mman-common.h differs from kernel Need to change 'perf trace' to beautify those syscalls, as soon as booting with a kernel with it. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-yev9rexu02cl7cjeozzmrl9t@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-13tools include: Add uapi mman.h for each architectureWang Nan
Some mmap related macros have different values for different architectures. This patch introduces uapi mman.h for each architectures. Three headers are cloned from kernel include to tools/include: tools/include/uapi/asm-generic/mman-common.h tools/include/uapi/asm-generic/mman.h tools/include/uapi/linux/mman.h The main part of this patch is generated by following script: macros=`cat $0 | awk 'V==1 {print}; /^# start macro list/ {V=1}'` for arch in `ls tools/arch` do [ -d tools/arch/$arch/include/uapi/asm ] || mkdir -p tools/arch/$arch/include/uapi/asm src=arch/$arch/include/uapi/asm/mman.h target=tools/arch/$arch/include/uapi/asm/mman.h guard="TOOLS_ARCH_"`echo $arch | awk '{print toupper($0)}'`_UAPI_ASM_MMAN_FIX_H echo '#ifndef '$guard > $target echo '#define '$guard >> $target [ -f $src ] && for m in $macros do if grep '#define[ \t]*'$m $src > /dev/null 2>&1 then grep -h '#define[ \t]*'$m $src | sed 's/[ \t]*\/\*.*$//g' >> $target fi done if [ -f $src ] then grep '#include <asm-generic' $src >> $target else echo "#include <asm-generic/mman.h>" >> $target fi echo '#endif' >> $target echo "$target" done exit 0 # Following macros are extracted from: # tools/perf/trace/beauty/mmap.c # # start macro list MADV_DODUMP MADV_DOFORK MADV_DONTDUMP MADV_DONTFORK MADV_DONTNEED MADV_HUGEPAGE MADV_HWPOISON MADV_MERGEABLE MADV_NOHUGEPAGE MADV_NORMAL MADV_RANDOM MADV_REMOVE MADV_SEQUENTIAL MADV_SOFT_OFFLINE MADV_UNMERGEABLE MADV_WILLNEED MAP_32BIT MAP_ANONYMOUS MAP_DENYWRITE MAP_EXECUTABLE MAP_FILE MAP_FIXED MAP_GROWSDOWN MAP_HUGETLB MAP_LOCKED MAP_NONBLOCK MAP_NORESERVE MAP_POPULATE MAP_PRIVATE MAP_SHARED MAP_STACK MAP_UNINITIALIZED MREMAP_FIXED MREMAP_MAYMOVE PROT_EXEC PROT_GROWSDOWN PROT_GROWSUP PROT_NONE PROT_READ PROT_SEM PROT_WRITE Signed-off-by: Wang Nan <wangnan0@huawei.com> Cc: Zefan Li <lizefan@huawei.com> Cc: pi3orama@163.com Link: http://lkml.kernel.org/r/1473684871-209320-2-git-send-email-wangnan0@huawei.com [ Added new files to tools/perf/MANIFEST to fix the detached tarball build, add mman.h for ARC ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>