summaryrefslogtreecommitdiff
path: root/Documentation/trace/events-kmem.rst
diff options
context:
space:
mode:
authorLinus Torvalds <torvalds@linux-foundation.org>2018-04-03 13:35:51 -0700
committerLinus Torvalds <torvalds@linux-foundation.org>2018-04-03 13:35:51 -0700
commitbb2407a7219760926760f0448fddf00d625e5aec (patch)
tree68d2b7a17f0bb837d04d658a56baf16dd261210f /Documentation/trace/events-kmem.rst
parente40dc66220b7ff1b816311b135b9298f8ba14ce6 (diff)
parent86afad7d87f535ebb1a0e978bc32a8c58ac99268 (diff)
Merge tag 'docs-4.17' of git://git.lwn.net/linux
Pull documentation updates from Jonathan Corbet: "There's been a fair amount of activity in Documentation/ this time around: - Lots of work aligning Documentation/ABI with reality, done by Aishwarya Pant. - The trace documentation has been converted to RST by Changbin Du - I thrashed up kernel-doc to deal with a parsing issue and to try to make the code more readable. It's still a 20+-year-old Perl hack, though. - Lots of other updates, typo fixes, and more" * tag 'docs-4.17' of git://git.lwn.net/linux: (82 commits) Documentation/process: update FUSE project website docs: kernel-doc: fix parsing of arrays dmaengine: Fix spelling for parenthesis in dmatest documentation dmaengine: Make dmatest.rst indeed reST compatible dmaengine: Add note to dmatest documentation about supported channels Documentation: magic-numbers: Fix typo Documentation: admin-guide: add kvmconfig, xenconfig and tinyconfig commands Input: alps - Update documentation for trackstick v3 format Documentation: Mention why %p prints ptrval COPYING: use the new text with points to the license files COPYING: create a new file with points to the Kernel license files Input: trackpoint: document sysfs interface xfs: Change URL for the project in xfs.txt char/bsr: add sysfs interface documentation acpi: nfit: document sysfs interface block: rbd: update sysfs interface Documentation/sparse: fix typo Documentation/CodingStyle: Add an example for braces docs/vm: update 00-INDEX kernel-doc: Remove __sched markings ...
Diffstat (limited to 'Documentation/trace/events-kmem.rst')
-rw-r--r--Documentation/trace/events-kmem.rst119
1 files changed, 119 insertions, 0 deletions
diff --git a/Documentation/trace/events-kmem.rst b/Documentation/trace/events-kmem.rst
new file mode 100644
index 000000000000..555484110e36
--- /dev/null
+++ b/Documentation/trace/events-kmem.rst
@@ -0,0 +1,119 @@
+============================
+Subsystem Trace Points: kmem
+============================
+
+The kmem tracing system captures events related to object and page allocation
+within the kernel. Broadly speaking there are five major subheadings.
+
+ - Slab allocation of small objects of unknown type (kmalloc)
+ - Slab allocation of small objects of known type
+ - Page allocation
+ - Per-CPU Allocator Activity
+ - External Fragmentation
+
+This document describes what each of the tracepoints is and why they
+might be useful.
+
+1. Slab allocation of small objects of unknown type
+===================================================
+::
+
+ kmalloc call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s
+ kmalloc_node call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s node=%d
+ kfree call_site=%lx ptr=%p
+
+Heavy activity for these events may indicate that a specific cache is
+justified, particularly if kmalloc slab pages are getting significantly
+internal fragmented as a result of the allocation pattern. By correlating
+kmalloc with kfree, it may be possible to identify memory leaks and where
+the allocation sites were.
+
+
+2. Slab allocation of small objects of known type
+=================================================
+::
+
+ kmem_cache_alloc call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s
+ kmem_cache_alloc_node call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s node=%d
+ kmem_cache_free call_site=%lx ptr=%p
+
+These events are similar in usage to the kmalloc-related events except that
+it is likely easier to pin the event down to a specific cache. At the time
+of writing, no information is available on what slab is being allocated from,
+but the call_site can usually be used to extrapolate that information.
+
+3. Page allocation
+==================
+::
+
+ mm_page_alloc page=%p pfn=%lu order=%d migratetype=%d gfp_flags=%s
+ mm_page_alloc_zone_locked page=%p pfn=%lu order=%u migratetype=%d cpu=%d percpu_refill=%d
+ mm_page_free page=%p pfn=%lu order=%d
+ mm_page_free_batched page=%p pfn=%lu order=%d cold=%d
+
+These four events deal with page allocation and freeing. mm_page_alloc is
+a simple indicator of page allocator activity. Pages may be allocated from
+the per-CPU allocator (high performance) or the buddy allocator.
+
+If pages are allocated directly from the buddy allocator, the
+mm_page_alloc_zone_locked event is triggered. This event is important as high
+amounts of activity imply high activity on the zone->lock. Taking this lock
+impairs performance by disabling interrupts, dirtying cache lines between
+CPUs and serialising many CPUs.
+
+When a page is freed directly by the caller, the only mm_page_free event
+is triggered. Significant amounts of activity here could indicate that the
+callers should be batching their activities.
+
+When pages are freed in batch, the also mm_page_free_batched is triggered.
+Broadly speaking, pages are taken off the LRU lock in bulk and
+freed in batch with a page list. Significant amounts of activity here could
+indicate that the system is under memory pressure and can also indicate
+contention on the zone->lru_lock.
+
+4. Per-CPU Allocator Activity
+=============================
+::
+
+ mm_page_alloc_zone_locked page=%p pfn=%lu order=%u migratetype=%d cpu=%d percpu_refill=%d
+ mm_page_pcpu_drain page=%p pfn=%lu order=%d cpu=%d migratetype=%d
+
+In front of the page allocator is a per-cpu page allocator. It exists only
+for order-0 pages, reduces contention on the zone->lock and reduces the
+amount of writing on struct page.
+
+When a per-CPU list is empty or pages of the wrong type are allocated,
+the zone->lock will be taken once and the per-CPU list refilled. The event
+triggered is mm_page_alloc_zone_locked for each page allocated with the
+event indicating whether it is for a percpu_refill or not.
+
+When the per-CPU list is too full, a number of pages are freed, each one
+which triggers a mm_page_pcpu_drain event.
+
+The individual nature of the events is so that pages can be tracked
+between allocation and freeing. A number of drain or refill pages that occur
+consecutively imply the zone->lock being taken once. Large amounts of per-CPU
+refills and drains could imply an imbalance between CPUs where too much work
+is being concentrated in one place. It could also indicate that the per-CPU
+lists should be a larger size. Finally, large amounts of refills on one CPU
+and drains on another could be a factor in causing large amounts of cache
+line bounces due to writes between CPUs and worth investigating if pages
+can be allocated and freed on the same CPU through some algorithm change.
+
+5. External Fragmentation
+=========================
+::
+
+ mm_page_alloc_extfrag page=%p pfn=%lu alloc_order=%d fallback_order=%d pageblock_order=%d alloc_migratetype=%d fallback_migratetype=%d fragmenting=%d change_ownership=%d
+
+External fragmentation affects whether a high-order allocation will be
+successful or not. For some types of hardware, this is important although
+it is avoided where possible. If the system is using huge pages and needs
+to be able to resize the pool over the lifetime of the system, this value
+is important.
+
+Large numbers of this event implies that memory is fragmenting and
+high-order allocations will start failing at some time in the future. One
+means of reducing the occurrence of this event is to increase the size of
+min_free_kbytes in increments of 3*pageblock_size*nr_online_nodes where
+pageblock_size is usually the size of the default hugepage size.