summaryrefslogtreecommitdiff
path: root/mm/slab.c
AgeCommit message (Collapse)Author
2016-03-15mm: new API kfree_bulk() for SLAB+SLUB allocatorsJesper Dangaard Brouer
This patch introduce a new API call kfree_bulk() for bulk freeing memory objects not bound to a single kmem_cache. Christoph pointed out that it is possible to implement freeing of objects, without knowing the kmem_cache pointer as that information is available from the object's page->slab_cache. Proposing to remove the kmem_cache argument from the bulk free API. Jesper demonstrated that these extra steps per object comes at a performance cost. It is only in the case CONFIG_MEMCG_KMEM is compiled in and activated runtime that these steps are done anyhow. The extra cost is most visible for SLAB allocator, because the SLUB allocator does the page lookup (virt_to_head_page()) anyhow. Thus, the conclusion was to keep the kmem_cache free bulk API with a kmem_cache pointer, but we can still implement a kfree_bulk() API fairly easily. Simply by handling if kmem_cache_free_bulk() gets called with a kmem_cache NULL pointer. This does increase the code size a bit, but implementing a separate kfree_bulk() call would likely increase code size even more. Below benchmarks cost of alloc+free (obj size 256 bytes) on CPU i7-4790K @ 4.00GHz, no PREEMPT and CONFIG_MEMCG_KMEM=y. Code size increase for SLAB: add/remove: 0/0 grow/shrink: 1/0 up/down: 74/0 (74) function old new delta kmem_cache_free_bulk 660 734 +74 SLAB fastpath: 87 cycles(tsc) 21.814 sz - fallback - kmem_cache_free_bulk - kfree_bulk 1 - 103 cycles 25.878 ns - 41 cycles 10.498 ns - 81 cycles 20.312 ns 2 - 94 cycles 23.673 ns - 26 cycles 6.682 ns - 42 cycles 10.649 ns 3 - 92 cycles 23.181 ns - 21 cycles 5.325 ns - 39 cycles 9.950 ns 4 - 90 cycles 22.727 ns - 18 cycles 4.673 ns - 26 cycles 6.693 ns 8 - 89 cycles 22.270 ns - 14 cycles 3.664 ns - 23 cycles 5.835 ns 16 - 88 cycles 22.038 ns - 14 cycles 3.503 ns - 22 cycles 5.543 ns 30 - 89 cycles 22.284 ns - 13 cycles 3.310 ns - 20 cycles 5.197 ns 32 - 88 cycles 22.249 ns - 13 cycles 3.420 ns - 20 cycles 5.166 ns 34 - 88 cycles 22.224 ns - 14 cycles 3.643 ns - 20 cycles 5.170 ns 48 - 88 cycles 22.088 ns - 14 cycles 3.507 ns - 20 cycles 5.203 ns 64 - 88 cycles 22.063 ns - 13 cycles 3.428 ns - 20 cycles 5.152 ns 128 - 89 cycles 22.483 ns - 15 cycles 3.891 ns - 23 cycles 5.885 ns 158 - 89 cycles 22.381 ns - 15 cycles 3.779 ns - 22 cycles 5.548 ns 250 - 91 cycles 22.798 ns - 16 cycles 4.152 ns - 23 cycles 5.967 ns SLAB when enabling MEMCG_KMEM runtime: - kmemcg fastpath: 130 cycles(tsc) 32.684 ns (step:0) 1 - 148 cycles 37.220 ns - 66 cycles 16.622 ns - 66 cycles 16.583 ns 2 - 141 cycles 35.510 ns - 51 cycles 12.820 ns - 58 cycles 14.625 ns 3 - 140 cycles 35.017 ns - 37 cycles 9.326 ns - 33 cycles 8.474 ns 4 - 137 cycles 34.507 ns - 31 cycles 7.888 ns - 33 cycles 8.300 ns 8 - 140 cycles 35.069 ns - 25 cycles 6.461 ns - 25 cycles 6.436 ns 16 - 138 cycles 34.542 ns - 23 cycles 5.945 ns - 22 cycles 5.670 ns 30 - 136 cycles 34.227 ns - 22 cycles 5.502 ns - 22 cycles 5.587 ns 32 - 136 cycles 34.253 ns - 21 cycles 5.475 ns - 21 cycles 5.324 ns 34 - 136 cycles 34.254 ns - 21 cycles 5.448 ns - 20 cycles 5.194 ns 48 - 136 cycles 34.075 ns - 21 cycles 5.458 ns - 21 cycles 5.367 ns 64 - 135 cycles 33.994 ns - 21 cycles 5.350 ns - 21 cycles 5.259 ns 128 - 137 cycles 34.446 ns - 23 cycles 5.816 ns - 22 cycles 5.688 ns 158 - 137 cycles 34.379 ns - 22 cycles 5.727 ns - 22 cycles 5.602 ns 250 - 138 cycles 34.755 ns - 24 cycles 6.093 ns - 23 cycles 5.986 ns Code size increase for SLUB: function old new delta kmem_cache_free_bulk 717 799 +82 SLUB benchmark: SLUB fastpath: 46 cycles(tsc) 11.691 ns (step:0) sz - fallback - kmem_cache_free_bulk - kfree_bulk 1 - 61 cycles 15.486 ns - 53 cycles 13.364 ns - 57 cycles 14.464 ns 2 - 54 cycles 13.703 ns - 32 cycles 8.110 ns - 33 cycles 8.482 ns 3 - 53 cycles 13.272 ns - 25 cycles 6.362 ns - 27 cycles 6.947 ns 4 - 51 cycles 12.994 ns - 24 cycles 6.087 ns - 24 cycles 6.078 ns 8 - 50 cycles 12.576 ns - 21 cycles 5.354 ns - 22 cycles 5.513 ns 16 - 49 cycles 12.368 ns - 20 cycles 5.054 ns - 20 cycles 5.042 ns 30 - 49 cycles 12.273 ns - 18 cycles 4.748 ns - 19 cycles 4.758 ns 32 - 49 cycles 12.401 ns - 19 cycles 4.821 ns - 19 cycles 4.810 ns 34 - 98 cycles 24.519 ns - 24 cycles 6.154 ns - 24 cycles 6.157 ns 48 - 83 cycles 20.833 ns - 21 cycles 5.446 ns - 21 cycles 5.429 ns 64 - 75 cycles 18.891 ns - 20 cycles 5.247 ns - 20 cycles 5.238 ns 128 - 93 cycles 23.271 ns - 27 cycles 6.856 ns - 27 cycles 6.823 ns 158 - 102 cycles 25.581 ns - 30 cycles 7.714 ns - 30 cycles 7.695 ns 250 - 107 cycles 26.917 ns - 38 cycles 9.514 ns - 38 cycles 9.506 ns SLUB when enabling MEMCG_KMEM runtime: - kmemcg fastpath: 71 cycles(tsc) 17.897 ns (step:0) 1 - 85 cycles 21.484 ns - 78 cycles 19.569 ns - 75 cycles 18.938 ns 2 - 81 cycles 20.363 ns - 45 cycles 11.258 ns - 44 cycles 11.076 ns 3 - 78 cycles 19.709 ns - 33 cycles 8.354 ns - 32 cycles 8.044 ns 4 - 77 cycles 19.430 ns - 28 cycles 7.216 ns - 28 cycles 7.003 ns 8 - 101 cycles 25.288 ns - 23 cycles 5.849 ns - 23 cycles 5.787 ns 16 - 76 cycles 19.148 ns - 20 cycles 5.162 ns - 20 cycles 5.081 ns 30 - 76 cycles 19.067 ns - 19 cycles 4.868 ns - 19 cycles 4.821 ns 32 - 76 cycles 19.052 ns - 19 cycles 4.857 ns - 19 cycles 4.815 ns 34 - 121 cycles 30.291 ns - 25 cycles 6.333 ns - 25 cycles 6.268 ns 48 - 108 cycles 27.111 ns - 21 cycles 5.498 ns - 21 cycles 5.458 ns 64 - 100 cycles 25.164 ns - 20 cycles 5.242 ns - 20 cycles 5.229 ns 128 - 155 cycles 38.976 ns - 27 cycles 6.886 ns - 27 cycles 6.892 ns 158 - 132 cycles 33.034 ns - 30 cycles 7.711 ns - 30 cycles 7.728 ns 250 - 130 cycles 32.612 ns - 38 cycles 9.560 ns - 38 cycles 9.549 ns Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Vladimir Davydov <vdavydov@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15slab: implement bulk free in SLAB allocatorJesper Dangaard Brouer
This patch implements the free side of bulk API for the SLAB allocator kmem_cache_free_bulk(), and concludes the implementation of optimized bulk API for SLAB allocator. Benchmarked[1] cost of alloc+free (obj size 256 bytes) on CPU i7-4790K @ 4.00GHz, with no debug options, no PREEMPT and CONFIG_MEMCG_KMEM=y but no active user of kmemcg. SLAB single alloc+free cost: 87 cycles(tsc) 21.814 ns with this optimized config. bulk- Current fallback - optimized SLAB bulk 1 - 102 cycles(tsc) 25.747 ns - 41 cycles(tsc) 10.490 ns - improved 59.8% 2 - 94 cycles(tsc) 23.546 ns - 26 cycles(tsc) 6.567 ns - improved 72.3% 3 - 92 cycles(tsc) 23.127 ns - 20 cycles(tsc) 5.244 ns - improved 78.3% 4 - 90 cycles(tsc) 22.663 ns - 18 cycles(tsc) 4.588 ns - improved 80.0% 8 - 88 cycles(tsc) 22.242 ns - 14 cycles(tsc) 3.656 ns - improved 84.1% 16 - 88 cycles(tsc) 22.010 ns - 13 cycles(tsc) 3.480 ns - improved 85.2% 30 - 89 cycles(tsc) 22.305 ns - 13 cycles(tsc) 3.303 ns - improved 85.4% 32 - 89 cycles(tsc) 22.277 ns - 13 cycles(tsc) 3.309 ns - improved 85.4% 34 - 88 cycles(tsc) 22.246 ns - 13 cycles(tsc) 3.294 ns - improved 85.2% 48 - 88 cycles(tsc) 22.121 ns - 13 cycles(tsc) 3.492 ns - improved 85.2% 64 - 88 cycles(tsc) 22.052 ns - 13 cycles(tsc) 3.411 ns - improved 85.2% 128 - 89 cycles(tsc) 22.452 ns - 15 cycles(tsc) 3.841 ns - improved 83.1% 158 - 89 cycles(tsc) 22.403 ns - 14 cycles(tsc) 3.746 ns - improved 84.3% 250 - 91 cycles(tsc) 22.775 ns - 16 cycles(tsc) 4.111 ns - improved 82.4% Notice it is not recommended to do very large bulk operation with this bulk API, because local IRQs are disabled in this period. [1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/mm/slab_bulk_test01.c Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Vladimir Davydov <vdavydov@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15slab: avoid running debug SLAB code with IRQs disabled for alloc_bulkJesper Dangaard Brouer
Move the call to cache_alloc_debugcheck_after() outside the IRQ disabled section in kmem_cache_alloc_bulk(). When CONFIG_DEBUG_SLAB is disabled the compiler should remove this code. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Vladimir Davydov <vdavydov@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15slab: implement bulk alloc in SLAB allocatorJesper Dangaard Brouer
This patch implements the alloc side of bulk API for the SLAB allocator. Further optimization are still possible by changing the call to __do_cache_alloc() into something that can return multiple objects. This optimization is left for later, given end results already show in the area of 80% speedup. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Vladimir Davydov <vdavydov@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15slab: use slab_post_alloc_hook in SLAB allocator shared with SLUBJesper Dangaard Brouer
Reviewers notice that the order in slab_post_alloc_hook() of kmemcheck_slab_alloc() and kmemleak_alloc_recursive() gets swapped compared to slab.c / SLAB allocator. Also notice memset now occurs before calling kmemcheck_slab_alloc() and kmemleak_alloc_recursive(). I assume this reordering of kmemcheck, kmemleak and memset is okay because this is the order they are used by the SLUB allocator. This patch completes the sharing of alloc_hook's between SLUB and SLAB. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Vladimir Davydov <vdavydov@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15slab: use slab_pre_alloc_hook in SLAB allocator shared with SLUBJesper Dangaard Brouer
Deduplicate code in SLAB allocator functions slab_alloc() and slab_alloc_node() by using the slab_pre_alloc_hook() call, which is now shared between SLUB and SLAB. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Vladimir Davydov <vdavydov@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15mm: fault-inject take over bootstrap kmem_cache checkJesper Dangaard Brouer
Remove the SLAB specific function slab_should_failslab(), by moving the check against fault-injection for the bootstrap slab, into the shared function should_failslab() (used by both SLAB and SLUB). This is a step towards sharing alloc_hook's between SLUB and SLAB. This bootstrap slab "kmem_cache" is used for allocating struct kmem_cache objects to the allocator itself. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Vladimir Davydov <vdavydov@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-18mm: slab: free kmem_cache_node after destroy sysfs fileDmitry Safonov
When slub_debug alloc_calls_show is enabled we will try to track location and user of slab object on each online node, kmem_cache_node structure and cpu_cache/cpu_slub shouldn't be freed till there is the last reference to sysfs file. This fixes the following panic: BUG: unable to handle kernel NULL pointer dereference at 0000000000000020 IP: list_locations+0x169/0x4e0 PGD 257304067 PUD 438456067 PMD 0 Oops: 0000 [#1] SMP CPU: 3 PID: 973074 Comm: cat ve: 0 Not tainted 3.10.0-229.7.2.ovz.9.30-00007-japdoll-dirty #2 9.30 Hardware name: DEPO Computers To Be Filled By O.E.M./H67DE3, BIOS L1.60c 07/14/2011 task: ffff88042a5dc5b0 ti: ffff88037f8d8000 task.ti: ffff88037f8d8000 RIP: list_locations+0x169/0x4e0 Call Trace: alloc_calls_show+0x1d/0x30 slab_attr_show+0x1b/0x30 sysfs_read_file+0x9a/0x1a0 vfs_read+0x9c/0x170 SyS_read+0x58/0xb0 system_call_fastpath+0x16/0x1b Code: 5e 07 12 00 b9 00 04 00 00 3d 00 04 00 00 0f 4f c1 3d 00 04 00 00 89 45 b0 0f 84 c3 00 00 00 48 63 45 b0 49 8b 9c c4 f8 00 00 00 <48> 8b 43 20 48 85 c0 74 b6 48 89 df e8 46 37 44 00 48 8b 53 10 CR2: 0000000000000020 Separated __kmem_cache_release from __kmem_cache_shutdown which now called on slab_kmem_cache_release (after the last reference to sysfs file object has dropped). Reintroduced locking in free_partial as sysfs file might access cache's partial list after shutdowning - partial revert of the commit 69cb8e6b7c29 ("slub: free slabs without holding locks"). Zap __remove_partial and use remove_partial (w/o underscores) as free_partial now takes list_lock which s partial revert for commit 1e4dd9461fab ("slub: do not assert not having lock in removing freed partial") Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com> Suggested-by: Vladimir Davydov <vdavydov@virtuozzo.com> Acked-by: Vladimir Davydov <vdavydov@virtuozzo.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-14mm/slab.c: add a helper function get_first_slabGeliang Tang
Add a new helper function get_first_slab() that get the first slab from a kmem_cache_node. Signed-off-by: Geliang Tang <geliangtang@163.com> Acked-by: Christoph Lameter <cl@linux.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-14mm/slab.c: use list_for_each_entry in cache_flusharrayGeliang Tang
Simplify the code with list_for_each_entry(). Signed-off-by: Geliang Tang <geliangtang@163.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-14mm/slab.c use list_first_entry_or_null()Geliang Tang
Simplify the code with list_first_entry_or_null(). Signed-off-by: Geliang Tang <geliangtang@163.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-22slab/slub: adjust kmem_cache_alloc_bulk APIJesper Dangaard Brouer
Adjust kmem_cache_alloc_bulk API before we have any real users. Adjust API to return type 'int' instead of previously type 'bool'. This is done to allow future extension of the bulk alloc API. A future extension could be to allow SLUB to stop at a page boundary, when specified by a flag, and then return the number of objects. The advantage of this approach, would make it easier to make bulk alloc run without local IRQs disabled. With an approach of cmpxchg "stealing" the entire c->freelist or page->freelist. To avoid overshooting we would stop processing at a slab-page boundary. Else we always end up returning some objects at the cost of another cmpxchg. To keep compatible with future users of this API linking against an older kernel when using the new flag, we need to return the number of allocated objects with this API change. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Vladimir Davydov <vdavydov@virtuozzo.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06slab, slub: use page->rcu_head instead of page->lru plus castKirill A. Shutemov
We have properly typed page->rcu_head, no need to cast page->lru. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06mm, page_alloc: distinguish between being unable to sleep, unwilling to ↵Mel Gorman
sleep and avoiding waking kswapd __GFP_WAIT has been used to identify atomic context in callers that hold spinlocks or are in interrupts. They are expected to be high priority and have access one of two watermarks lower than "min" which can be referred to as the "atomic reserve". __GFP_HIGH users get access to the first lower watermark and can be called the "high priority reserve". Over time, callers had a requirement to not block when fallback options were available. Some have abused __GFP_WAIT leading to a situation where an optimisitic allocation with a fallback option can access atomic reserves. This patch uses __GFP_ATOMIC to identify callers that are truely atomic, cannot sleep and have no alternative. High priority users continue to use __GFP_HIGH. __GFP_DIRECT_RECLAIM identifies callers that can sleep and are willing to enter direct reclaim. __GFP_KSWAPD_RECLAIM to identify callers that want to wake kswapd for background reclaim. __GFP_WAIT is redefined as a caller that is willing to enter direct reclaim and wake kswapd for background reclaim. This patch then converts a number of sites o __GFP_ATOMIC is used by callers that are high priority and have memory pools for those requests. GFP_ATOMIC uses this flag. o Callers that have a limited mempool to guarantee forward progress clear __GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall into this category where kswapd will still be woken but atomic reserves are not used as there is a one-entry mempool to guarantee progress. o Callers that are checking if they are non-blocking should use the helper gfpflags_allow_blocking() where possible. This is because checking for __GFP_WAIT as was done historically now can trigger false positives. Some exceptions like dm-crypt.c exist where the code intent is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to flag manipulations. o Callers that built their own GFP flags instead of starting with GFP_KERNEL and friends now also need to specify __GFP_KSWAPD_RECLAIM. The first key hazard to watch out for is callers that removed __GFP_WAIT and was depending on access to atomic reserves for inconspicuous reasons. In some cases it may be appropriate for them to use __GFP_HIGH. The second key hazard is callers that assembled their own combination of GFP flags instead of starting with something like GFP_KERNEL. They may now wish to specify __GFP_KSWAPD_RECLAIM. It's almost certainly harmless if it's missed in most cases as other activity will wake kswapd. Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Vitaly Wool <vitalywool@gmail.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-05memcg: unify slab and other kmem pages chargingVladimir Davydov
We have memcg_kmem_charge and memcg_kmem_uncharge methods for charging and uncharging kmem pages to memcg, but currently they are not used for charging slab pages (i.e. they are only used for charging pages allocated with alloc_kmem_pages). The only reason why the slab subsystem uses special helpers, memcg_charge_slab and memcg_uncharge_slab, is that it needs to charge to the memcg of kmem cache while memcg_charge_kmem charges to the memcg that the current task belongs to. To remove this diversity, this patch adds an extra argument to __memcg_kmem_charge that can be a pointer to a memcg or NULL. If it is not NULL, the function tries to charge to the memcg it points to, otherwise it charge to the current context. Next, it makes the slab subsystem use this function to charge slab pages. Since memcg_charge_kmem and memcg_uncharge_kmem helpers are now used only in __memcg_kmem_charge and __memcg_kmem_uncharge, they are inlined. Since __memcg_kmem_charge stores a pointer to the memcg in the page struct, we don't need memcg_uncharge_slab anymore and can use free_kmem_pages. Besides, one can now detect which memcg a slab page belongs to by reading /proc/kpagecgroup. Note, this patch switches slab to charge-after-alloc design. Since this design is already used for all other memcg charges, it should not make any difference. [hannes@cmpxchg.org: better to have an outer function than a magic parameter for the memcg lookup] Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-05mm: slab: only move management objects off-slab for sizes larger than ↵Catalin Marinas
KMALLOC_MIN_SIZE On systems with a KMALLOC_MIN_SIZE of 128 (arm64, some mips and powerpc configurations defining ARCH_DMA_MINALIGN to 128), the first kmalloc_caches[] entry to be initialised after slab_early_init = 0 is "kmalloc-128" with index 7. Depending on the debug kernel configuration, sizeof(struct kmem_cache) can be larger than 128 resulting in an INDEX_NODE of 8. Commit 8fc9cf420b36 ("slab: make more slab management structure off the slab") enables off-slab management objects for sizes starting with PAGE_SIZE >> 5 (128 bytes for a 4KB page configuration) and the creation of the "kmalloc-128" cache would try to place the management objects off-slab. However, since KMALLOC_MIN_SIZE is already 128 and freelist_size == 32 in __kmem_cache_create(), kmalloc_slab(freelist_size) returns NULL (kmalloc_caches[7] not populated yet). This triggers the following bug on arm64: kernel BUG at /work/Linux/linux-2.6-aarch64/mm/slab.c:2283! Internal error: Oops - BUG: 0 [#1] SMP Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 4.3.0-rc4+ #540 Hardware name: Juno (DT) PC is at __kmem_cache_create+0x21c/0x280 LR is at __kmem_cache_create+0x210/0x280 [...] Call trace: __kmem_cache_create+0x21c/0x280 create_boot_cache+0x48/0x80 create_kmalloc_cache+0x50/0x88 create_kmalloc_caches+0x4c/0xf4 kmem_cache_init+0x100/0x118 start_kernel+0x214/0x33c This patch introduces an OFF_SLAB_MIN_SIZE definition to avoid off-slab management objects for sizes equal to or smaller than KMALLOC_MIN_SIZE. Fixes: 8fc9cf420b36 ("slab: make more slab management structure off the slab") Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: <stable@vger.kernel.org> [3.15+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-10-01mm/slab: fix unexpected index mapping result of kmalloc_size(INDEX_NODE+1)Joonsoo Kim
Commit description is copied from the original post of this bug: http://comments.gmane.org/gmane.linux.kernel.mm/135349 Kernels after v3.9 use kmalloc_size(INDEX_NODE + 1) to get the next larger cache size than the size index INDEX_NODE mapping. In kernels 3.9 and earlier we used malloc_sizes[INDEX_L3 + 1].cs_size. However, sometimes we can't get the right output we expected via kmalloc_size(INDEX_NODE + 1), causing a BUG(). The mapping table in the latest kernel is like: index = {0, 1, 2 , 3, 4, 5, 6, n} size = {0, 96, 192, 8, 16, 32, 64, 2^n} The mapping table before 3.10 is like this: index = {0 , 1 , 2, 3, 4 , 5 , 6, n} size = {32, 64, 96, 128, 192, 256, 512, 2^(n+3)} The problem on my mips64 machine is as follows: (1) When configured DEBUG_SLAB && DEBUG_PAGEALLOC && DEBUG_LOCK_ALLOC && DEBUG_SPINLOCK, the sizeof(struct kmem_cache_node) will be "150", and the macro INDEX_NODE turns out to be "2": #define INDEX_NODE kmalloc_index(sizeof(struct kmem_cache_node)) (2) Then the result of kmalloc_size(INDEX_NODE + 1) is 8. (3) Then "if(size >= kmalloc_size(INDEX_NODE + 1)" will lead to "size = PAGE_SIZE". (4) Then "if ((size >= (PAGE_SIZE >> 3))" test will be satisfied and "flags |= CFLGS_OFF_SLAB" will be covered. (5) if (flags & CFLGS_OFF_SLAB)" test will be satisfied and will go to "cachep->slabp_cache = kmalloc_slab(slab_size, 0u)", and the result here may be NULL while kernel bootup. (6) Finally,"BUG_ON(ZERO_OR_NULL_PTR(cachep->slabp_cache));" causes the BUG info as the following shows (may be only mips64 has this problem): This patch fixes the problem of kmalloc_size(INDEX_NODE + 1) and removes the BUG by adding 'size >= 256' check to guarantee that all necessary small sized slabs are initialized regardless sequence of slab size in mapping table. Fixes: e33660165c90 ("slab: Use common kmalloc_index/kmalloc_size...") Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Reported-by: Liuhailong <liu.hailong6@zte.com.cn> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-08mm: rename alloc_pages_exact_node() to __alloc_pages_node()Vlastimil Babka
alloc_pages_exact_node() was introduced in commit 6484eb3e2a81 ("page allocator: do not check NUMA node ID when the caller knows the node is valid") as an optimized variant of alloc_pages_node(), that doesn't fallback to current node for nid == NUMA_NO_NODE. Unfortunately the name of the function can easily suggest that the allocation is restricted to the given node and fails otherwise. In truth, the node is only preferred, unless __GFP_THISNODE is passed among the gfp flags. The misleading name has lead to mistakes in the past, see for example commits 5265047ac301 ("mm, thp: really limit transparent hugepage allocation to local node") and b360edb43f8e ("mm, mempolicy: migrate_to_node should only migrate to node"). Another issue with the name is that there's a family of alloc_pages_exact*() functions where 'exact' means exact size (instead of page order), which leads to more confusion. To prevent further mistakes, this patch effectively renames alloc_pages_exact_node() to __alloc_pages_node() to better convey that it's an optimized variant of alloc_pages_node() not intended for general usage. Both functions get described in comments. It has been also considered to really provide a convenience function for allocations restricted to a node, but the major opinion seems to be that __GFP_THISNODE already provides that functionality and we shouldn't duplicate the API needlessly. The number of users would be small anyway. Existing callers of alloc_pages_exact_node() are simply converted to call __alloc_pages_node(), with the exception of sba_alloc_coherent() which open-codes the check for NUMA_NO_NODE, so it is converted to use alloc_pages_node() instead. This means it no longer performs some VM_BUG_ON checks, and since the current check for nid in alloc_pages_node() uses a 'nid < 0' comparison (which includes NUMA_NO_NODE), it may hide wrong values which would be previously exposed. Both differences will be rectified by the next patch. To sum up, this patch makes no functional changes, except temporarily hiding potentially buggy callers. Restricting the checks in alloc_pages_node() is left for the next patch which can in turn expose more existing buggy callers. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Robin Holt <robinmholt@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Christoph Lameter <cl@linux.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Mel Gorman <mgorman@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Greg Thelen <gthelen@google.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Gleb Natapov <gleb@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Cliff Whickman <cpw@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04slab: infrastructure for bulk object allocation and freeingChristoph Lameter
Add the basic infrastructure for alloc/free operations on pointer arrays. It includes a generic function in the common slab code that is used in this infrastructure patch to create the unoptimized functionality for slab bulk operations. Allocators can then provide optimized allocation functions for situations in which large numbers of objects are needed. These optimization may avoid taking locks repeatedly and bypass metadata creation if all objects in slab pages can be used to provide the objects required. Allocators can extend the skeletons provided and add their own code to the bulk alloc and free functions. They can keep the generic allocation and freeing and just fall back to those if optimizations would not work (like for example when debugging is on). Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-21mm: make page pfmemalloc check more robustMichal Hocko
Commit c48a11c7ad26 ("netvm: propagate page->pfmemalloc to skb") added checks for page->pfmemalloc to __skb_fill_page_desc(): if (page->pfmemalloc && !page->mapping) skb->pfmemalloc = true; It assumes page->mapping == NULL implies that page->pfmemalloc can be trusted. However, __delete_from_page_cache() can set set page->mapping to NULL and leave page->index value alone. Due to being in union, a non-zero page->index will be interpreted as true page->pfmemalloc. So the assumption is invalid if the networking code can see such a page. And it seems it can. We have encountered this with a NFS over loopback setup when such a page is attached to a new skbuf. There is no copying going on in this case so the page confuses __skb_fill_page_desc which interprets the index as pfmemalloc flag and the network stack drops packets that have been allocated using the reserves unless they are to be queued on sockets handling the swapping which is the case here and that leads to hangs when the nfs client waits for a response from the server which has been dropped and thus never arrive. The struct page is already heavily packed so rather than finding another hole to put it in, let's do a trick instead. We can reuse the index again but define it to an impossible value (-1UL). This is the page index so it should never see the value that large. Replace all direct users of page->pfmemalloc by page_is_pfmemalloc which will hide this nastiness from unspoiled eyes. The information will get lost if somebody wants to use page->index obviously but that was the case before and the original code expected that the information should be persisted somewhere else if that is really needed (e.g. what SLAB and SLUB do). [akpm@linux-foundation.org: fix blooper in slub] Fixes: c48a11c7ad26 ("netvm: propagate page->pfmemalloc to skb") Signed-off-by: Michal Hocko <mhocko@suse.com> Debugged-by: Vlastimil Babka <vbabka@suse.com> Debugged-by: Jiri Bohac <jbohac@suse.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: David Miller <davem@davemloft.net> Acked-by: Mel Gorman <mgorman@suse.de> Cc: <stable@vger.kernel.org> [3.6+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24slab: correct size_index table before replacing the bootstrap kmem_cache_nodeDaniel Sanders
This patch moves the initialization of the size_index table slightly earlier so that the first few kmem_cache_node's can be safely allocated when KMALLOC_MIN_SIZE is large. There are currently two ways to generate indices into kmalloc_caches (via kmalloc_index() and via the size_index table in slab_common.c) and on some arches (possibly only MIPS) they potentially disagree with each other until create_kmalloc_caches() has been called. It seems that the intention is that the size_index table is a fast equivalent to kmalloc_index() and that create_kmalloc_caches() patches the table to return the correct value for the cases where kmalloc_index()'s if-statements apply. The failing sequence was: * kmalloc_caches contains NULL elements * kmem_cache_init initialises the element that 'struct kmem_cache_node' will be allocated to. For 32-bit Mips, this is a 56-byte struct and kmalloc_index returns KMALLOC_SHIFT_LOW (7). * init_list is called which calls kmalloc_node to allocate a 'struct kmem_cache_node'. * kmalloc_slab selects the kmem_caches element using size_index[size_index_elem(size)]. For MIPS, size is 56, and the expression returns 6. * This element of kmalloc_caches is NULL and allocation fails. * If it had not already failed, it would have called create_kmalloc_caches() at this point which would have changed size_index[size_index_elem(size)] to 7. I don't believe the bug to be LLVM specific but GCC doesn't normally encounter the problem. I haven't been able to identify exactly what GCC is doing better (probably inlining) but it seems that GCC is managing to optimize to the point that it eliminates the problematic allocations. This theory is supported by the fact that GCC can be made to fail in the same way by changing inline, __inline, __inline__, and __always_inline in include/linux/compiler-gcc.h such that they don't actually inline things. Signed-off-by: Daniel Sanders <daniel.sanders@imgtec.com> Acked-by: Pekka Enberg <penberg@kernel.org> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14mm: remove GFP_THISNODEDavid Rientjes
NOTE: this is not about __GFP_THISNODE, this is only about GFP_THISNODE. GFP_THISNODE is a secret combination of gfp bits that have different behavior than expected. It is a combination of __GFP_THISNODE, __GFP_NORETRY, and __GFP_NOWARN and is special-cased in the page allocator slowpath to fail without trying reclaim even though it may be used in combination with __GFP_WAIT. An example of the problem this creates: commit e97ca8e5b864 ("mm: fix GFP_THISNODE callers and clarify") fixed up many users of GFP_THISNODE that really just wanted __GFP_THISNODE. The problem doesn't end there, however, because even it was a no-op for alloc_misplaced_dst_page(), which also sets __GFP_NORETRY and __GFP_NOWARN, and migrate_misplaced_transhuge_page(), where __GFP_NORETRY and __GFP_NOWAIT is set in GFP_TRANSHUGE. Converting GFP_THISNODE to __GFP_THISNODE is a no-op in these cases since the page allocator special-cases __GFP_THISNODE && __GFP_NORETRY && __GFP_NOWARN. It's time to just remove GFP_THISNODE entirely. We leave __GFP_THISNODE to restrict an allocation to a local node, but remove GFP_THISNODE and its obscurity. Instead, we require that a caller clear __GFP_WAIT if it wants to avoid reclaim. This allows the aforementioned functions to actually reclaim as they should. It also enables any future callers that want to do __GFP_THISNODE but also __GFP_NORETRY && __GFP_NOWARN to reclaim. The rule is simple: if you don't want to reclaim, then don't set __GFP_WAIT. Aside: ovs_flow_stats_update() really wants to avoid reclaim as well, so it is unchanged. Signed-off-by: David Rientjes <rientjes@google.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Christoph Lameter <cl@linux.com> Acked-by: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Pravin Shelar <pshelar@nicira.com> Cc: Jarno Rajahalme <jrajahalme@nicira.com> Cc: Li Zefan <lizefan@huawei.com> Cc: Greg Thelen <gthelen@google.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12slub: make dead caches discard free slabs immediatelyVladimir Davydov
To speed up further allocations SLUB may store empty slabs in per cpu/node partial lists instead of freeing them immediately. This prevents per memcg caches destruction, because kmem caches created for a memory cgroup are only destroyed after the last page charged to the cgroup is freed. To fix this issue, this patch resurrects approach first proposed in [1]. It forbids SLUB to cache empty slabs after the memory cgroup that the cache belongs to was destroyed. It is achieved by setting kmem_cache's cpu_partial and min_partial constants to 0 and tuning put_cpu_partial() so that it would drop frozen empty slabs immediately if cpu_partial = 0. The runtime overhead is minimal. From all the hot functions, we only touch relatively cold put_cpu_partial(): we make it call unfreeze_partials() after freezing a slab that belongs to an offline memory cgroup. Since slab freezing exists to avoid moving slabs from/to a partial list on free/alloc, and there can't be allocations from dead caches, it shouldn't cause any overhead. We do have to disable preemption for put_cpu_partial() to achieve that though. The original patch was accepted well and even merged to the mm tree. However, I decided to withdraw it due to changes happening to the memcg core at that time. I had an idea of introducing per-memcg shrinkers for kmem caches, but now, as memcg has finally settled down, I do not see it as an option, because SLUB shrinker would be too costly to call since SLUB does not keep free slabs on a separate list. Besides, we currently do not even call per-memcg shrinkers for offline memcgs. Overall, it would introduce much more complexity to both SLUB and memcg than this small patch. Regarding to SLAB, there's no problem with it, because it shrinks per-cpu/node caches periodically. Thanks to list_lru reparenting, we no longer keep entries for offline cgroups in per-memcg arrays (such as memcg_cache_params->memcg_caches), so we do not have to bother if a per-memcg cache will be shrunk a bit later than it could be. [1] http://thread.gmane.org/gmane.linux.kernel.mm/118649/focus=118650 Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12slab: link memcg caches of the same kind into a listVladimir Davydov
Sometimes, we need to iterate over all memcg copies of a particular root kmem cache. Currently, we use memcg_cache_params->memcg_caches array for that, because it contains all existing memcg caches. However, it's a bad practice to keep all caches, including those that belong to offline cgroups, in this array, because it will be growing beyond any bounds then. I'm going to wipe away dead caches from it to save space. To still be able to perform iterations over all memcg caches of the same kind, let us link them into a list. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Tejun Heo <tj@kernel.org> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13slab: fix cpuset check in fallback_allocVladimir Davydov
fallback_alloc is called on kmalloc if the preferred node doesn't have free or partial slabs and there's no pages on the node's free list (GFP_THISNODE allocations fail). Before invoking the reclaimer it tries to locate a free or partial slab on other allowed nodes' lists. While iterating over the preferred node's zonelist it skips those zones which hardwall cpuset check returns false for. That means that for a task bound to a specific node using cpusets fallback_alloc will always ignore free slabs on other nodes and go directly to the reclaimer, which, however, may allocate from other nodes if cpuset.mem_hardwall is unset (default). As a result, we may get lists of free slabs grow without bounds on other nodes, which is bad, because inactive slabs are only evicted by cache_reap at a very slow rate and cannot be dropped forcefully. To reproduce the issue, run a process that will walk over a directory tree with lots of files inside a cpuset bound to a node that constantly experiences memory pressure. Look at num_slabs vs active_slabs growth as reported by /proc/slabinfo. To avoid this we should use softwall cpuset check in fallback_alloc. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Acked-by: Zefan Li <lizefan@huawei.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13memcg: fix possible use-after-free in memcg_kmem_get_cache()Vladimir Davydov
Suppose task @t that belongs to a memory cgroup @memcg is going to allocate an object from a kmem cache @c. The copy of @c corresponding to @memcg, @mc, is empty. Then if kmem_cache_alloc races with the memory cgroup destruction we can access the memory cgroup's copy of the cache after it was destroyed: CPU0 CPU1 ---- ---- [ current=@t @mc->memcg_params->nr_pages=0 ] kmem_cache_alloc(@c): call memcg_kmem_get_cache(@c); proceed to allocation from @mc: alloc a page for @mc: ... move @t from @memcg destroy @memcg: mem_cgroup_css_offline(@memcg): memcg_unregister_all_caches(@memcg): kmem_cache_destroy(@mc) add page to @mc We could fix this issue by taking a reference to a per-memcg cache, but that would require adding a per-cpu reference counter to per-memcg caches, which would look cumbersome. Instead, let's take a reference to a memory cgroup, which already has a per-cpu reference counter, in the beginning of kmem_cache_alloc to be dropped in the end, and move per memcg caches destruction from css offline to css free. As a side effect, per-memcg caches will be destroyed not one by one, but all at once when the last page accounted to the memory cgroup is freed. This doesn't sound as a high price for code readability though. Note, this patch does add some overhead to the kmem_cache_alloc hot path, but it is pretty negligible - it's just a function call plus a per cpu counter decrement, which is comparable to what we already have in memcg_kmem_get_cache. Besides, it's only relevant if there are memory cgroups with kmem accounting enabled. I don't think we can find a way to handle this race w/o it, because alloc_page called from kmem_cache_alloc may sleep so we can't flush all pending kmallocs w/o reference counting. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-11Merge branch 'for-3.19' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup update from Tejun Heo: "cpuset got simplified a bit. cgroup core got a fix on unified hierarchy and grew some effective css related interfaces which will be used for blkio support for writeback IO traffic which is currently being worked on" * 'for-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: implement cgroup_get_e_css() cgroup: add cgroup_subsys->css_e_css_changed() cgroup: add cgroup_subsys->css_released() cgroup: fix the async css offline wait logic in cgroup_subtree_control_write() cgroup: restructure child_subsys_mask handling in cgroup_subtree_control_write() cgroup: separate out cgroup_calc_child_subsys_mask() from cgroup_refresh_child_subsys_mask() cpuset: lock vs unlock typo cpuset: simplify cpuset_node_allowed API cpuset: convert callback_mutex to a spinlock
2014-12-10slab: improve checking for invalid gfp_flagsAndrew Morton
The code goes BUG, but doesn't tell us which bits were unexpectedly set. Print that out. Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10slab: print slabinfo header in seq showVladimir Davydov
Currently we print the slabinfo header in the seq start method, which makes it unusable for showing leaks, so we have leaks_show, which does practically the same as s_show except it doesn't show the header. However, we can print the header in the seq show method - we only need to check if the current element is the first on the list. This will allow us to use the same set of seq iterators for both leaks and slabinfo reporting, which is nice. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10mm: slab/slub: coding style: whitespaces and tabs mixtureLQYMGT
Some code in mm/slab.c and mm/slub.c use whitespaces in indent. Clean them up. Signed-off-by: LQYMGT <lqymgt@gmail.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-03slab: fix nodeid bounds check for non-contiguous node IDsPaul Mackerras
The bounds check for nodeid in ____cache_alloc_node gives false positives on machines where the node IDs are not contiguous, leading to a panic at boot time. For example, on a POWER8 machine the node IDs are typically 0, 1, 16 and 17. This means that num_online_nodes() returns 4, so when ____cache_alloc_node is called with nodeid = 16 the VM_BUG_ON triggers, like this: kernel BUG at /home/paulus/kernel/kvm/mm/slab.c:3079! Call Trace: .____cache_alloc_node+0x5c/0x270 (unreliable) .kmem_cache_alloc_node_trace+0xdc/0x360 .init_list+0x3c/0x128 .kmem_cache_init+0x1dc/0x258 .start_kernel+0x2a0/0x568 start_here_common+0x20/0xa8 To fix this, we instead compare the nodeid with MAX_NUMNODES, and additionally make sure it isn't negative (since nodeid is an int). The check is there mainly to protect the array dereference in the get_node() call in the next line, and the array being dereferenced is of size MAX_NUMNODES. If the nodeid is in range but invalid (for example if the node is off-line), the BUG_ON in the next line will catch that. Fixes: 14e50c6a9bc2 ("mm: slab: Verify the nodeid passed to ____cache_alloc_node") Signed-off-by: Paul Mackerras <paulus@samba.org> Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Reviewed-by: Pekka Enberg <penberg@kernel.org> Acked-by: David Rientjes <rientjes@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-27cpuset: simplify cpuset_node_allowed APIVladimir Davydov
Current cpuset API for checking if a zone/node is allowed to allocate from looks rather awkward. We have hardwall and softwall versions of cpuset_node_allowed with the softwall version doing literally the same as the hardwall version if __GFP_HARDWALL is passed to it in gfp flags. If it isn't, the softwall version may check the given node against the enclosing hardwall cpuset, which it needs to take the callback lock to do. Such a distinction was introduced by commit 02a0e53d8227 ("cpuset: rework cpuset_zone_allowed api"). Before, we had the only version with the __GFP_HARDWALL flag determining its behavior. The purpose of the commit was to avoid sleep-in-atomic bugs when someone would mistakenly call the function without the __GFP_HARDWALL flag for an atomic allocation. The suffixes introduced were intended to make the callers think before using the function. However, since the callback lock was converted from mutex to spinlock by the previous patch, the softwall check function cannot sleep, and these precautions are no longer necessary. So let's simplify the API back to the single check. Suggested-by: David Rientjes <rientjes@google.com> Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Acked-by: Christoph Lameter <cl@linux.com> Acked-by: Zefan Li <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2014-10-14mm/slab: fix unaligned access on sparc64Joonsoo Kim
Commit bf0dea23a9c0 ("mm/slab: use percpu allocator for cpu cache") changed the allocation method for cpu cache array from slab allocator to percpu allocator. Alignment should be provided for aligned memory in percpu allocator case, but, that commit mistakenly set this alignment to 0. So, percpu allocator returns unaligned memory address. It doesn't cause any problem on x86 which permits unaligned access, but, it causes the problem on sparc64 which needs strong guarantee of alignment. Following bug report is reported from David Miller. I'm getting tons of the following on sparc64: [603965.383447] Kernel unaligned access at TPC[546b58] free_block+0x98/0x1a0 [603965.396987] Kernel unaligned access at TPC[546b60] free_block+0xa0/0x1a0 ... [603970.554394] log_unaligned: 333 callbacks suppressed ... This patch provides a proper alignment parameter when allocating cpu cache to fix this unaligned memory access problem on sparc64. Reported-by: David Miller <davem@davemloft.net> Tested-by: David Miller <davem@davemloft.net> Tested-by: Meelis Roos <mroos@linux.ee> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09mm/slab.c: use __seq_open_private() instead of seq_open()Rob Jones
Using __seq_open_private() removes boilerplate code from slabstats_open() The resultant code is shorter and easier to follow. This patch does not change any functionality. Signed-off-by: Rob Jones <rob.jones@codethink.co.uk> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09mm/slab: use percpu allocator for cpu cacheJoonsoo Kim
Because of chicken and egg problem, initialization of SLAB is really complicated. We need to allocate cpu cache through SLAB to make the kmem_cache work, but before initialization of kmem_cache, allocation through SLAB is impossible. On the other hand, SLUB does initialization in a more simple way. It uses percpu allocator to allocate cpu cache so there is no chicken and egg problem. So, this patch try to use percpu allocator in SLAB. This simplifies the initialization step in SLAB so that we could maintain SLAB code more easily. In my testing there is no performance difference. This implementation relies on percpu allocator. Because percpu allocator uses vmalloc address space, vmalloc address space could be exhausted by this change on many cpu system with *32 bit* kernel. This implementation can cover 1024 cpus in worst case by following calculation. Worst: 1024 cpus * 4 bytes for pointer * 300 kmem_caches * 120 objects per cpu_cache = 140 MB Normal: 1024 cpus * 4 bytes for pointer * 150 kmem_caches(slab merge) * 80 objects per cpu_cache = 46 MB Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Jeremiah Mahler <jmmahler@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09mm/slab: support slab mergeJoonsoo Kim
Slab merge is good feature to reduce fragmentation. If new creating slab have similar size and property with exsitent slab, this feature reuse it rather than creating new one. As a result, objects are packed into fewer slabs so that fragmentation is reduced. Below is result of my testing. * After boot, sleep 20; cat /proc/meminfo | grep Slab <Before> Slab: 25136 kB <After> Slab: 24364 kB We can save 3% memory used by slab. For supporting this feature in SLAB, we need to implement SLAB specific kmem_cache_flag() and __kmem_cache_alias(), because SLUB implements some SLUB specific processing related to debug flag and object size change on these functions. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09mm/slab: factor out unlikely part of cache_free_alien()Joonsoo Kim
cache_free_alien() is rarely used function when node mismatch. But, it is defined with inline attribute so it is inlined to __cache_free() which is core free function of slab allocator. It uselessly makes kmem_cache_free()/kfree() functions large. What we really need to inline is just checking node match so this patch factor out other parts of cache_free_alien() to reduce code size of kmem_cache_free()/ kfree(). <Before> nm -S mm/slab.o | grep -e "T kfree" -e "T kmem_cache_free" 00000000000011e0 0000000000000228 T kfree 0000000000000670 0000000000000216 T kmem_cache_free <After> nm -S mm/slab.o | grep -e "T kfree" -e "T kmem_cache_free" 0000000000001110 00000000000001b5 T kfree 0000000000000750 0000000000000181 T kmem_cache_free You can see slightly reduced size of text: 0x228->0x1b5, 0x216->0x181. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09mm/slab: noinline __ac_put_obj()Joonsoo Kim
Our intention of __ac_put_obj() is that it doesn't affect anything if sk_memalloc_socks() is disabled. But, because __ac_put_obj() is too small, compiler inline it to ac_put_obj() and affect code size of free path. This patch add noinline keyword for __ac_put_obj() not to distrupt normal free path at all. <Before> nm -S slab-orig.o | grep -e "t cache_alloc_refill" -e "T kfree" -e "T kmem_cache_free" 0000000000001e80 00000000000002f5 t cache_alloc_refill 0000000000001230 0000000000000258 T kfree 0000000000000690 000000000000024c T kmem_cache_free <After> nm -S slab-patched.o | grep -e "t cache_alloc_refill" -e "T kfree" -e "T kmem_cache_free" 0000000000001e00 00000000000002e5 t cache_alloc_refill 00000000000011e0 0000000000000228 T kfree 0000000000000670 0000000000000216 T kmem_cache_free cache_alloc_refill: 0x2f5->0x2e5 kfree: 0x256->0x228 kmem_cache_free: 0x24c->0x216 code size of each function is reduced slightly. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09mm/slab: move cache_flusharray() out of unlikely.text sectionJoonsoo Kim
Now, due to likely keyword, compiled code of cache_flusharray() is on unlikely.text section. Although it is uncommon case compared to free to cpu cache case, it is common case than free_block(). But, free_block() is on normal text section. This patch fix this odd situation to remove likely keyword. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09mm/sl[ao]b: always track caller in kmalloc_(node_)track_caller()Joonsoo Kim
Now, we track caller if tracing or slab debugging is enabled. If they are disabled, we could save one argument passing overhead by calling __kmalloc(_node)(). But, I think that it would be marginal. Furthermore, default slab allocator, SLUB, doesn't use this technique so I think that it's okay to change this situation. After this change, we can turn on/off CONFIG_DEBUG_SLAB without full kernel build and remove some complicated '#if' defintion. It looks more benefitial to me. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-09-27Merge branch 'for-3.17-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: "This is quite late but these need to be backported anyway. This is the fix for a long-standing cpuset bug which existed from 2009. cpuset makes use of PF_SPREAD_{PAGE|SLAB} flags to modify the task's memory allocation behavior according to the settings of the cpuset it belongs to; unfortunately, when those flags have to be changed, cpuset did so directly even whlie the target task is running, which is obviously racy as task->flags may be modified by the task itself at any time. This obscure bug manifested as corrupt PF_USED_MATH flag leading to a weird crash. The bug is fixed by moving the flag to task->atomic_flags. The first two are prepatory ones to help defining atomic_flags accessors and the third one is the actual fix" * 'for-3.17-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cpuset: PF_SPREAD_PAGE and PF_SPREAD_SLAB should be atomic flags sched: add macros to define bitops for task atomic flags sched: fix confusing PFA_NO_NEW_PRIVS constant
2014-09-26mm, slab: initialize object alignment on cache creationDavid Rientjes
Since commit 4590685546a3 ("mm/sl[aou]b: Common alignment code"), the "ralign" automatic variable in __kmem_cache_create() may be used as uninitialized. The proper alignment defaults to BYTES_PER_WORD and can be overridden by SLAB_RED_ZONE or the alignment specified by the caller. This fixes https://bugzilla.kernel.org/show_bug.cgi?id=85031 Signed-off-by: David Rientjes <rientjes@google.com> Reported-by: Andrei Elovikov <a.elovikov@gmail.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-09-24cpuset: PF_SPREAD_PAGE and PF_SPREAD_SLAB should be atomic flagsZefan Li
When we change cpuset.memory_spread_{page,slab}, cpuset will flip PF_SPREAD_{PAGE,SLAB} bit of tsk->flags for each task in that cpuset. This should be done using atomic bitops, but currently we don't, which is broken. Tetsuo reported a hard-to-reproduce kernel crash on RHEL6, which happened when one thread tried to clear PF_USED_MATH while at the same time another thread tried to flip PF_SPREAD_PAGE/PF_SPREAD_SLAB. They both operate on the same task. Here's the full report: https://lkml.org/lkml/2014/9/19/230 To fix this, we make PF_SPREAD_PAGE and PF_SPREAD_SLAB atomic flags. v4: - updated mm/slab.c. (Fengguang Wu) - updated Documentation. Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Miao Xie <miaox@cn.fujitsu.com> Cc: Kees Cook <keescook@chromium.org> Fixes: 950592f7b991 ("cpusets: update tasks' page/slab spread flags in time") Cc: <stable@vger.kernel.org> # 2.6.31+ Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Zefan Li <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2014-08-08Revert "slab: remove BAD_ALIEN_MAGIC"Joonsoo Kim
This reverts commit a640616822b2 ("slab: remove BAD_ALIEN_MAGIC"). commit a640616822b2 ("slab: remove BAD_ALIEN_MAGIC") assumes that the system with !CONFIG_NUMA has only one memory node. But, it turns out to be false by the report from Geert. His system, m68k, has many memory nodes and is configured in !CONFIG_NUMA. So it couldn't boot with above change. Here goes his failure report. With latest mainline, I'm getting a crash during bootup on m68k/ARAnyM: enable_cpucache failed for radix_tree_node, error 12. kernel BUG at /scratch/geert/linux/linux-m68k/mm/slab.c:1522! *** TRAP #7 *** FORMAT=0 Current process id is 0 BAD KERNEL TRAP: 00000000 Modules linked in: PC: [<0039c92c>] kmem_cache_init_late+0x70/0x8c SR: 2200 SP: 00345f90 a2: 0034c2e8 d0: 0000003d d1: 00000000 d2: 00000000 d3: 003ac942 d4: 00000000 d5: 00000000 a0: 0034f686 a1: 0034f682 Process swapper (pid: 0, task=0034c2e8) Frame format=0 Stack from 00345fc4: 002f69ef 002ff7e5 000005f2 000360fa 0017d806 003921d4 00000000 00000000 00000000 00000000 00000000 00000000 003ac942 00000000 003912d6 Call Trace: [<000360fa>] parse_args+0x0/0x2ca [<0017d806>] strlen+0x0/0x1a [<003921d4>] start_kernel+0x23c/0x428 [<003912d6>] _sinittext+0x2d6/0x95e Code: f7e5 4879 002f 69ef 61ff ffca 462a 4e47 <4879> 0035 4b1c 61ff fff0 0cc4 7005 23c0 0037 fd20 588f 265f 285f 4e75 48e7 301c Disabling lock debugging due to kernel taint Kernel panic - not syncing: Attempted to kill the idle task! Although there is a alternative way to fix this issue such as disabling use of alien cache on !CONFIG_NUMA, but, reverting issued commit is better to me in this time. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Vladimir Davydov <vdavydov@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-06mm/slab.c: fix commentsWang Sheng-Hui
Current struct kmem_cache has no 'lock' field, and slab page is managed by struct kmem_cache_node, which has 'list_lock' field. Clean up the related comment. Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-06slab: change int to size_t for representing allocation sizeJoonsoo Kim
It is better to represent allocation size in size_t rather than int. So change it. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Suggested-by: Andrew Morton <akpm@linux-foundation.org> Cc: Christoph Lameter <cl@linux.com> Reviewed-by: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-06slab: remove BAD_ALIEN_MAGICJoonsoo Kim
BAD_ALIEN_MAGIC value isn't used anymore. So remove it. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-06slab: remove a useless lockdep annotationJoonsoo Kim
Now, there is no code to hold two lock simultaneously, since we don't call slab_destroy() with holding any lock. So, lockdep annotation is useless now. Remove it. v2: don't remove BAD_ALIEN_MAGIC in this patch. It will be removed in the following patch. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-06slab: destroy a slab without holding any alien cache lockJoonsoo Kim
I haven't heard that this alien cache lock is contended, but to reduce chance of contention would be better generally. And with this change, we can simplify complex lockdep annotation in slab code. In the following patch, it will be implemented. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-06slab: use the lock on alien_cache, instead of the lock on array_cacheJoonsoo Kim
Now, we have separate alien_cache structure, so it'd be better to hold the lock on alien_cache while manipulating alien_cache. After that, we don't need the lock on array_cache, so remove it. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>