summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2014-06-04mm, compaction: add per-zone migration pfn cache for async compactionDavid Rientjes
Each zone has a cached migration scanner pfn for memory compaction so that subsequent calls to memory compaction can start where the previous call left off. Currently, the compaction migration scanner only updates the per-zone cached pfn when pageblocks were not skipped for async compaction. This creates a dependency on calling sync compaction to avoid having subsequent calls to async compaction from scanning an enormous amount of non-MOVABLE pageblocks each time it is called. On large machines, this could be potentially very expensive. This patch adds a per-zone cached migration scanner pfn only for async compaction. It is updated everytime a pageblock has been scanned in its entirety and when no pages from it were successfully isolated. The cached migration scanner pfn for sync compaction is updated only when called for sync compaction. Signed-off-by: David Rientjes <rientjes@google.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Greg Thelen <gthelen@google.com> Cc: Mel Gorman <mgorman@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04mm, migration: add destination page freeing callbackDavid Rientjes
Memory migration uses a callback defined by the caller to determine how to allocate destination pages. When migration fails for a source page, however, it frees the destination page back to the system. This patch adds a memory migration callback defined by the caller to determine how to free destination pages. If a caller, such as memory compaction, builds its own freelist for migration targets, this can reuse already freed memory instead of scanning additional memory. If the caller provides a function to handle freeing of destination pages, it is called when page migration fails. If the caller passes NULL then freeing back to the system will be handled as usual. This patch introduces no functional change. Signed-off-by: David Rientjes <rientjes@google.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: Mel Gorman <mgorman@suse.de> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Greg Thelen <gthelen@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04memcg: get rid of memcg_create_cache_nameVladimir Davydov
Instead of calling back to memcontrol.c from kmem_cache_create_memcg in order to just create the name of a per memcg cache, let's allocate it in place. We only need to pass the memcg name to kmem_cache_create_memcg for that - everything else can be done in slab_common.c. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04mm: update comment for DEFAULT_MAX_MAP_COUNTKirill A. Shutemov
With ELF extended numbering 16-bit bound is not hard limit any more. [akpm@linux-foundation.org: fix typo] Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04mm/rmap.c: make page_referenced_one() and try_to_unmap_one() staticKirill A. Shutemov
KSM was converted to use rmap_walk() and now nobody uses these functions outside mm/rmap.c. Let's covert them back to static. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04mm: shrinker: add nid to tracepoint outputDave Hansen
Now that we are doing NUMA-aware shrinking, and can have shrinkers running in parallel, or working on individual nodes, it seems like we should also be sticking the node in the output. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Dave Chinner <david@fromorbit.com> Cc: Konstantin Khlebnikov <khlebnikov@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04mm: shrinker trace points: fix negativesDave Hansen
I was looking at a trace of the slab shrinkers (attachment in this comment): https://bugs.freedesktop.org/show_bug.cgi?id=72742#c67 and noticed that "total_scan" can go negative in some cases. We used to dump out the "total_scan" variable directly, but some of the shrinker modifications along the way changed that. This patch just dumps it out directly, again. It doesn't make any sense to derive it from new_nr and nr any more since there are now other shrinkers that can be running in parallel and mucking with those values. Here's an example of the negative numbers in the output: > kswapd0-840 [000] 160.869398: mm_shrink_slab_end: i915_gem_inactive_scan+0x0 0xffff8800037cbc68: unused scan count 10 new scan count 39 total_scan 29 last shrinker return val 256 > kswapd0-840 [000] 160.869618: mm_shrink_slab_end: i915_gem_inactive_scan+0x0 0xffff8800037cbc68: unused scan count 39 new scan count 102 total_scan 63 last shrinker return val 256 > kswapd0-840 [000] 160.870031: mm_shrink_slab_end: i915_gem_inactive_scan+0x0 0xffff8800037cbc68: unused scan count 102 new scan count 47 total_scan -55 last shrinker return val 768 > kswapd0-840 [000] 160.870464: mm_shrink_slab_end: i915_gem_inactive_scan+0x0 0xffff8800037cbc68: unused scan count 47 new scan count 45 total_scan -2 last shrinker return val 768 > kswapd0-840 [000] 163.384144: mm_shrink_slab_end: i915_gem_inactive_scan+0x0 0xffff8800037cbc68: unused scan count 45 new scan count 56 total_scan 11 last shrinker return val 0 > kswapd0-840 [000] 163.384297: mm_shrink_slab_end: i915_gem_inactive_scan+0x0 0xffff8800037cbc68: unused scan count 56 new scan count 15 total_scan -41 last shrinker return val 256 > kswapd0-840 [000] 163.384414: mm_shrink_slab_end: i915_gem_inactive_scan+0x0 0xffff8800037cbc68: unused scan count 15 new scan count 117 total_scan 102 last shrinker return val 0 > kswapd0-840 [000] 163.384657: mm_shrink_slab_end: i915_gem_inactive_scan+0x0 0xffff8800037cbc68: unused scan count 117 new scan count 36 total_scan -81 last shrinker return val 512 > kswapd0-840 [000] 163.384880: mm_shrink_slab_end: i915_gem_inactive_scan+0x0 0xffff8800037cbc68: unused scan count 36 new scan count 111 total_scan 75 last shrinker return val 256 > kswapd0-840 [000] 163.385256: mm_shrink_slab_end: i915_gem_inactive_scan+0x0 0xffff8800037cbc68: unused scan count 111 new scan count 34 total_scan -77 last shrinker return val 768 > kswapd0-840 [000] 163.385598: mm_shrink_slab_end: i915_gem_inactive_scan+0x0 0xffff8800037cbc68: unused scan count 34 new scan count 122 total_scan 88 last shrinker return val 512 Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Dave Chinner <david@fromorbit.com> Cc: Konstantin Khlebnikov <khlebnikov@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04include/linux/bootmem.h: cleanup the comment for BOOTMEM_ flagsWang Sheng-Hui
Use BOOTMEM_DEFAULT instead of 0 in the comment. Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04mm: introdule compound_head_by_tail()Jianyu Zhan
Currently, in put_compound_page(), we have ====== if (likely(!PageTail(page))) { <------ (1) if (put_page_testzero(page)) { /* ¦* By the time all refcounts have been released ¦* split_huge_page cannot run anymore from under us. ¦*/ if (PageHead(page)) __put_compound_page(page); else __put_single_page(page); } return; } /* __split_huge_page_refcount can run under us */ page_head = compound_head(page); <------------ (2) ====== if at (1) , we fail the check, this means page is *likely* a tail page. Then at (2), as compoud_head(page) is inlined, it is : ====== static inline struct page *compound_head(struct page *page) { if (unlikely(PageTail(page))) { <----------- (3) struct page *head = page->first_page; smp_rmb(); if (likely(PageTail(page))) return head; } return page; } ====== here, the (3) unlikely in the case is a negative hint, because it is *likely* a tail page. So the check (3) in this case is not good, so I introduce a helper for this case. So this patch introduces compound_head_by_tail() which deals with a possible tail page(though it could be spilt by a racy thread), and make compound_head() a wrapper on it. This patch has no functional change, and it reduces the object size slightly: text data bss dec hex filename 11003 1328 16 12347 303b mm/swap.o.orig 10971 1328 16 12315 301b mm/swap.o.patched I've ran "perf top -e branch-miss" to observe branch-miss in this case. As Michael points out, it's a slow path, so only very few times this case happens. But I grep'ed the code base, and found there still are some other call sites could be benifited from this helper. And given that it only bloating up the source by only 5 lines, but with a reduced object size. I still believe this helper deserves to exsit. Signed-off-by: Jianyu Zhan <nasa4836@gmail.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Rik van Riel <riel@redhat.com> Cc: Jiang Liu <liuj97@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04mm: constify nmask argument to set_mempolicy()Rasmus Villemoes
The nmask argument to set_mempolicy() is const according to the user-space header numaif.h, and since the kernel does indeed not modify it, it might as well be declared const in the kernel. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04mm: constify nmask argument to mbind()Rasmus Villemoes
The nmask argument to mbind() is const according to the userspace header numaif.h, and since the kernel does indeed not modify it, it might as well be declared const in the kernel. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04fs/block_dev.c: add bdev_read_page() and bdev_write_page()Matthew Wilcox
A block device driver may choose to provide a rw_page operation. These will be called when the filesystem is attempting to do page sized I/O to page cache pages (ie not for direct I/O). This does preclude I/Os that are larger than page size, so this may only be a performance gain for some devices. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Tested-by: Dheeraj Reddy <dheeraj.reddy@intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04fs/mpage.c: factor page_endio() out of mpage_end_io()Matthew Wilcox
page_endio() takes care of updating all the appropriate page flags once I/O has finished to a page. Switch to using mapping_set_error() instead of setting AS_EIO directly; this will handle thin-provisioned devices correctly. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Dheeraj Reddy <dheeraj.reddy@intel.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04fs/buffer.c: remove block_write_full_page_endio()Matthew Wilcox
The last in-tree caller of block_write_full_page_endio() was removed in January 2013. It's time to remove the EXPORT_SYMBOL, which leaves block_write_full_page() as the only caller of block_write_full_page_endio(), so inline block_write_full_page_endio() into block_write_full_page(). Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Dheeraj Reddy <dheeraj.reddy@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04memcg, slab: simplify synchronization schemeVladimir Davydov
At present, we have the following mutexes protecting data related to per memcg kmem caches: - slab_mutex. This one is held during the whole kmem cache creation and destruction paths. We also take it when updating per root cache memcg_caches arrays (see memcg_update_all_caches). As a result, taking it guarantees there will be no changes to any kmem cache (including per memcg). Why do we need something else then? The point is it is private to slab implementation and has some internal dependencies with other mutexes (get_online_cpus). So we just don't want to rely upon it and prefer to introduce additional mutexes instead. - activate_kmem_mutex. Initially it was added to synchronize initializing kmem limit (memcg_activate_kmem). However, since we can grow per root cache memcg_caches arrays only on kmem limit initialization (see memcg_update_all_caches), we also employ it to protect against memcg_caches arrays relocation (e.g. see __kmem_cache_destroy_memcg_children). - We have a convention not to take slab_mutex in memcontrol.c, but we want to walk over per memcg memcg_slab_caches lists there (e.g. for destroying all memcg caches on offline). So we have per memcg slab_caches_mutex's protecting those lists. The mutexes are taken in the following order: activate_kmem_mutex -> slab_mutex -> memcg::slab_caches_mutex Such a syncrhonization scheme has a number of flaws, for instance: - We can't call kmem_cache_{destroy,shrink} while walking over a memcg::memcg_slab_caches list due to locking order. As a result, in mem_cgroup_destroy_all_caches we schedule the memcg_cache_params::destroy work shrinking and destroying the cache. - We don't have a mutex to synchronize per memcg caches destruction between memcg offline (mem_cgroup_destroy_all_caches) and root cache destruction (__kmem_cache_destroy_memcg_children). Currently we just don't bother about it. This patch simplifies it by substituting per memcg slab_caches_mutex's with the global memcg_slab_mutex. It will be held whenever a new per memcg cache is created or destroyed, so it protects per root cache memcg_caches arrays and per memcg memcg_slab_caches lists. The locking order is following: activate_kmem_mutex -> memcg_slab_mutex -> slab_mutex This allows us to call kmem_cache_{create,shrink,destroy} under the memcg_slab_mutex. As a result, we don't need memcg_cache_params::destroy work any more - we can simply destroy caches while iterating over a per memcg slab caches list. Also using the global mutex simplifies synchronization between concurrent per memcg caches creation/destruction, e.g. mem_cgroup_destroy_all_caches vs __kmem_cache_destroy_memcg_children. The downside of this is that we substitute per-memcg slab_caches_mutex's with a hummer-like global mutex, but since we already take either the slab_mutex or the cgroup_mutex along with a memcg::slab_caches_mutex, it shouldn't hurt concurrency a lot. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Glauber Costa <glommer@gmail.com> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04memcg, slab: merge memcg_{bind,release}_pages to memcg_{un}charge_slabVladimir Davydov
Currently we have two pairs of kmemcg-related functions that are called on slab alloc/free. The first is memcg_{bind,release}_pages that count the total number of pages allocated on a kmem cache. The second is memcg_{un}charge_slab that {un}charge slab pages to kmemcg resource counter. Let's just merge them to keep the code clean. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Glauber Costa <glommer@gmail.com> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04memcg, slab: do not schedule cache destruction when last page goes awayVladimir Davydov
This patchset is a part of preparations for kmemcg re-parenting. It targets at simplifying kmemcg work-flows and synchronization. First, it removes async per memcg cache destruction (see patches 1, 2). Now caches are only destroyed on memcg offline. That means the caches that are not empty on memcg offline will be leaked. However, they are already leaked, because memcg_cache_params::nr_pages normally never drops to 0 so the destruction work is never scheduled except kmem_cache_shrink is called explicitly. In the future I'm planning reaping such dead caches on vmpressure or periodically. Second, it substitutes per memcg slab_caches_mutex's with the global memcg_slab_mutex, which should be taken during the whole per memcg cache creation/destruction path before the slab_mutex (see patch 3). This greatly simplifies synchronization among various per memcg cache creation/destruction paths. I'm still not quite sure about the end picture, in particular I don't know whether we should reap dead memcgs' kmem caches periodically or try to merge them with their parents (see https://lkml.org/lkml/2014/4/20/38 for more details), but whichever way we choose, this set looks like a reasonable change to me, because it greatly simplifies kmemcg work-flows and eases further development. This patch (of 3): After a memcg is offlined, we mark its kmem caches that cannot be deleted right now due to pending objects as dead by setting the memcg_cache_params::dead flag, so that memcg_release_pages will schedule cache destruction (memcg_cache_params::destroy) as soon as the last slab of the cache is freed (memcg_cache_params::nr_pages drops to zero). I guess the idea was to destroy the caches as soon as possible, i.e. immediately after freeing the last object. However, it just doesn't work that way, because kmem caches always preserve some pages for the sake of performance, so that nr_pages never gets to zero unless the cache is shrunk explicitly using kmem_cache_shrink. Of course, we could account the total number of objects on the cache or check if all the slabs allocated for the cache are empty on kmem_cache_free and schedule destruction if so, but that would be too costly. Thus we have a piece of code that works only when we explicitly call kmem_cache_shrink, but complicates the whole picture a lot. Moreover, it's racy in fact. For instance, kmem_cache_shrink may free the last slab and thus schedule cache destruction before it finishes checking that the cache is empty, which can lead to use-after-free. So I propose to remove this async cache destruction from memcg_release_pages, and check if the cache is empty explicitly after calling kmem_cache_shrink instead. This will simplify things a lot w/o introducing any functional changes. And regarding dead memcg caches (i.e. those that are left hanging around after memcg offline for they have objects), I suppose we should reap them either periodically or on vmpressure as Glauber suggested initially. I'm going to implement this later. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Glauber Costa <glommer@gmail.com> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04memcg: kill CONFIG_MM_OWNEROleg Nesterov
CONFIG_MM_OWNER makes no sense. It is not user-selectable, it is only selected by CONFIG_MEMCG automatically. So we can kill this option in init/Kconfig and do s/CONFIG_MM_OWNER/CONFIG_MEMCG/ globally. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Michal Hocko <mhocko@suse.cz> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04mm/swap.c: clean up *lru_cache_add* functionsJianyu Zhan
In mm/swap.c, __lru_cache_add() is exported, but actually there are no users outside this file. This patch unexports __lru_cache_add(), and makes it static. It also exports lru_cache_add_file(), as it is use by cifs and fuse, which can loaded as modules. Signed-off-by: Jianyu Zhan <nasa4836@gmail.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Shaohua Li <shli@kernel.org> Cc: Bob Liu <bob.liu@oracle.com> Cc: Seth Jennings <sjenning@linux.vnet.ibm.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Rafael Aquini <aquini@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Acked-by: Rik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Khalid Aziz <khalid.aziz@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04mem-hotplug: implement get/put_online_memsVladimir Davydov
kmem_cache_{create,destroy,shrink} need to get a stable value of cpu/node online mask, because they init/destroy/access per-cpu/node kmem_cache parts, which can be allocated or destroyed on cpu/mem hotplug. To protect against cpu hotplug, these functions use {get,put}_online_cpus. However, they do nothing to synchronize with memory hotplug - taking the slab_mutex does not eliminate the possibility of race as described in patch 2. What we need there is something like get_online_cpus, but for memory. We already have lock_memory_hotplug, which serves for the purpose, but it's a bit of a hammer right now, because it's backed by a mutex. As a result, it imposes some limitations to locking order, which are not desirable, and can't be used just like get_online_cpus. That's why in patch 1 I substitute it with get/put_online_mems, which work exactly like get/put_online_cpus except they block not cpu, but memory hotplug. [ v1 can be found at https://lkml.org/lkml/2014/4/6/68. I NAK'ed it by myself, because it used an rw semaphore for get/put_online_mems, making them dead lock prune. ] This patch (of 2): {un}lock_memory_hotplug, which is used to synchronize against memory hotplug, is currently backed by a mutex, which makes it a bit of a hammer - threads that only want to get a stable value of online nodes mask won't be able to proceed concurrently. Also, it imposes some strong locking ordering rules on it, which narrows down the set of its usage scenarios. This patch introduces get/put_online_mems, which are the same as get/put_online_cpus, but for memory hotplug, i.e. executing a code inside a get/put_online_mems section will guarantee a stable value of online nodes, present pages, etc. lock_memory_hotplug()/unlock_memory_hotplug() are removed altogether. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Toshi Kani <toshi.kani@hp.com> Cc: Xishi Qiu <qiuxishi@huawei.com> Cc: Jiang Liu <liuj97@gmail.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04mm: page_alloc: do not cache reclaim distancesMel Gorman
pgdat->reclaim_nodes tracks if a remote node is allowed to be reclaimed by zone_reclaim due to its distance. As it is expected that zone_reclaim_mode will be rarely enabled it is unreasonable for all machines to take a penalty. Fortunately, the zone_reclaim_mode() path is already slow and it is the path that takes the hit. Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Acked-by: Michal Hocko <mhocko@suse.cz> Reviewed-by: Christoph Lameter <cl@linux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04mm: disable zone_reclaim_mode by defaultMel Gorman
When it was introduced, zone_reclaim_mode made sense as NUMA distances punished and workloads were generally partitioned to fit into a NUMA node. NUMA machines are now common but few of the workloads are NUMA-aware and it's routine to see major performance degradation due to zone_reclaim_mode being enabled but relatively few can identify the problem. Those that require zone_reclaim_mode are likely to be able to detect when it needs to be enabled and tune appropriately so lets have a sensible default for the bulk of users. This patch (of 2): zone_reclaim_mode causes processes to prefer reclaiming memory from local node instead of spilling over to other nodes. This made sense initially when NUMA machines were almost exclusively HPC and the workload was partitioned into nodes. The NUMA penalties were sufficiently high to justify reclaiming the memory. On current machines and workloads it is often the case that zone_reclaim_mode destroys performance but not all users know how to detect this. Favour the common case and disable it by default. Users that are sophisticated enough to know they need zone_reclaim_mode will detect it. Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Acked-by: Michal Hocko <mhocko@suse.cz> Reviewed-by: Christoph Lameter <cl@linux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04hugetlb: add hstate_is_gigantic()Luiz Capitulino
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Reviewed-by: Davidlohr Bueso <davidlohr@hp.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04mm: pass VM_BUG_ON() reason to dump_page()Dave Hansen
I recently added a patch to let folks pass a "reason" string dump_page() which gets dumped out along with the page's data. This essentially saves the bug-reader a trip in to the source to figure out why we BUG_ON()'d. The new VM_BUG_ON_PAGE() passes in NULL for "reason". It seems like we might as well pass the BUG_ON() condition if we have it. This will bloat kernels a bit with ~160 new strings, but this is all under a debugging option anyway. page:ffffea0008560280 count:1 mapcount:0 mapping:(null) index:0x0 page flags: 0xbfffc0000000001(locked) page dumped because: VM_BUG_ON_PAGE(PageLocked(page)) ------------[ cut here ]------------ kernel BUG at /home/davehans/linux.git/mm/filemap.c:464! invalid opcode: 0000 [#1] SMP CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.14.0+ #251 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 ... [akpm@linux-foundation.org: include stringify.h] Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Davidlohr Bueso <davidlohr@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04include/linux/mmdebug.h: add VM_WARN_ON() and VM_WARN_ON_ONCE()Andrew Morton
WARN_ON() and WARN_ON_ONCE(), dependent on CONFIG_DEBUG_VM Cc: Sebastian Ott <sebott@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04cma: add placement specifier for "cma=" kernel parameterAkinobu Mita
Currently, "cma=" kernel parameter is used to specify the size of CMA, but we can't specify where it is located. We want to locate CMA below 4GB for devices only supporting 32-bit addressing on 64-bit systems without iommu. This enables to specify the placement of CMA by extending "cma=" kernel parameter. Examples: 1. locate 64MB CMA below 4GB by "cma=64M@0-4G" 2. locate 64MB CMA exact at 512MB by "cma=64M@512M" Note that the DMA contiguous memory allocator on x86 assumes that page_address() works for the pages to allocate. So this change requires to limit end address of contiguous memory area upto max_pfn_mapped to prevent from locating it on highmem area by the argument of dma_contiguous_reserve(). Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Don Dutile <ddutile@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04memblock: introduce memblock_alloc_range()Akinobu Mita
This introduces memblock_alloc_range() which allocates memblock from the specified range of physical address. I would like to use this function to specify the location of CMA. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Don Dutile <ddutile@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04x86: enable DMA CMA with swiotlbAkinobu Mita
The DMA Contiguous Memory Allocator support on x86 is disabled when swiotlb config option is enabled. So DMA CMA is always disabled on x86_64 because swiotlb is always enabled. This attempts to support for DMA CMA with enabling swiotlb config option. The contiguous memory allocator on x86 is integrated in the function dma_generic_alloc_coherent() which is .alloc callback in nommu_dma_ops for dma_alloc_coherent(). x86_swiotlb_alloc_coherent() which is .alloc callback in swiotlb_dma_ops tries to allocate with dma_generic_alloc_coherent() firstly and then swiotlb_alloc_coherent() is called as a fallback. The main part of supporting DMA CMA with swiotlb is that changing x86_swiotlb_free_coherent() which is .free callback in swiotlb_dma_ops for dma_free_coherent() so that it can distinguish memory allocated by dma_generic_alloc_coherent() from one allocated by swiotlb_alloc_coherent() and release it with dma_generic_free_coherent() which can handle contiguous memory. This change requires making is_swiotlb_buffer() global function. This also needs to change .free callback in the dma_map_ops for amd_gart and sta2x11, because these dma_ops are also using dma_generic_alloc_coherent(). Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Acked-by: Marek Szyprowski <m.szyprowski@samsung.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Don Dutile <ddutile@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04mm,vmacache: add debug dataDavidlohr Bueso
Introduce a CONFIG_DEBUG_VM_VMACACHE option to enable counting the cache hit rate -- exported in /proc/vmstat. Any updates to the caching scheme needs this kind of data, thus it can save some work re-implementing the counting all the time. Signed-off-by: Davidlohr Bueso <davidlohr@hp.com> Cc: Aswin Chandramouleeswaran <aswin@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04mm: get rid of __GFP_KMEMCGVladimir Davydov
Currently to allocate a page that should be charged to kmemcg (e.g. threadinfo), we pass __GFP_KMEMCG flag to the page allocator. The page allocated is then to be freed by free_memcg_kmem_pages. Apart from looking asymmetrical, this also requires intrusion to the general allocation path. So let's introduce separate functions that will alloc/free pages charged to kmemcg. The new functions are called alloc_kmem_pages and free_kmem_pages. They should be used when the caller actually would like to use kmalloc, but has to fall back to the page allocator for the allocation is large. They only differ from alloc_pages and free_pages in that besides allocating or freeing pages they also charge them to the kmem resource counter of the current memory cgroup. [sfr@canb.auug.org.au: export kmalloc_order() to modules] Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Acked-by: Greg Thelen <gthelen@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Glauber Costa <glommer@gmail.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04sl[au]b: charge slabs to kmemcg explicitlyVladimir Davydov
We have only a few places where we actually want to charge kmem so instead of intruding into the general page allocation path with __GFP_KMEMCG it's better to explictly charge kmem there. All kmem charges will be easier to follow that way. This is a step towards removing __GFP_KMEMCG. It removes __GFP_KMEMCG from memcg caches' allocflags. Instead it makes slab allocation path call memcg_charge_kmem directly getting memcg to charge from the cache's memcg params. This also eliminates any possibility of misaccounting an allocation going from one memcg's cache to another memcg, because now we always charge slabs against the memcg the cache belongs to. That's why this patch removes the big comment to memcg_kmem_get_cache. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Acked-by: Greg Thelen <gthelen@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Glauber Costa <glommer@gmail.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04x86: define _PAGE_NUMA by reusing software bits on the PMD and PTE levelsMel Gorman
_PAGE_NUMA is currently an alias of _PROT_PROTNONE to trap NUMA hinting faults on x86. Care is taken such that _PAGE_NUMA is used only in situations where the VMA flags distinguish between NUMA hinting faults and prot_none faults. This decision was x86-specific and conceptually it is difficult requiring special casing to distinguish between PROTNONE and NUMA ptes based on context. Fundamentally, we only need the _PAGE_NUMA bit to tell the difference between an entry that is really unmapped and a page that is protected for NUMA hinting faults as if the PTE is not present then a fault will be trapped. Swap PTEs on x86-64 use the bits after _PAGE_GLOBAL for the offset. This patch shrinks the maximum possible swap size and uses the bit to uniquely distinguish between NUMA hinting ptes and swap ptes. Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Anvin <hpa@zytor.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Steven Noonan <steven@uplinklabs.net> Cc: Rik van Riel <riel@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04fs/libfs.c: add generic data flush to fsyncFabian Frederick
Description by Jan Kara: "A lot of older filesystems don't properly flush volatile disk caches on fsync(2) which can lead to loss of fsynced data after power failure. This patch makes generic_file_fsync() issue proper cache flush to fix the problem. Sysadmin can use /sys/devices/.../cache_type to tell the system it should not send the cache flush." [akpm@linux-foundation.org: nuke ifdef] [akpm@linux-foundation.org: fix warning] Signed-off-by: Fabian Frederick <fabf@skynet.be> Suggested-by: Jan Kara <jack@suse.cz> Suggested-by: Christoph Hellwig <hch@infradead.org> Cc: Jan Kara <jack@suse.cz> Cc: Christoph Hellwig <hch@infradead.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04hugetlb: restrict hugepage_migration_support() to x86_64Naoya Horiguchi
Currently hugepage migration is available for all archs which support pmd-level hugepage, but testing is done only for x86_64 and there're bugs for other archs. So to avoid breaking such archs, this patch limits the availability strictly to x86_64 until developers of other archs get interested in enabling this feature. Simply disabling hugepage migration on non-x86_64 archs is not enough to fix the reported problem where sys_move_pages() hits the BUG_ON() in follow_page(FOLL_GET), so let's fix this by checking if hugepage migration is supported in vma_migratable(). Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reported-by: Michael Ellerman <mpe@ellerman.id.au> Tested-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Hugh Dickins <hughd@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Miller <davem@davemloft.net> Cc: <stable@vger.kernel.org> [3.12+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04Merge branch 'irq-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next Pull core irq updates from Thomas Gleixner: "The irq department delivers: - Another tree wide update to get rid of the horrible create_irq interface along with its even more horrible variants. That also gets rid of the last leftovers of the initial sparse irq hackery. arch/driver specific changes have been either acked or ignored. - A fix for the spurious interrupt detection logic with threaded interrupts. - A new ARM SoC interrupt controller - The usual pile of fixes and improvements all over the place" * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits) Documentation: brcmstb-l2: Add Broadcom STB Level-2 interrupt controller binding irqchip: brcmstb-l2: Add Broadcom Set Top Box Level-2 interrupt controller genirq: Improve documentation to match current implementation ARM: iop13xx: fix msi support with sparse IRQ genirq: Provide !SMP stub for irq_set_affinity_notifier() irqchip: armada-370-xp: Move the devicetree binding documentation irqchip: gic: Use mask field in GICC_IAR genirq: Remove dynamic_irq mess ia64: Use irq_init_desc genirq: Replace dynamic_irq_init/cleanup genirq: Remove irq_reserve_irq[s] genirq: Replace reserve_irqs in core code s390: Avoid call to irq_reserve_irqs() s390: Remove pointless arch_show_interrupts() s390: pci: Check return value of alloc_irq_desc() proper sh: intc: Remove pointless irq_reserve_irqs() invocation x86, irq: Remove pointless irq_reserve_irqs() call genirq: Make create/destroy_irq() ia64 private tile: Use SPARSE_IRQ tile: pci: Use irq_alloc/free_hwirq() ...
2014-06-04Merge branch 'timers-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next Pull timer core updates from Thomas Gleixner: "This time you get nothing really exciting: - A huge update to the sh* clocksource drivers - Support for two more ARM SoCs - Removal of the deprecated setup_sched_clock() API - The usual pile of fixlets all over the place" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits) clocksource: Add Freescale FlexTimer Module (FTM) timer support ARM: dts: vf610: Add Freescale FlexTimer Module timer node. clocksource: ftm: Add FlexTimer Module (FTM) Timer devicetree Documentation clocksource: sh_tmu: Remove unnecessary OOM messages clocksource: sh_mtu2: Remove unnecessary OOM messages clocksource: sh_cmt: Remove unnecessary OOM messages clocksource: em_sti: Remove unnecessary OOM messages clocksource: dw_apb_timer_of: Do not trace read_sched_clock clocksource: Fix clocksource_mmio_readX_down clocksource: Fix type confusion for clocksource_mmio_readX_Y clocksource: sh_tmu: Fix channel IRQ retrieval in legacy case clocksource: qcom: Implement read_current_timer for udelay ntp: Make is_error_status() use its argument ntp: Convert simple_strtol to kstrtol timer_stats/doc: Fix /proc/timer_stats documentation sched_clock: Remove deprecated setup_sched_clock() API ARM: sun6i: a31: Add support for the High Speed Timers clocksource: sun5i: Add support for reset controller clocksource: efm32: use $vendor,$device scheme for compatible string KConfig: Vexpress: build the ARM_GLOBAL_TIMER with vexpress platform ...
2014-06-04xen-net{back, front}: Document multi-queue feature in netif.hAndrew J. Bennieston
Document the multi-queue feature in terms of XenStore keys to be written by the backend and by the frontend. Signed-off-by: Andrew J. Bennieston <andrew.bennieston@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-04Merge branch 'for-linus' of git://git.kernel.dk/linux-block into nextLinus Torvalds
Pull block follow-up bits from Jens Axboe: "A few minor (but important) fixes for blk-mq for the -rc1 window. - Hot removal potential oops fix for single queue devices. From me. - Two merged patches in late May meant that we accidentally lost a fix for freeing an active queue. Fix that up. From me. - A change of the blk_mq_tag_to_rq() API, passing in blk_mq_tags, to make life considerably easier for scsi-mq. From me. - A schedule-while-atomic fix from Ming Lei, which would hit if the tag space was exhausted. - Missing __percpu annotation in one place in blk-mq. Found by the magic Wu compile bot due to code being moved around by the previous patch, but it's actually an older issue. From Ming Lei. - Clearing of tag of a flush request at end_io time. From Ming Lei" * 'for-linus' of git://git.kernel.dk/linux-block: block: mq flush: clear flush_rq's tag in flush_end_io() blk-mq: let blk_mq_tag_to_rq() take blk_mq_tags as the main parameter blk-mq: fix regression from commit 624dbe475416 blk-mq: handle NULL req return from blk_map_request in single queue mode blk-mq: fix sparse warning on missed __percpu annotation blk-mq: fix schedule from atomic context blk-mq: move blk_mq_get_ctx/blk_mq_put_ctx to mq private header
2014-06-04Merge branch 'v4l_for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media into next Pull media updates from Mauro Carvalho Chehab: "This contains: - a new frontend/tuner driver set for si2168 and sa2157 - Videobuf 2 core now supports DVB too - A new gspca sub-driver (dtcs033) - saa7134 is now converted to use videobuf2 - add support for 4K timings - several other driver fixes and improvements PS. This pull request is shorter than usual, partly because I have some other patches on topic branches that I'll be sending you later this week" * 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (286 commits) [media] au0828-dvb: restore its permission to 644 [media] xc5000: delay tuner sleep to 5 seconds [media] xc5000: Don't use whitespace before tabs [media] xc5000: fix CamelCase [media] xc5000: Don't wrap msleep() [media] xc5000: get rid of positive error codes [media] au0828: reset streaming when a new frequency is set [media] au0828: Improve debug messages for urb_completion [media] au0828: Cancel stream-restart operation if frontend is disconnected [media] dib0700: fix RC support on Hauppauge Nova-TD [media] USB: as102_usb_drv.c: Remove useless return variables [media] v4l: Fix documentation of V4L2_PIX_FMT_H264_MVC and VP8 pixel formats [media] m5mols: Replace missing header [media] staging: lirc: Fix sparse warnings [media] fix mceusb endpoint type identification/handling [media] az6027: Added the PID for a new revision of the Elgato EyeTV Sat DVB-S Tuner [media] DocBook media: fix typo [media] adv7604: Add missing include to linux/types.h [media] v4l: Validate fields in the core code for subdev EDID ioctls [media] v4l: Add support for DV timings ioctls on subdev nodes ...
2014-06-04Merge branch '3.15-fixes' into mips-for-linux-nextRalf Baechle
2014-06-04tracing: Add __get_dynamic_array_len() macro for trace eventsSteven Rostedt (Red Hat)
If a trace event uses a dynamic array for something other than a string then there's currently no way the TP_printk() can figure out what size it is. A __get_dynamic_array_len() is required to know the length. This also simplifies the __get_bitmask() macro which required it as well, but instead just hardcoded it. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-06-04Merge tag 'devicetree-for-3.16' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux into next Pull DeviceTree updates from Rob Herring: - Another round of clean-up of FDT related code in architecture code. This removes knowledge of internal FDT details from most architectures except powerpc. - Conversion of kernel's custom FDT parsing code to use libfdt. - DT based initialization for generic serial earlycon. The introduction of generic serial earlycon support went in through the tty tree. - Improve the platform device naming for DT probed devices to ensure unique naming and use parent names instead of a global index. - Fix a race condition in of_update_property. - Unify the various linker section OF match tables and fix several function prototype errors. - Update platform_get_irq_byname to work in deferred probe cases. - 2 binding doc updates * tag 'devicetree-for-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (58 commits) of: handle NULL node in next_child iterators of/irq: provide more wrappers for !CONFIG_OF devicetree: bindings: Document micrel vendor prefix dt: bindings: dwc2: fix required value for the phy-names property of_pci_irq: kill useless variable in of_irq_parse_pci() of/irq: do irq resolution in platform_get_irq_byname() of: Add a testcase for of_find_node_by_path() of: Make of_find_node_by_path() handle /aliases of: Create unlocked version of for_each_child_of_node() lib: add glibc style strchrnul() variant of: Handle memory@0 node on PPC32 only pci/of: Remove dead code of: fix race between search and remove in of_update_property() of: Use NULL for pointers of: Stop naming platform_device using dcr address of: Ensure unique names without sacrificing determinism tty/serial: pl011: add DT based earlycon support of/fdt: add FDT serial scanning for earlycon of/fdt: add FDT address translation support serial: earlycon: add DT support ...
2014-06-04IB/core: Fix sparse warnings about redeclared functionsRoland Dreier
Fix a few functions that are declared with __attribute_const__ in the ib_verbs.h header file but defined without it in verbs.c. This gets rid of the following sparse warnings: drivers/infiniband/core/verbs.c:51:5: error: symbol 'ib_rate_to_mult' redeclared with different type (originally declared at include/rdma/ib_verbs.h:469) - different modifiers drivers/infiniband/core/verbs.c:68:14: error: symbol 'mult_to_ib_rate' redeclared with different type (originally declared at include/rdma/ib_verbs.h:607) - different modifiers drivers/infiniband/core/verbs.c:85:5: error: symbol 'ib_rate_to_mbps' redeclared with different type (originally declared at include/rdma/ib_verbs.h:476) - different modifiers drivers/infiniband/core/verbs.c:111:1: error: symbol 'rdma_node_get_transport' redeclared with different type (originally declared at include/rdma/ib_verbs.h:84) - different modifiers Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-06-04Merge branch 'for-3.15-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu Pull percpu fix from Tejun Heo: "It is very late but this is an important percpu-refcount fix from Sebastian Ott. The problem is that percpu_ref_*() used __this_cpu_*() instead of this_cpu_*(). The difference between the two is that the latter is atomic on the local cpu while the former is not. this_cpu_inc() is guaranteed to increment the percpu counter on the cpu that the operation is executed on without any synchronization; however, __this_cpu_inc() doesn't and if the local cpu invokes the function from different contexts (e.g. process and irq) of the same CPU, it's not guaranteed to actually increment as it may be implemented as rmw. This bug existed from the get-go but it hasn't been noticed earlier probably because on x86 __this_cpu_inc() is equivalent to this_cpu_inc() as both get translated into single instruction; however, s390 uses the generic rmw implementation and gets affected by the bug. Kudos to Sebastian and Heiko for diagnosing it. The change is very low risk and fixes a critical issue on the affected architectures, so I think it's a good candidate for inclusion although it's very late in the devel cycle. On the other hand, this has been broken since v3.11, so backporting it through -stable post -rc1 won't be the end of the world. I'll ping Christoph whether __this_cpu_*() ops can be better annotated so that it can trigger lockdep warning when used from multiple contexts" * 'for-3.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: percpu-refcount: fix usage of this_cpu_ops
2014-06-04Merge branch 'for-3.15-fixes' of ↵Tejun Heo
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu.git into for-3.16 Pull percpu/for-3.15-fixes into percpu/for-3.16 to receive 0c36b390a546 ("percpu-refcount: fix usage of this_cpu_ops"). The merge doesn't produce any conflict but the automatic merge is still incorrect because 4fb6e25049cb ("percpu-refcount: implement percpu_ref_tryget()") added another use of __this_cpu_inc() which should also be converted to this_cpu_ince(). This commit pulls in percpu/for-3.15-fixes and converts the newly added __this_cpu_inc() to this_cpu_inc(). Signed-off-by: Tejun Heo <tj@kernel.org>
2014-06-04blk-mq: let blk_mq_tag_to_rq() take blk_mq_tags as the main parameterJens Axboe
We currently pass in the hardware queue, and get the tags from there. But from scsi-mq, with a shared tag space, it's a lot more convenient to pass in the blk_mq_tags instead as the hardware queue isn't always directly available. So instead of having to re-map to a given hardware queue from rq->mq_ctx, just pass in the tags structure. Signed-off-by: Jens Axboe <axboe@fb.com>
2014-06-04percpu-refcount: fix usage of this_cpu_opsSebastian Ott
The percpu-refcount infrastructure uses the underscore variants of this_cpu_ops in order to modify percpu reference counters. (e.g. __this_cpu_inc()). However the underscore variants do not atomically update the percpu variable, instead they may be implemented using read-modify-write semantics (more than one instruction). Therefore it is only safe to use the underscore variant if the context is always the same (process, softirq, or hardirq). Otherwise it is possible to lose updates. This problem is something that Sebastian has seen within the aio subsystem which uses percpu refcounters both in process and softirq context leading to reference counts that never dropped to zeroes; even though the number of "get" and "put" calls matched. Fix this by using the non-underscore this_cpu_ops variant which provides correct per cpu atomic semantics and fixes the corrupted reference counts. Cc: Kent Overstreet <kmo@daterainc.com> Cc: <stable@vger.kernel.org> # v3.11+ Reported-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Tejun Heo <tj@kernel.org> References: http://lkml.kernel.org/g/alpine.LFD.2.11.1406041540520.21183@denkbrett
2014-06-04Merge tag 'sound-3.16-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound into next Pull sound updates from Takashi Iwai: "At this time, majority of changes come from ASoC world while we got a few new drivers in other places for FireWire and USB. There have been lots of ASoC core cleanups / refactoring, but very little visible to external users. ASoC: - Support for specifying aux CODECs in DT - Removal of the deprecated mux and enum macros - More moves towards full componentisation - Removal of some unused I/O code - Lots of cleanups, fixes and enhancements to the davinci, Freescale, Haswell and Realtek drivers - Several drivers exposed directly in Kconfig for use with simple-card - GPIO descriptor support for jacks - More updates and fixes to the Freescale SSI, Intel and rsnd drivers - New drivers for Cirrus CS42L56, Realtek RT5639, RT5642 and RT5651 and ST STA350, Analog Devices ADAU1361, ADAU1381, ADAU1761 and ADAU1781, and Realtek RT5677 HD-audio: - Clean up Dell headset quirks - Noise fixes for Dell and Sony laptops - Thinkpad T440 dock fix - Realtek codec updates (ALC293,ALC233,ALC3235) - Tegra HD-audio HDMI support FireWire-audio: - FireWire audio stack enhancement (AMDTP, MIDI), support for incoming isochronous stream and duplex streams with timestamp synchronization - BeBoB-based devices support - Fireworks-based device support USB-audio: - Behringer BCD2000 USB device support Misc: - Clean up of a few old drivers, atmel, fm801, etc" * tag 'sound-3.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (480 commits) ASoC: Fix wrong argument for card remove callbacks ASoC: free jack GPIOs before the sound card is freed ALSA: firewire-lib: Remove a comment about restriction of asynchronous operation ASoC: cache: Fix error code when not using ASoC level cache ALSA: hda/realtek - Fix COEF widget NID for ALC260 replacer fixup ALSA: hda/realtek - Correction of fixup codes for PB V7900 laptop ALSA: firewire-lib: Use IEC 61883-6 compliant labels for Raw Audio data ASoC: add RT5677 CODEC driver ASoC: intel: The Baytrail/MAX98090 driver depends on I2C ASoC: rt5640: Add the function "get_clk_info" to RL6231 shared support ASoC: rt5640: Add the function of the PLL clock calculation to RL6231 shared support ASoC: rt5640: Add RL6231 class device shared support for RT5640, RT5645 and RT5651 ASoC: cache: Fix possible ZERO_SIZE_PTR pointer dereferencing error. ASoC: Add helper functions to cast from DAPM context to CODEC/platform ALSA: bebob: sizeof() vs ARRAY_SIZE() typo ASoC: wm9713: correct mono out PGA sources ALSA: synth: emux: soundfont.c: Cleaning up memory leak ASoC: fsl: Remove dependencies of boards for SND_SOC_EUKREA_TLV320 ASoC: fsl-ssi: Use regmap ASoC: fsl-ssi: reorder and document fsl_ssi_private ...
2014-06-04Merge tag 'fbdev-omap-3.16' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux into next Pull omap fbdev changes from Tomi Valkeinen: - DT support for the panel drivers that were still missing it - TI AM43xx support - TI OMAP5 support * tag 'fbdev-omap-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux: (46 commits) OMAPDSS: move 'compatible' converter to omapdss driver OMAPDSS: HDMI: fix devm_ioremap_resource error checks OMAPDSS: HDMI: remove unused defines OMAPDSS: HDMI: cleanup WP ioremaps OMAPDSS: panel NEC-NL8048HL11 DT support Doc/DT: Add DT binding documentation for TPO td043mtea1 panel OMAPDSS: Panel TPO-TD043MTEA1 DT support Doc/DT: Add DT binding documentation for SHARP LS037V7DW01 OMAPDSS: panel sharp-ls037v7dw01 DT support OMAPDSS: panel-sharp-ls037v7dw01: update to use gpiod Doc/DT: Add binding doc for lgphilips,lb035q02.txt OMAPDSS: panel-lgphilips-lb035q02: Add DT support OMAPDSS: panel-lgphilips-lb035q02: use gpiod for enable gpio OMAPDSS: hdmi5_core: Fix compilation with OMAP5_DSS_HDMI_AUDIO OMAPDSS: panel-dpi: enable-gpio OMAPDSS: Fix writes to DISPC_POL_FREQ Doc/DT: Add OMAP5 DSS DT bindings OMAPDSS: HDMI: cleanup ioremaps OMAPDSS: HDMI: Add OMAP5 HDMI support OMAPDSS: HDMI: PLL changes for OMAP5 ...
2014-06-04Merge tag 'fbdev-main-3.16' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux into next Pull main fbdev changes from Tomi Valkeinen: "Mainly fixes and small improvements. The biggest change seems to be backlight control support for mx3fb" * tag 'fbdev-main-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux: (31 commits) drivers/video/fbdev/fb-puv3.c: Add header files for function unifb_mmap video: fbdev: s3fb.c: Fix for possible null pointer dereference video: fbdev: grvga.c: Fix for possible null pointer dereference matroxfb: perform a dummy read of M_STATUS video: of: display_timing: fix default native-mode setting video: delete unneeded call to platform_get_drvdata video: mx3fb: Add backlight control support video: omap: delete support for early fbmem allocation video: of: display_timing: remove two unsafe error messages fbdev: fbmem: remove positive test on unsigned values fbcon: Fix memory leak in con2fb_release_oldinfo() video: Kconfig: Add a dependency to the Goldfish framebuffer driver video: exynos: Add a dependency to the menu video: mx3fb: Use devm_kzalloc video/nuc900: allow modular build video: atmel needs FB_BACKLIGHT video: export fb_prepare_logo video/mbx: fix building debugfs support video/omap: fix modular build video: clarify I2C dependencies ...