summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2018-04-05headers: untangle kmemleak.h from mm.hRandy Dunlap
Currently <linux/slab.h> #includes <linux/kmemleak.h> for no obvious reason. It looks like it's only a convenience, so remove kmemleak.h from slab.h and add <linux/kmemleak.h> to any users of kmemleak_* that don't already #include it. Also remove <linux/kmemleak.h> from source files that do not use it. This is tested on i386 allmodconfig and x86_64 allmodconfig. It would be good to run it through the 0day bot for other $ARCHes. I have neither the horsepower nor the storage space for the other $ARCHes. Update: This patch has been extensively build-tested by both the 0day bot & kisskb/ozlabs build farms. Both of them reported 2 build failures for which patches are included here (in v2). [ slab.h is the second most used header file after module.h; kernel.h is right there with slab.h. There could be some minor error in the counting due to some #includes having comments after them and I didn't combine all of those. ] [akpm@linux-foundation.org: security/keys/big_key.c needs vmalloc.h, per sfr] Link: http://lkml.kernel.org/r/e4309f98-3749-93e1-4bb7-d9501a39d015@infradead.org Link: http://kisskb.ellerman.id.au/kisskb/head/13396/ Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Ingo Molnar <mingo@kernel.org> Reported-by: Michael Ellerman <mpe@ellerman.id.au> [2 build failures] Reported-by: Fengguang Wu <fengguang.wu@intel.com> [2 build failures] Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Wei Yongjun <weiyongjun1@huawei.com> Cc: Luis R. Rodriguez <mcgrof@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mimi Zohar <zohar@linux.vnet.ibm.com> Cc: John Johansen <john.johansen@canonical.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05include/linux/mmdebug.h: make VM_WARN* non-rvalsMichal Hocko
At present the construct if (VM_WARN(...)) will compile OK with CONFIG_DEBUG_VM=y and will fail with CONFIG_DEBUG_VM=n. The reason is that VM_{WARN,BUG}* have always been special wrt. {WARN/BUG}* and never generate any code when DEBUG_VM is disabled. So we cannot really use it in conditionals. We considered changing things so that this construct works in both cases but that might cause unwanted code generation with CONFIG_DEBUG_VM=n. It is safer and simpler to make the build fail in both cases. [akpm@linux-foundation.org: changelog] Signed-off-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05mm: change return type to vm_fault_tSouptick Joarder
The plan for these patches is to introduce the typedef, initially just as documentation ("These functions should return a VM_FAULT_ status"). We'll trickle the patches to individual drivers/filesystems in through the maintainers, as far as possible. Then we'll change the typedef to an unsigned int and break the compilation of any unconverted drivers/filesystems. vmf_insert_page(), vmf_insert_mixed() and vmf_insert_pfn() are three newly added functions. The various drivers/filesystems where return value of fault(), huge_fault(), page_mkwrite() and pfn_mkwrite() get converted, will need them. These functions will return correct VM_FAULT_ code based on err value. We've had bugs before where drivers returned -EFOO. And we have this silly inefficiency where vm_insert_xxx() return an errno which (afaict) every driver then converts into a VM_FAULT code. In many cases drivers failed to return correct VM_FAULT code value despite of vm_insert_xxx() fails. We have indentified and clean up all those existing bugs and silly inefficiencies in driver/filesystems by adding these three new inline wrappers. As mentioned above, we will trickle those patches to individual drivers/filesystems in through maintainers after these three wrapper functions are merged. Eventually we can convert vm_insert_xxx() into vmf_insert_xxx() and remove these inline wrappers, but these are a good intermediate step. Link: http://lkml.kernel.org/r/20180310162351.GA7422@jordon-HP-15-Notebook-PC Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05mm, page_alloc: wakeup kcompactd even if kswapd cannot free more memoryDavid Rientjes
Kswapd will not wakeup if per-zone watermarks are not failing or if too many previous attempts at background reclaim have failed. This can be true if there is a lot of free memory available. For high- order allocations, kswapd is responsible for waking up kcompactd for background compaction. If the zone is not below its watermarks or reclaim has recently failed (lots of free memory, nothing left to reclaim), kcompactd does not get woken up. When __GFP_DIRECT_RECLAIM is not allowed, allow kcompactd to still be woken up even if kswapd will not reclaim. This allows high-order allocations, such as thp, to still trigger background compaction even when the zone has an abundance of free memory. Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1803111659420.209721@chino.kir.corp.google.com Signed-off-by: David Rientjes <rientjes@google.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05mm: make counting of list_lru_one::nr_items locklessKirill Tkhai
During the reclaiming slab of a memcg, shrink_slab iterates over all registered shrinkers in the system, and tries to count and consume objects related to the cgroup. In case of memory pressure, this behaves bad: I observe high system time and time spent in list_lru_count_one() for many processes on RHEL7 kernel. This patch makes list_lru_node::memcg_lrus rcu protected, that allows to skip taking spinlock in list_lru_count_one(). Shakeel Butt with the patch observes significant perf graph change. He says: ======================================================================== Setup: running a fork-bomb in a memcg of 200MiB on a 8GiB and 4 vcpu VM and recording the trace with 'perf record -g -a'. The trace without the patch: + 34.19% fb.sh [kernel.kallsyms] [k] queued_spin_lock_slowpath + 30.77% fb.sh [kernel.kallsyms] [k] _raw_spin_lock + 3.53% fb.sh [kernel.kallsyms] [k] list_lru_count_one + 2.26% fb.sh [kernel.kallsyms] [k] super_cache_count + 1.68% fb.sh [kernel.kallsyms] [k] shrink_slab + 0.59% fb.sh [kernel.kallsyms] [k] down_read_trylock + 0.48% fb.sh [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore + 0.38% fb.sh [kernel.kallsyms] [k] shrink_node_memcg + 0.32% fb.sh [kernel.kallsyms] [k] queue_work_on + 0.26% fb.sh [kernel.kallsyms] [k] count_shadow_nodes With the patch: + 0.16% swapper [kernel.kallsyms] [k] default_idle + 0.13% oom_reaper [kernel.kallsyms] [k] mutex_spin_on_owner + 0.05% perf [kernel.kallsyms] [k] copy_user_generic_string + 0.05% init.real [kernel.kallsyms] [k] wait_consider_task + 0.05% kworker/0:0 [kernel.kallsyms] [k] finish_task_switch + 0.04% kworker/2:1 [kernel.kallsyms] [k] finish_task_switch + 0.04% kworker/3:1 [kernel.kallsyms] [k] finish_task_switch + 0.04% kworker/1:0 [kernel.kallsyms] [k] finish_task_switch + 0.03% binary [kernel.kallsyms] [k] copy_page ======================================================================== Thanks Shakeel for the testing. [ktkhai@virtuozzo.com: v2] Link: http://lkml.kernel.org/r/151203869520.3915.2587549826865799173.stgit@localhost.localdomain Link: http://lkml.kernel.org/r/150583358557.26700.8490036563698102569.stgit@localhost.localdomain Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Tested-by: Shakeel Butt <shakeelb@google.com> Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05zsmalloc: introduce zs_huge_class_size()Sergey Senozhatsky
Patch series "zsmalloc/zram: drop zram's max_zpage_size", v3. ZRAM's max_zpage_size is a bad thing. It forces zsmalloc to store normal objects as huge ones, which results in bigger zsmalloc memory usage. Drop it and use actual zsmalloc huge-class value when decide if the object is huge or not. This patch (of 2): Not every object can be share its zspage with other objects, e.g. when the object is as big as zspage or nearly as big a zspage. For such objects zsmalloc has a so called huge class - every object which belongs to huge class consumes the entire zspage (which consists of a physical page). On x86_64, PAGE_SHIFT 12 box, the first non-huge class size is 3264, so starting down from size 3264, objects can share page(-s) and thus minimize memory wastage. ZRAM, however, has its own statically defined watermark for huge objects, namely "3 * PAGE_SIZE / 4 = 3072", and forcibly stores every object larger than this watermark (3072) as a PAGE_SIZE object, in other words, to a huge class, while zsmalloc can keep some of those objects in non-huge classes. This results in increased memory consumption. zsmalloc knows better if the object is huge or not. Introduce zs_huge_class_size() function which tells if the given object can be stored in one of non-huge classes or not. This will let us to drop ZRAM's huge object watermark and fully rely on zsmalloc when we decide if the object is huge. [sergey.senozhatsky.work@gmail.com: add pool param to zs_huge_class_size()] Link: http://lkml.kernel.org/r/20180314081833.1096-2-sergey.senozhatsky@gmail.com Link: http://lkml.kernel.org/r/20180306070639.7389-2-sergey.senozhatsky@gmail.com Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05mm: fix races between swapoff and flush dcacheHuang Ying
Thanks to commit 4b3ef9daa4fc ("mm/swap: split swap cache into 64MB trunks"), after swapoff the address_space associated with the swap device will be freed. So page_mapping() users which may touch the address_space need some kind of mechanism to prevent the address_space from being freed during accessing. The dcache flushing functions (flush_dcache_page(), etc) in architecture specific code may access the address_space of swap device for anonymous pages in swap cache via page_mapping() function. But in some cases there are no mechanisms to prevent the swap device from being swapoff, for example, CPU1 CPU2 __get_user_pages() swapoff() flush_dcache_page() mapping = page_mapping() ... exit_swap_address_space() ... kvfree(spaces) mapping_mapped(mapping) The address space may be accessed after being freed. But from cachetlb.txt and Russell King, flush_dcache_page() only care about file cache pages, for anonymous pages, flush_anon_page() should be used. The implementation of flush_dcache_page() in all architectures follows this too. They will check whether page_mapping() is NULL and whether mapping_mapped() is true to determine whether to flush the dcache immediately. And they will use interval tree (mapping->i_mmap) to find all user space mappings. While mapping_mapped() and mapping->i_mmap isn't used by anonymous pages in swap cache at all. So, to fix the race between swapoff and flush dcache, __page_mapping() is add to return the address_space for file cache pages and NULL otherwise. All page_mapping() invoking in flush dcache functions are replaced with page_mapping_file(). [akpm@linux-foundation.org: simplify page_mapping_file(), per Mike] Link: http://lkml.kernel.org/r/20180305083634.15174-1-ying.huang@intel.com Signed-off-by: "Huang, Ying" <ying.huang@intel.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Chen Liqin <liqin.linux@gmail.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: "David S. Miller" <davem@davemloft.net> Cc: Chris Zankel <chris@zankel.net> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Ley Foon Tan <lftan@altera.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05include/linux/mm.h: provide consistent declaration for num_poisoned_pagesGuenter Roeck
clang reports the following compile warning. In file included from mm/vmscan.c:56: ./include/linux/swapops.h:327:22: warning: section attribute is specified on redeclared variable [-Wsection] extern atomic_long_t num_poisoned_pages __read_mostly; ^ ./include/linux/mm.h:2585:22: note: previous declaration is here extern atomic_long_t num_poisoned_pages; ^ Let's use __read_mostly everywhere. Link: http://lkml.kernel.org/r/1519686565-8224-1-git-send-email-linux@roeck-us.net Signed-off-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Matthias Kaehlcke <mka@chromium.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05mm, hugetlbfs: introduce ->pagesize() to vm_operations_structDan Williams
When device-dax is operating in huge-page mode we want it to behave like hugetlbfs and report the MMU page mapping size that is being enforced by the vma. Similar to commit 31383c6865a5 "mm, hugetlbfs: introduce ->split() to vm_operations_struct" it would be messy to teach vma_mmu_pagesize() about device-dax page mapping sizes in the same (hstate) way that hugetlbfs communicates this attribute. Instead, these patches introduce a new ->pagesize() vm operation. Link: http://lkml.kernel.org/r/151996254734.27922.15813097401404359642.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reported-by: Jane Chu <jane.chu@oracle.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05mm: make should_failslab always available for fault injectionHoward McLauchlan
should_failslab() is a convenient function to hook into for directed error injection into kmalloc(). However, it is only available if a config flag is set. The following BCC script, for example, fails kmalloc() calls after a btrfs umount: from bcc import BPF prog = r""" BPF_HASH(flag); #include <linux/mm.h> int kprobe__btrfs_close_devices(void *ctx) { u64 key = 1; flag.update(&key, &key); return 0; } int kprobe__should_failslab(struct pt_regs *ctx) { u64 key = 1; u64 *res; res = flag.lookup(&key); if (res != 0) { bpf_override_return(ctx, -ENOMEM); } return 0; } """ b = BPF(text=prog) while 1: b.kprobe_poll() This patch refactors the should_failslab implementation so that the function is always available for error injection, independent of flags. This change would be similar in nature to commit f5490d3ec921 ("block: Add should_fail_bio() for bpf error injection"). Link: http://lkml.kernel.org/r/20180222020320.6944-1-hmclauchlan@fb.com Signed-off-by: Howard McLauchlan <hmclauchlan@fb.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Akinobu Mita <akinobu.mita@gmail.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Josef Bacik <jbacik@fb.com> Cc: Johannes Weiner <jweiner@fb.com> Cc: Alexei Starovoitov <ast@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05mm: swap: unify cluster-based and vma-based swap readaheadMinchan Kim
This patch makes do_swap_page() not need to be aware of two different swap readahead algorithms. Just unify cluster-based and vma-based readahead function call. Link: http://lkml.kernel.org/r/1509520520-32367-3-git-send-email-minchan@kernel.org Link: http://lkml.kernel.org/r/20180220085249.151400-3-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: Huang Ying <ying.huang@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05mm: swap: clean up swap readaheadMinchan Kim
When I see recent change of swap readahead, I am very unhappy about current code structure which diverges two swap readahead algorithm in do_swap_page. This patch is to clean it up. Main motivation is that fault handler doesn't need to be aware of readahead algorithms but just should call swapin_readahead. As first step, this patch cleans up a little bit but not perfect (I just separate for review easier) so next patch will make the goal complete. [minchan@kernel.org: do not check readahead flag with THP anon] Link: http://lkml.kernel.org/r/874lm83zho.fsf@yhuang-dev.intel.com Link: http://lkml.kernel.org/r/20180227232611.169883-1-minchan@kernel.org Link: http://lkml.kernel.org/r/1509520520-32367-2-git-send-email-minchan@kernel.org Link: http://lkml.kernel.org/r/20180220085249.151400-2-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: Huang Ying <ying.huang@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05mm/page_ref: use atomic_set_release in page_ref_unfreezeKonstantin Khlebnikov
page_ref_unfreeze() has exactly that semantic. No functional changes: just minus one barrier and proper handling of PPro errata. Link: http://lkml.kernel.org/r/151844393004.210639.4672319312617954272.stgit@buzz Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05mm: hwpoison: disable memory error handling on 1GB hugepageNaoya Horiguchi
Recently the following BUG was reported: Injecting memory failure for pfn 0x3c0000 at process virtual address 0x7fe300000000 Memory failure: 0x3c0000: recovery action for huge page: Recovered BUG: unable to handle kernel paging request at ffff8dfcc0003000 IP: gup_pgd_range+0x1f0/0xc20 PGD 17ae72067 P4D 17ae72067 PUD 0 Oops: 0000 [#1] SMP PTI ... CPU: 3 PID: 5467 Comm: hugetlb_1gb Not tainted 4.15.0-rc8-mm1-abc+ #3 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.fc25 04/01/2014 You can easily reproduce this by calling madvise(MADV_HWPOISON) twice on a 1GB hugepage. This happens because get_user_pages_fast() is not aware of a migration entry on pud that was created in the 1st madvise() event. I think that conversion to pud-aligned migration entry is working, but other MM code walking over page table isn't prepared for it. We need some time and effort to make all this work properly, so this patch avoids the reported bug by just disabling error handling for 1GB hugepage. [n-horiguchi@ah.jp.nec.com: v2] Link: http://lkml.kernel.org/r/1517284444-18149-1-git-send-email-n-horiguchi@ah.jp.nec.com Link: http://lkml.kernel.org/r/1517207283-15769-1-git-send-email-n-horiguchi@ah.jp.nec.com Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Acked-by: Punit Agrawal <punit.agrawal@arm.com> Tested-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05mm/memory_hotplug: optimize memory hotplugPavel Tatashin
During memory hotplugging we traverse struct pages three times: 1. memset(0) in sparse_add_one_section() 2. loop in __add_section() to set do: set_page_node(page, nid); and SetPageReserved(page); 3. loop in memmap_init_zone() to call __init_single_pfn() This patch removes the first two loops, and leaves only loop 3. All struct pages are initialized in one place, the same as it is done during boot. The benefits: - We improve memory hotplug performance because we are not evicting the cache several times and also reduce loop branching overhead. - Remove condition from hotpath in __init_single_pfn(), that was added in order to fix the problem that was reported by Bharata in the above email thread, thus also improve performance during normal boot. - Make memory hotplug more similar to the boot memory initialization path because we zero and initialize struct pages only in one function. - Simplifies memory hotplug struct page initialization code, and thus enables future improvements, such as multi-threading the initialization of struct pages in order to improve hotplug performance even further on larger machines. [pasha.tatashin@oracle.com: v5] Link: http://lkml.kernel.org/r/20180228030308.1116-7-pasha.tatashin@oracle.com Link: http://lkml.kernel.org/r/20180215165920.8570-7-pasha.tatashin@oracle.com Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Baoquan He <bhe@redhat.com> Cc: Bharata B Rao <bharata@linux.vnet.ibm.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Steven Sistare <steven.sistare@oracle.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05mm/memory_hotplug: don't read nid from struct page during hotplugPavel Tatashin
During memory hotplugging the probe routine will leave struct pages uninitialized, the same as it is currently done during boot. Therefore, we do not want to access the inside of struct pages before __init_single_page() is called during onlining. Because during hotplug we know that pages in one memory block belong to the same numa node, we can skip the checking. We should keep checking for the boot case. [pasha.tatashin@oracle.com: s/register_new_memory()/hotplug_memory_register()] Link: http://lkml.kernel.org/r/20180228030308.1116-6-pasha.tatashin@oracle.com Link: http://lkml.kernel.org/r/20180215165920.8570-6-pasha.tatashin@oracle.com Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Baoquan He <bhe@redhat.com> Cc: Bharata B Rao <bharata@linux.vnet.ibm.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Steven Sistare <steven.sistare@oracle.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05mm: uninitialized struct page poisoning sanity checkingPavel Tatashin
During boot we poison struct page memory in order to ensure that no one is accessing this memory until the struct pages are initialized in __init_single_page(). This patch adds more scrutiny to this checking by making sure that flags do not equal the poison pattern when they are accessed. The pattern is all ones. Since node id is also stored in struct page, and may be accessed quite early, we add this enforcement into page_to_nid() function as well. Note, this is applicable only when NODE_NOT_IN_PAGE_FLAGS=n [pasha.tatashin@oracle.com: v4] Link: http://lkml.kernel.org/r/20180215165920.8570-4-pasha.tatashin@oracle.com Link: http://lkml.kernel.org/r/20180213193159.14606-4-pasha.tatashin@oracle.com Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Reviewed-by: Ingo Molnar <mingo@kernel.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Baoquan He <bhe@redhat.com> Cc: Bharata B Rao <bharata@linux.vnet.ibm.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Steven Sistare <steven.sistare@oracle.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05mm: initialize pages on demand during bootPavel Tatashin
Deferred page initialization allows the boot cpu to initialize a small subset of the system's pages early in boot, with other cpus doing the rest later on. It is, however, problematic to know how many pages the kernel needs during boot. Different modules and kernel parameters may change the requirement, so the boot cpu either initializes too many pages or runs out of memory. To fix that, initialize early pages on demand. This ensures the kernel does the minimum amount of work to initialize pages during boot and leaves the rest to be divided in the multithreaded initialization path (deferred_init_memmap). The on-demand code is permanently disabled using static branching once deferred pages are initialized. After the static branch is changed to false, the overhead is up-to two branch-always instructions if the zone watermark check fails or if rmqueue fails. Sergey Senozhatsky noticed that while deferred pages currently make sense only on NUMA machines (we start one thread per latency node), CONFIG_NUMA is not a requirement for CONFIG_DEFERRED_STRUCT_PAGE_INIT, so that is also must be addressed in the patch. [akpm@linux-foundation.org: fix typo in comment, make deferred_pages static] [pasha.tatashin@oracle.com: fix min() type mismatch warning] Link: http://lkml.kernel.org/r/20180212164543.26592-1-pasha.tatashin@oracle.com [pasha.tatashin@oracle.com: use zone_to_nid() in deferred_grow_zone()] Link: http://lkml.kernel.org/r/20180214163343.21234-2-pasha.tatashin@oracle.com [pasha.tatashin@oracle.com: might_sleep warning] Link: http://lkml.kernel.org/r/20180306192022.28289-1-pasha.tatashin@oracle.com [akpm@linux-foundation.org: s/spin_lock/spin_lock_irq/ in page_alloc_init_late()] [pasha.tatashin@oracle.com: v5] Link: http://lkml.kernel.org/r/20180309220807.24961-3-pasha.tatashin@oracle.com [akpm@linux-foundation.org: tweak comments] [pasha.tatashin@oracle.com: v6] Link: http://lkml.kernel.org/r/20180313182355.17669-3-pasha.tatashin@oracle.com [akpm@linux-foundation.org: coding-style fixes] Link: http://lkml.kernel.org/r/20180209192216.20509-2-pasha.tatashin@oracle.com Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Reviewed-by: Steven Sistare <steven.sistare@oracle.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> Acked-by: Mel Gorman <mgorman@suse.de> Cc: Michal Hocko <mhocko@suse.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Gioh Kim <gi-oh.kim@profitbricks.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Yaowei Bai <baiyaowei@cmss.chinamobile.com> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Paul Burton <paul.burton@mips.com> Cc: Miles Chen <miles.chen@mediatek.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05mm: disable interrupts while initializing deferred pagesPavel Tatashin
Vlastimil Babka reported about a window issue during which when deferred pages are initialized, and the current version of on-demand initialization is finished, allocations may fail. While this is highly unlikely scenario, since this kind of allocation request must be large, and must come from interrupt handler, we still want to cover it. We solve this by initializing deferred pages with interrupts disabled, and holding node_size_lock spin lock while pages in the node are being initialized. The on-demand deferred page initialization that comes later will use the same lock, and thus synchronize with deferred_init_memmap(). It is unlikely for threads that initialize deferred pages to be interrupted. They run soon after smp_init(), but before modules are initialized, and long before user space programs. This is why there is no adverse effect of having these threads running with interrupts disabled. [pasha.tatashin@oracle.com: v6] Link: http://lkml.kernel.org/r/20180313182355.17669-2-pasha.tatashin@oracle.com Link: http://lkml.kernel.org/r/20180309220807.24961-2-pasha.tatashin@oracle.com Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Steven Sistare <steven.sistare@oracle.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Gioh Kim <gi-oh.kim@profitbricks.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Yaowei Bai <baiyaowei@cmss.chinamobile.com> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Paul Burton <paul.burton@mips.com> Cc: Miles Chen <miles.chen@mediatek.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05mm/migrate: rename migration reason MR_CMA to MR_CONTIG_RANGEAnshuman Khandual
alloc_contig_range() initiates compaction and eventual migration for the purpose of either CMA or HugeTLB allocations. At present, the reason code remains the same MR_CMA for either of these cases. Let's make it MR_CONTIG_RANGE which will appropriately reflect the reason code in both these cases. Link: http://lkml.kernel.org/r/20180202091518.18798-1-khandual@linux.vnet.ibm.com Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05slub: make struct kmem_cache_order_objects::x unsigned intAlexey Dobriyan
struct kmem_cache_order_objects is for mixing order and number of objects, and orders aren't big enough to warrant 64-bit width. Propagate unsignedness down so that everything fits. !!! Patch assumes that "PAGE_SIZE << order" doesn't overflow. !!! Link: http://lkml.kernel.org/r/20180305200730.15812-23-adobriyan@gmail.com Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05slab: make usercopy region 32-bitAlexey Dobriyan
If kmem case sizes are 32-bit, then usecopy region should be too. Link: http://lkml.kernel.org/r/20180305200730.15812-21-adobriyan@gmail.com Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: David Miller <davem@davemloft.net> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05kasan: make kasan_cache_create() work with 32-bit slab cache sizesAlexey Dobriyan
If SLAB doesn't support 4GB+ kmem caches (it never did), KASAN should not do it as well. Link: http://lkml.kernel.org/r/20180305200730.15812-20-adobriyan@gmail.com Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05slub: make ->size unsigned intAlexey Dobriyan
Linux doesn't support negative length objects (including meta data). Link: http://lkml.kernel.org/r/20180305200730.15812-18-adobriyan@gmail.com Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05slub: make ->object_size unsigned intAlexey Dobriyan
Linux doesn't support negative length objects. Link: http://lkml.kernel.org/r/20180305200730.15812-17-adobriyan@gmail.com Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05slub: make ->offset unsigned intAlexey Dobriyan
->offset is free pointer offset from the start of the object, can't be negative. Link: http://lkml.kernel.org/r/20180305200730.15812-16-adobriyan@gmail.com Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05slub: make ->cpu_partial unsigned intAlexey Dobriyan
/* * cpu_partial determined the maximum number of objects * kept in the per cpu partial lists of a processor. */ Can't be negative. Link: http://lkml.kernel.org/r/20180305200730.15812-15-adobriyan@gmail.com Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05slub: make ->inuse unsigned intAlexey Dobriyan
->inuse is "the number of bytes in actual use by the object", can't be negative. Link: http://lkml.kernel.org/r/20180305200730.15812-14-adobriyan@gmail.com Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05slub: make ->align unsigned intAlexey Dobriyan
Kmem cache alignment can't be negative. Link: http://lkml.kernel.org/r/20180305200730.15812-13-adobriyan@gmail.com Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05slub: make ->reserved unsigned intAlexey Dobriyan
->reserved is either 0 or sizeof(struct rcu_head), can't be negative. Link: http://lkml.kernel.org/r/20180305200730.15812-12-adobriyan@gmail.com Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05slub: make ->red_left_pad unsigned intAlexey Dobriyan
Padding length can't be negative. Link: http://lkml.kernel.org/r/20180305200730.15812-11-adobriyan@gmail.com Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05slub: make ->max_attr_size unsigned intAlexey Dobriyan
->max_attr_size is maximum length of every SLAB memcg attribute ever written. VFS limits those to INT_MAX. Link: http://lkml.kernel.org/r/20180305200730.15812-10-adobriyan@gmail.com Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05slub: make ->remote_node_defrag_ratio unsigned intAlexey Dobriyan
->remote_node_defrag_ratio is in range 0..1000. This also adds a check and modifies the behavior to return an error code. Before this patch invalid values were ignored. Link: http://lkml.kernel.org/r/20180305200730.15812-9-adobriyan@gmail.com Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05slab: make kmem_cache_create() work with 32-bit sizesAlexey Dobriyan
struct kmem_cache::size and ::align were always 32-bit. Out of curiosity I created 4GB kmem_cache, it oopsed with division by 0. kmem_cache_create(1UL<<32+1) created 1-byte cache as expected. size_t doesn't work and never did. Link: http://lkml.kernel.org/r/20180305200730.15812-6-adobriyan@gmail.com Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05slab: make kmalloc_size() return "unsigned int"Alexey Dobriyan
kmalloc_size() derives size of kmalloc cache from internal index, which can't be negative. Propagate unsignedness a bit. Link: http://lkml.kernel.org/r/20180305200730.15812-3-adobriyan@gmail.com Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05slab: make kmalloc_index() return "unsigned int"Alexey Dobriyan
kmalloc_index() return index into an array of kmalloc kmem caches, therefore should be unsigned. Space savings with SLUB on trimmed down .config: add/remove: 0/1 grow/shrink: 6/56 up/down: 85/-557 (-472) Function old new delta calculate_sizes 924 983 +59 on_freelist 589 604 +15 init_cache_random_seq 122 127 +5 ext4_mb_init 1206 1210 +4 slab_pad_check.part 270 271 +1 cpu_partial_store 112 113 +1 usersize_show 28 27 -1 ... new_slab 1871 1837 -34 slab_order 204 - -204 This patch start a series of converting SLUB (mostly) to "unsigned int". 1) Most integers in the code are in fact unsigned entities: array indexes, lengths, buffer sizes, allocation orders. It is therefore better to use unsigned variables 2) Some integers in the code are either "size_t" or "unsigned long" for no reason. size_t usually comes from people trying to maintain type correctness and figuring out that "sizeof" operator returns size_t or memset/memcpy takes size_t so should everything passed to it. However the number of 4GB+ objects in the kernel is very small. Most, if not all, dynamically allocated objects with kmalloc() or kmem_cache_create() aren't actually big. Maintaining wide types doesn't do anything. 64-bit ops are bigger than 32-bit on our beloved x86_64, so try to not use 64-bit where it isn't necessary (read: everywhere where integers are integers not pointers) 3) in case of SLAB allocators, there are additional limitations *) page->inuse, page->objects are only 16-/15-bit, *) cache size was always 32-bit *) slab orders are small, order 20 is needed to go 64-bit on x86_64 (PAGE_SIZE << order) Basically everything is 32-bit except kmalloc(1ULL<<32) which gets shortcut through page allocator. Christoph said: : : That changes with large base page size on power and ARM64 f.e. but then : we do not want to encourage larger allocations through slab anyways. Link: http://lkml.kernel.org/r/20180305200730.15812-2-adobriyan@gmail.com Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05Merge tag 'armsoc-drivers' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC driver updates from Arnd Bergmann: "The main addition this time around is the new ARM "SCMI" framework, which is the latest in a series of standards coming from ARM to do power management in a platform independent way. This has been through many review cycles, and it relies on a rather interesting way of using the mailbox subsystem, but in the end I agreed that Sudeep's version was the best we could do after all. Other changes include: - the ARM CCN driver is moved out of drivers/bus into drivers/perf, which makes more sense. Similarly, the performance monitoring portion of the CCI driver are moved the same way and cleaned up a little more. - a series of updates to the SCPI framework - support for the Mediatek mt7623a SoC in drivers/soc - support for additional NVIDIA Tegra hardware in drivers/soc - a new reset driver for Socionext Uniphier - lesser bug fixes in drivers/soc, drivers/tee, drivers/memory, and drivers/firmware and drivers/reset across platforms" * tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (87 commits) reset: uniphier: add ethernet reset control support for PXs3 reset: stm32mp1: Enable stm32mp1 reset driver dt-bindings: reset: add STM32MP1 resets reset: uniphier: add Pro4/Pro5/PXs2 audio systems reset control reset: imx7: add 'depends on HAS_IOMEM' to fix unmet dependency reset: modify the way reset lookup works for board files reset: add support for non-DT systems clk: scmi: use devm_of_clk_add_hw_provider() API and drop scmi_clocks_remove firmware: arm_scmi: prevent accessing rate_discrete uninitialized hwmon: (scmi) return -EINVAL when sensor information is unavailable amlogic: meson-gx-socinfo: Update soc ids soc/tegra: pmc: Use the new reset APIs to manage reset controllers soc: mediatek: update power domain data of MT2712 dt-bindings: soc: update MT2712 power dt-bindings cpufreq: scmi: add thermal dependency soc: mediatek: fix the mistaken pointer accessed when subdomains are added soc: mediatek: add SCPSYS power domain driver for MediaTek MT7623A SoC soc: mediatek: avoid hardcoded value with bus_prot_mask dt-bindings: soc: add header files required for MT7623A SCPSYS dt-binding dt-bindings: soc: add SCPSYS binding for MT7623 and MT7623A SoC ...
2018-04-05Merge tag 'armsoc-soc' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC platform updates from Arnd Bergmann: "This release brings up a new platform based on the old ARM9 core: the Nuvoton NPCM is used as a baseboard management controller, competing with the better known ASpeed AST2xx series. Another important change is the addition of ARMv7-A based chips in mach-stm32. The older parts in this platform are ARMv7-M based microcontrollers, now they are expanding to general-purpose workloads. The other changes are the usual defconfig updates to enable additional drivers, lesser bugfixes. The largest updates as often are the ongoing OMAP cleanups, but we also have a number of changes for the older PXA and davinci platforms this time. For the Renesas shmobile/r-car platform, some new infrastructure is needed to make the watchdog work correctly. Supporting Multiprocessing on Allwinner A80 required a significant amount of new code, but is not doing anything unexpected" * tag 'armsoc-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (179 commits) arm: npcm: modify configuration for the NPCM7xx BMC. MAINTAINERS: update entry for ARM/berlin ARM: omap2: fix am43xx build without L2X0 ARM: davinci: da8xx: simplify CFGCHIP regmap_config ARM: davinci: da8xx: fix oops in USB PHY driver due to stack allocated platform_data ARM: multi_v7_defconfig: add NXP FlexCAN IP support ARM: multi_v7_defconfig: enable thermal driver for i.MX devices ARM: multi_v7_defconfig: add RN5T618 PMIC family support ARM: multi_v7_defconfig: add NXP graphics drivers ARM: multi_v7_defconfig: add GPMI NAND controller support ARM: multi_v7_defconfig: add OCOTP driver for NXP SoCs ARM: multi_v7_defconfig: configure I2C driver built-in arm64: defconfig: add CONFIG_UNIPHIER_THERMAL and CONFIG_SNI_AVE ARM: imx: fix imx6sll-only build ARM: imx: select ARM_CPU_SUSPEND for CPU_IDLE as well ARM: mxs_defconfig: Re-sync defconfig ARM: imx_v4_v5_defconfig: Use the generic fsl-asoc-card driver ARM: imx_v4_v5_defconfig: Re-sync defconfig arm64: defconfig: enable stmmac ethernet to defconfig ARM: EXYNOS: Simplify code in coupled CPU idle hot path ...
2018-04-05Merge tag 'devicetree-for-4.17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Pull DeviceTree updates from Rob Herring: - Sync dtc to upstream version v1.4.6-9-gaadd0b65c987. This adds a bunch more warnings (hidden behind W=1). - Build dtc lexer and parser files instead of using shipped versions. - Rework overlay apply API to take an FDT as input and apply overlays in a single step. - Add a phandle lookup cache. This improves boot time by hundreds of msec on systems with large DT. - Add trivial mcp4017/18/19 potentiometers bindings. - Remove VLA stack usage in DT code. * tag 'devicetree-for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (26 commits) of: unittest: fix an error code in of_unittest_apply_overlay() of: unittest: move misplaced function declaration of: unittest: Remove VLA stack usage of: overlay: Fix forgotten reference to of_overlay_apply() of: Documentation: Fix forgotten reference to of_overlay_apply() of: unittest: local return value variable related cleanups of: unittest: remove unneeded local return value variables dt-bindings: trivial: add various mcp4017/18/19 potentiometers of: unittest: fix an error test in of_unittest_overlay_8() of: cache phandle nodes to reduce cost of of_find_node_by_phandle() dt-bindings: rockchip-dw-mshc: use consistent clock names MAINTAINERS: Add linux/of_*.h headers to appropriate subsystems scripts: turn off some new dtc warnings by default scripts/dtc: Update to upstream version v1.4.6-9-gaadd0b65c987 scripts/dtc: generate lexer and parser during build instead of shipping powerpc: boot: add strrchr function of: overlay: do not include path in full_name of added nodes of: unittest: clean up changeset test arm64/efi: Make strrchr() available to the EFI namespace ARM: boot: add strrchr function ...
2018-04-05Merge branch 'siginfo-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull siginfo updates from Eric Biederman: "The work on cleaning up and getting the bugs out of siginfo generation was largely stalled this round. The progress that was made was the definition of FPE_FLTUNK. Which is usable to fix many of the cases where siginfo generation is erroneously generating SI_USER by setting si_code to 0, that has recently been tagged as FPE_FIXME. You already have the change by way of the arm64 tree as that definition was pulled into the arm64 tree to allow fixing the problem there. What remains is the second round of fixing for what I thought was a trivial change to the struct siginfo when put the union in _sigfault where it belongs. Do to historical reasons 32bit m68k only ensures that pointers are 2 byte aligned. So I have added a m68k test case made of BUILD_BUG_ONs to verify I have this fix correct and possibly catch problems, and I have computed the number of bytes of padding needed for the _addr_bnd and _addr_pkey cases and just use an array of characters that size. For pure paranoia I have written the code so if there is an architecture out there that does not perform any alignment of structures it should still work. With the removal of all of the stale arechitectures this cycle future work on cleaning up struct siginfo should be much easier. Almost all of the conflicting si_code definitions have been removed with the removal of (blackfin, tile, and frv). Plus some of the most difficult to test cases have simply been removed from the tree. Which means that with a little luck copy_siginfo_to_user can become a light weight wrapper around copy_to_user in the next cycle" * 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: m68k: Verify the offsets in struct siginfo never change. signal: Correct the offset of si_pkey and si_lower in struct siginfo on m68k
2018-04-05Merge branch 'for-4.17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk Pull printk updates from Petr Mladek: - Add info about loaded kdump kernel into the dump stack header - Move dump-stack related code from printk.c to lib/dump_stack.c - Write message about suspending consoles in KERN_INFO log level * 'for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk: printk: change message to pr_info printk: move dump stack related code to lib/dump_stack.c print kdump kernel loaded status in stack dump
2018-04-05Merge branch 'for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull misc filesystem updates from Jan Kara: "udf, ext2, quota, fsnotify fixes & cleanups: - udf fixes for handling of media without uid/gid - udf fixes for some corner cases in parsing of volume recognition sequence - improvements of fsnotify handling of ENOMEM - new ioctl to allow setting of watch descriptor id for inotify (for checkpoint - restart) - small ext2, reiserfs, quota cleanups" * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: quota: Kill an unused extern entry form quota.h reiserfs: Remove VLA from fs/reiserfs/reiserfs.h udf: fix potential refcnt problem of nls module ext2: change return code to -ENOMEM when failing memory allocation udf: Do not mark possibly inconsistent filesystems as closed fsnotify: Let userspace know about lost events due to ENOMEM fanotify: Avoid lost events due to ENOMEM for unlimited queues udf: Remove never implemented mount options udf: Update mount option documentation udf: Provide saner default for invalid uid / gid udf: Clean up handling of invalid uid/gid udf: Apply uid/gid mount options also to new inodes & chown udf: Ignore [ug]id=ignore mount options udf: Fix handling of Partition Descriptors udf: Unify common handling of descriptors udf: Convert descriptor index definitions to enum udf: Allow volume descriptor sequence to be terminated by unrecorded block udf: Simplify handling of Volume Descriptor Pointers udf: Fix off-by-one in volume descriptor sequence length inotify: Extend ioctl to allow to request id of new watch descriptor
2018-04-05Merge tag 'nfsd-4.17' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull nfsd updates from Bruce Fields: "Chuck Lever did a bunch of work on nfsd tracepoints, on RDMA, and on server xdr decoding (with an eye towards eliminating a data copy in the RDMA case). I did some refactoring of the delegation code in preparation for eliminating some delegation self-conflicts and implementing write delegations" * tag 'nfsd-4.17' of git://linux-nfs.org/~bfields/linux: (40 commits) nfsd: fix incorrect umasks sunrpc: remove incorrect HMAC request initialization NFSD: Clean up legacy NFS SYMLINK argument XDR decoders NFSD: Clean up legacy NFS WRITE argument XDR decoders nfsd: Trace NFSv4 COMPOUND execution nfsd: Add I/O trace points in the NFSv4 read proc nfsd: Add I/O trace points in the NFSv4 write path nfsd: Add "nfsd_" to trace point names nfsd: Record request byte count, not count of vectors nfsd: Fix NFSD trace points svc: Report xprt dequeue latency sunrpc: Report per-RPC execution stats sunrpc: Re-purpose trace_svc_process sunrpc: Save remote presentation address in svc_xprt for trace events sunrpc: Simplify trace_svc_recv sunrpc: Simplify do_enqueue tracing sunrpc: Move trace_svc_xprt_dequeue() sunrpc: Update show_svc_xprt_flags() to include recently added flags svc: Simplify ->xpo_secure_port sunrpc: Remove unneeded pointer dereference ...
2018-04-05Merge tag 'f2fs-for-4.17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs update from Jaegeuk Kim: "In this round, we've mainly focused on performance tuning and critical bug fixes occurred in low-end devices. Sheng Yong introduced lost_found feature to keep missing files during recovery instead of thrashing them. We're preparing coming fsverity implementation. And, we've got more features to communicate with users for better performance. In low-end devices, some memory-related issues were fixed, and subtle race condtions and corner cases were addressed as well. Enhancements: - large nat bitmaps for more free node ids - add three block allocation policies to pass down write hints given by user - expose extension list to user and introduce hot file extension - tune small devices seamlessly for low-end devices - set readdir_ra by default - give more resources under gc_urgent mode regarding to discard and cleaning - introduce fsync_mode to enforce posix or not - nowait aio support - add lost_found feature to keep dangling inodes - reserve bits for future fsverity feature - add test_dummy_encryption for FBE Bug fixes: - don't use highmem for dentry pages - align memory boundary for bitops - truncate preallocated blocks in write errors - guarantee i_times on fsync call - clear CP_TRIMMED_FLAG correctly - prevent node chain loop during recovery - avoid data race between atomic write and background cleaning - avoid unnecessary selinux violation warnings on resgid option - GFP_NOFS to avoid deadlock in quota and read paths - fix f2fs_skip_inode_update to allow i_size recovery In addition to the above, there are several minor bug fixes and clean-ups" * tag 'f2fs-for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (50 commits) f2fs: remain written times to update inode during fsync f2fs: make assignment of t->dentry_bitmap more readable f2fs: truncate preallocated blocks in error case f2fs: fix a wrong condition in f2fs_skip_inode_update f2fs: reserve bits for fs-verity f2fs: Add a segment type check in inplace write f2fs: no need to initialize zero value for GFP_F2FS_ZERO f2fs: don't track new nat entry in nat set f2fs: clean up with F2FS_BLK_ALIGN f2fs: check blkaddr more accuratly before issue a bio f2fs: Set GF_NOFS in read_cache_page_gfp while doing f2fs_quota_read f2fs: introduce a new mount option test_dummy_encryption f2fs: introduce F2FS_FEATURE_LOST_FOUND feature f2fs: release locks before return in f2fs_ioc_gc_range() f2fs: align memory boundary for bitops f2fs: remove unneeded set_cold_node() f2fs: add nowait aio support f2fs: wrap all options with f2fs_sb_info.mount_opt f2fs: Don't overwrite all types of node to keep node chain f2fs: introduce mount option for fsync mode ...
2018-04-05Merge tag 'scsi-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI updates from James Bottomley: "This is mostly updates of the usual drivers: arcmsr, qla2xx, lpfc, ufs, mpt3sas, hisi_sas. In addition we have removed several really old drivers: sym53c416, NCR53c406a, fdomain, fdomain_cs and removed the old scsi_module.c initialization from all remaining drivers. Plus an assortment of bug fixes, initialization errors and other minor fixes" * tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (168 commits) scsi: ufs: Add support for Auto-Hibernate Idle Timer scsi: ufs: sysfs: reworking of the rpm_lvl and spm_lvl entries scsi: qla2xxx: fx00 copypaste typo scsi: qla2xxx: fix error message on <qla2400 scsi: smartpqi: update driver version scsi: smartpqi: workaround fw bug for oq deletion scsi: arcmsr: Change driver version to v1.40.00.05-20180309 scsi: arcmsr: Sleep to avoid CPU stuck too long for waiting adapter ready scsi: arcmsr: Handle adapter removed due to thunderbolt cable disconnection. scsi: arcmsr: Rename ACB_F_BUS_HANG_ON to ACB_F_ADAPTER_REMOVED for adapter hot-plug scsi: qla2xxx: Update driver version to 10.00.00.06-k scsi: qla2xxx: Fix Async GPN_FT for FCP and FC-NVMe scan scsi: qla2xxx: Cleanup code to improve FC-NVMe error handling scsi: qla2xxx: Fix FC-NVMe IO abort during driver reset scsi: qla2xxx: Fix retry for PRLI RJT with reason of BUSY scsi: qla2xxx: Remove nvme_done_list scsi: qla2xxx: Return busy if rport going away scsi: qla2xxx: Fix n2n_ae flag to prevent dev_loss on PDB change scsi: qla2xxx: Add FC-NVMe abort processing scsi: qla2xxx: Add changes for devloss timeout in driver ...
2018-04-05Merge tag 'for-4.17/block-20180402' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block layer updates from Jens Axboe: "It's a pretty quiet round this time, which is nice. This contains: - series from Bart, cleaning up the way we set/test/clear atomic queue flags. - series from Bart, fixing races between gendisk and queue registration and removal. - set of bcache fixes and improvements from various folks, by way of Michael Lyle. - set of lightnvm updates from Matias, most of it being the 1.2 to 2.0 transition. - removal of unused DIO flags from Nikolay. - blk-mq/sbitmap memory ordering fixes from Omar. - divide-by-zero fix for BFQ from Paolo. - minor documentation patches from Randy. - timeout fix from Tejun. - Alpha "can't write a char atomically" fix from Mikulas. - set of NVMe fixes by way of Keith. - bsg and bsg-lib improvements from Christoph. - a few sed-opal fixes from Jonas. - cdrom check-disk-change deadlock fix from Maurizio. - various little fixes, comment fixes, etc from various folks" * tag 'for-4.17/block-20180402' of git://git.kernel.dk/linux-block: (139 commits) blk-mq: Directly schedule q->timeout_work when aborting a request blktrace: fix comment in blktrace_api.h lightnvm: remove function name in strings lightnvm: pblk: remove some unnecessary NULL checks lightnvm: pblk: don't recover unwritten lines lightnvm: pblk: implement 2.0 support lightnvm: pblk: implement get log report chunk lightnvm: pblk: rename ppaf* to addrf* lightnvm: pblk: check for supported version lightnvm: implement get log report chunk helpers lightnvm: make address conversions depend on generic device lightnvm: add support for 2.0 address format lightnvm: normalize geometry nomenclature lightnvm: complete geo structure with maxoc* lightnvm: add shorten OCSSD version in geo lightnvm: add minor version to generic geometry lightnvm: simplify geometry structure lightnvm: pblk: refactor init/exit sequences lightnvm: Avoid validation of default op value lightnvm: centralize permission check for lightnvm ioctl ...
2018-04-05Merge tag 'edac_for_4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bpLinus Torvalds
Pull EDAC updates from Borislav Petkov: "Noteworthy is the NVDIMM support: - NVDIMM support to EDAC (Tony Luck) - misc fixes" * tag 'edac_for_4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: EDAC, sb_edac: Remove variable length array usage EDAC, skx_edac: Detect non-volatile DIMMs firmware, DMI: Add function to look up a handle and return DIMM size acpi, nfit: Add function to look up nvdimm device and provide SMBIOS handle EDAC: Add new memory type for non-volatile DIMMs EDAC: Drop duplicated array of strings for memory type names EDAC, layerscape: Allow building for LS1021A
2018-04-05kernel.h: Retain constant expression output for max()/min()Kees Cook
In the effort to remove all VLAs from the kernel[1], it is desirable to build with -Wvla. However, this warning is overly pessimistic, in that it is only happy with stack array sizes that are declared as constant expressions, and not constant values. One case of this is the evaluation of the max() macro which, due to its construction, ends up converting constant expression arguments into a constant value result. All attempts to rewrite this macro with __builtin_constant_p() failed with older compilers (e.g. gcc 4.4)[2]. However, Martin Uecker, constructed[3] a mind-shattering solution that works everywhere. Cthulhu fhtagn! This patch updates the min()/max() macros to evaluate to a constant expression when called on constant expression arguments. This removes several false-positive stack VLA warnings from an x86 allmodconfig build when -Wvla is added: $ diff -u before.txt after.txt | grep ^- -drivers/input/touchscreen/cyttsp4_core.c:871:2: warning: ISO C90 forbids variable length array ‘ids’ [-Wvla] -fs/btrfs/tree-checker.c:344:4: warning: ISO C90 forbids variable length array ‘namebuf’ [-Wvla] -lib/vsprintf.c:747:2: warning: ISO C90 forbids variable length array ‘sym’ [-Wvla] -net/ipv4/proc.c:403:2: warning: ISO C90 forbids variable length array ‘buff’ [-Wvla] -net/ipv6/proc.c:198:2: warning: ISO C90 forbids variable length array ‘buff’ [-Wvla] -net/ipv6/proc.c:218:2: warning: ISO C90 forbids variable length array ‘buff64’ [-Wvla] This also updates two cases where different enums were being compared and explicitly casts them to int (which matches the old side-effect of the single-evaluation code): one in tpm/tpm_tis_core.h, and one in drm/drm_color_mgmt.c. [1] https://lkml.org/lkml/2018/3/7/621 [2] https://lkml.org/lkml/2018/3/10/170 [3] https://lkml.org/lkml/2018/3/20/845 Co-Developed-by: Linus Torvalds <torvalds@linux-foundation.org> Co-Developed-by: Martin Uecker <Martin.Uecker@med.uni-goettingen.de> Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input updates from Dmitry Torokhov: - new driver for PhoenixRC Flight Controller Adapter - new driver for RAVE SP Power button - fixes for autosuspend-related deadlocks in a few unput USB dirvers - support for 2nd wheel in ATech PS/2 mouse - fix for ALPS trackpoint detection on Thinkpad L570 and Latitude 7370 - bunch of cleanups in various in PS/2 protocols - other assorted changes and fixes * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (35 commits) Input: i8042 - enable MUX on Sony VAIO VGN-CS series to fix touchpad Input: stmfts, s6sy761 - update my e-mail Input: stmfts - use async probe & suspend/resume to avoid 2s delay Input: ALPS - fix TrackStick detection on Thinkpad L570 and Latitude 7370 Input: xpad - add PDP device id 0x02a4 Input: alps - report pressure of v3 and v7 trackstick Input: pxrc - new driver for PhoenixRC Flight Controller Adapter Input: usbtouchscreen - do not rely on input_dev->users Input: usbtouchscreen - fix deadlock in autosuspend Input: pegasus_notetaker - do not rely on input_dev->users Input: pagasus_notetaker - fix deadlock in autosuspend Input: synaptics_usb - do not rely on input_dev->users Input: synaptics_usb - fix deadlock in autosuspend Input: gpio-keys - add support for wakeup event action Input: appletouch - use true and false for boolean values Input: silead - add Chuwi Hi8 support Input: analog - use get_cycles() on PPC Input: stmpe-keypad - remove VLA usage Input: i8042 - add Lenovo ThinkPad L460 to i8042 reset list Input: add RAVE SP Powerbutton driver ...
2018-04-05net/mlx5: Mkey creation command adjustmentsAriel Levkovich
This change updates the mlx5 interface to create mkey on the device. The updates in the command mailbox include increasing the access mode type field to 5 bits in order to support additional types such as MLX5_MKC_ACCESS_MODE_MEMIC which represents device memory access type and will be used when registering MR on allocated device memory. All the places that use the old access mode format are adjusted as well. Signed-off-by: Ariel Levkovich <lariel@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>