2020-04-07mm: use zone and order instead of free area in free_list manipulatorsAlexander Duyck
In order to enable the use of the zone from the list manipulator functions I will need access to the zone pointer. As it turns out most of the accessors were always just being directly passed &zone->free_area[order] anyway so it would make sense to just fold that into the function itself and pass the zone and order as arguments instead of the free area. In order to be able to reference the zone we need to move the declaration of the functions down so that we have the zone defined before we define the list manipulation functions. Since the functions are only used in the file mm/page_alloc.c we can just move them there to reduce noise in the header. Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Pankaj Gupta <pagupta@redhat.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Nitesh Narayan Lal <nitesh@redhat.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Rik van Riel <riel@surriel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Wang <wei.w.wang@intel.com> Cc: Yang Zhang <yang.zhang.wz@gmail.com> Cc: wei qi <weiqi4@huawei.com> Link: http://lkml.kernel.org/r/20200211224613.29318.43080.stgit@localhost.localdomain Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
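The interface change described above can be sketched in plain C. This is a minimal userspace mock, not the kernel source: the struct layouts, the `MAX_ORDER` and `MIGRATE_TYPES` values, and the `init_list_head()` and `zone_init()` helpers are simplified stand-ins assumed for illustration. The point is only that the manipulator receives the zone and order and looks up `&zone->free_area[order]` itself.

```c
#include <stddef.h>

/* Simplified stand-ins for kernel types; sizes are illustrative only. */
#define MAX_ORDER 11
#define MIGRATE_TYPES 4

struct list_head { struct list_head *next, *prev; };

struct page { struct list_head lru; };

struct free_area {
	struct list_head free_list[MIGRATE_TYPES];
	unsigned long nr_free;
};

struct zone { struct free_area free_area[MAX_ORDER]; };

/* Hypothetical helper: make a list head point at itself (empty list). */
static void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert entry right after head, as the kernel's list_add() does. */
static void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Hypothetical helper: start with every free list empty. */
static void zone_init(struct zone *zone)
{
	for (int order = 0; order < MAX_ORDER; order++) {
		for (int mt = 0; mt < MIGRATE_TYPES; mt++)
			init_list_head(&zone->free_area[order].free_list[mt]);
		zone->free_area[order].nr_free = 0;
	}
}

/* The manipulator takes the zone and order and derives the free area. */
static void add_to_free_list(struct page *page, struct zone *zone,
			     unsigned int order, int migratetype)
{
	struct free_area *area = &zone->free_area[order];

	list_add(&page->lru, &area->free_list[migratetype]);
	area->nr_free++;
}
```

Because the zone pointer is now available inside the helper, later patches can use it without touching every caller again.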
2020-04-07mm: adjust shuffle code to allow for future coalescingAlexander Duyck
Patch series "mm / virtio: Provide support for free page reporting", v17. This series provides an asynchronous means of reporting free guest pages to a hypervisor so that the memory associated with those pages can be dropped and reused by other processes and/or guests on the host. Using this it is possible to avoid unnecessary I/O to disk and greatly improve performance in the case of memory overcommit on the host. When enabled we will be performing a scan of free memory every 2 seconds while pages of sufficiently high order are being freed. In each pass at least one sixteenth of each free list will be reported. By doing this we avoid racing against other threads that may be causing a high amount of memory churn. The lowest page order currently scanned when reporting pages is pageblock_order so that this feature will not interfere with the use of Transparent Huge Pages in the case of virtualization. Currently this is only in use by virtio-balloon however there is the hope that at some point in the future other hypervisors might be able to make use of it. In the virtio-balloon/QEMU implementation the hypervisor is currently using MADV_DONTNEED to indicate to the host kernel that the page is currently free. It will be zeroed and faulted back into the guest the next time the page is accessed. To track if a page is reported or not the Uptodate flag was repurposed and used as a Reported flag for Buddy pages. We walk through the free list isolating pages and adding them to the scatterlist until we either encounter the end of the list or have processed at least one sixteenth of the pages that were listed in nr_free prior to us starting. If we fill the scatterlist before we reach the end of the list we rotate the list so that the first unreported page we encounter is moved to the head of the list as that is where we will resume after we have freed the reported pages back into the tail of the list. Below are the results from various benchmarks. 
I primarily focused on two tests. The first is the will-it-scale/page_fault2 test, and the other is a modified version of will-it-scale/page_fault1 that was enabled to use THP. I did this as it allows for better visibility into different parts of the memory subsystem. The guest is running with 32G for RAM on one node of a E5-2630 v3. The host has had some features such as CPU turbo disabled in the BIOS.

Test                   page_fault1 (THP)    page_fault2
Name            tasks  Process Iter  STDEV  Process Iter  STDEV
Baseline            1    1012402.50  0.14%     361855.25  0.81%
                   16    8827457.25  0.09%    3282347.00  0.34%

Patches Applied     1    1007897.00  0.23%     361887.00  0.26%
                   16    8784741.75  0.39%    3240669.25  0.48%

Patches Enabled     1    1010227.50  0.39%     359749.25  0.56%
                   16    8756219.00  0.24%    3226608.75  0.97%

Patches Enabled     1    1050982.00  4.26%     357966.25  0.14%
 page shuffle      16    8672601.25  0.49%    3223177.75  0.40%

Patches enabled     1    1003238.00  0.22%     360211.00  0.22%
 shuffle w/ RFC    16    8767010.50  0.32%    3199874.00  0.71%

The results above are for a baseline with a linux-next-20191219 kernel, that kernel with this patch set applied but page reporting disabled in virtio-balloon, the patches applied and page reporting fully enabled, the patches enabled with page shuffling enabled, and the patches applied with page shuffling enabled and an RFC patch that makes use of MADV_FREE in QEMU. These results include the deviation seen between the average value reported here versus the high and/or low value. I observed that during the test memory usage for the first three tests never dropped whereas with the patches fully enabled the VM would drop to using only a few GB of the host's memory when switching from memhog to page fault tests. Any of the overhead visible with this patch set enabled seems due to page faults caused by accessing the reported pages and the host zeroing the page before giving it back to the guest. This overhead is much more visible when using THP than with standard 4K pages. 
In addition page shuffling seemed to increase the amount of faults generated due to an increase in memory churn. The overhead is reduced when using MADV_FREE as we can avoid the extra zeroing of the pages when they are reintroduced to the host, as can be seen when the RFC is applied with shuffling enabled. The overall guest size is kept fairly small to only a few GB while the test is running. If the host memory were oversubscribed this patch set should result in a performance improvement as swapping memory in the host can be avoided. A brief history on the background of free page reporting can be found at: https://lore.kernel.org/lkml/29f43d5796feed0dec8e8bb98b187d9dac03b900.camel@linux.intel.com/ This patch (of 9): Move the head/tail adding logic out of the shuffle code and into the __free_one_page function since ultimately that is where it is really needed anyway. By doing this we should be able to reduce the overhead and can consolidate all of the list addition bits in one spot. Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: David Hildenbrand <david@redhat.com> Cc: Yang Zhang <yang.zhang.wz@gmail.com> Cc: Pankaj Gupta <pagupta@redhat.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Nitesh Narayan Lal <nitesh@redhat.com> Cc: Rik van Riel <riel@surriel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Wei Wang <wei.w.wang@intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Oscar Salvador <osalvador@suse.de> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: wei qi <weiqi4@huawei.com> Link: http://lkml.kernel.org/r/20200211224602.29318.84523.stgit@localhost.localdomain Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
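The series description says the hypervisor uses MADV_DONTNEED so that a reported page is zeroed and faulted back in on next access. The same behaviour can be observed from ordinary userspace on a private anonymous mapping. The sketch below assumes Linux with glibc; the function name `dontneed_zeroes_page()` is hypothetical and this is not the QEMU or virtio-balloon code.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1 /* for MAP_ANONYMOUS and madvise() on glibc */
#endif
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Returns 1 if a private anonymous page reads back as zeroes after
 * MADV_DONTNEED, 0 if it does not, and -1 if a system call failed.
 */
static int dontneed_zeroes_page(void)
{
	long page_size = sysconf(_SC_PAGESIZE);
	unsigned char *buf;
	int zeroed = 1;

	if (page_size <= 0)
		return -1;

	buf = mmap(NULL, (size_t)page_size, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	/* Dirty the page so it is backed by real memory. */
	memset(buf, 0xaa, (size_t)page_size);

	/* Tell the kernel the contents are no longer needed. */
	if (madvise(buf, (size_t)page_size, MADV_DONTNEED) != 0) {
		munmap(buf, (size_t)page_size);
		return -1;
	}

	/* The next access faults in a fresh zero-filled page. */
	for (long i = 0; i < page_size; i++) {
		if (buf[i] != 0) {
			zeroed = 0;
			break;
		}
	}

	munmap(buf, (size_t)page_size);
	return zeroed;
}
```

The zeroing on the fault path is the cost the benchmark discussion attributes to the "Patches Enabled" rows; MADV_FREE lets the kernel defer or skip that work.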
2020-04-07mm: code cleanup for MADV_FREEHuang Ying
Some comments for MADV_FREE are revised and added to help people understand the MADV_FREE code, especially the page flag, PG_swapbacked. This makes page_is_file_cache() inconsistent with its comments. So the function is renamed to page_is_file_lru() to make them consistent again. All these are put in one patch as one logical change. Suggested-by: David Hildenbrand <david@redhat.com> Suggested-by: Johannes Weiner <hannes@cmpxchg.org> Suggested-by: David Rientjes <rientjes@google.com> Signed-off-by: "Huang, Ying" <ying.huang@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Michal Hocko <mhocko@kernel.org> Acked-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Minchan Kim <minchan@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Rik van Riel <riel@surriel.com> Link: http://lkml.kernel.org/r/20200317100342.2730705-1-ying.huang@intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07mm/ksm.c: update get_user_pages() argument in commentLi Chen
This updates get_user_pages()'s argument in ksm_test_exit()'s comment Signed-off-by: Li Chen <chenli@uniontech.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/30ac2417-f1c7-f337-0beb-df561295298c@uniontech.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07mm: remove CONFIG_TRANSPARENT_HUGE_PAGECACHEMatthew Wilcox (Oracle)
Commit e496cf3d7821 ("thp: introduce CONFIG_TRANSPARENT_HUGE_PAGECACHE") notes that it should be reverted when the PowerPC problem was fixed. The commit fixing the PowerPC problem (953c66c2b22a) did not revert the commit; instead setting CONFIG_TRANSPARENT_HUGE_PAGECACHE to the same as CONFIG_TRANSPARENT_HUGEPAGE. Checking with Kirill and Aneesh, this was an oversight, so remove the Kconfig symbol and undo the work of commit e496cf3d7821. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Link: http://lkml.kernel.org/r/20200318140253.6141-6-willy@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07include/linux/pagemap.h: optimise find_subpage for !THPMatthew Wilcox (Oracle)
If THP is disabled, find_subpage() can become a no-op by using hpage_nr_pages() instead of compound_nr(). hpage_nr_pages() embeds a check for PageTail, so we can drop the check here. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Link: http://lkml.kernel.org/r/20200318140253.6141-5-willy@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07mm, thp: track fallbacks due to failed memcg charges separatelyDavid Rientjes
The thp_fault_fallback and thp_file_fallback vmstats are incremented if either the hugepage allocation fails through the page allocator or the hugepage charge fails through mem cgroup. This patch leaves these fields untouched but adds two new fields, thp_{fault,file}_fallback_charge, which are incremented only when the mem cgroup charge fails. This distinguishes between attempted hugepage allocations that fail due to fragmentation (or low memory conditions) and those that fail due to mem cgroup limits. That can be used to determine the impact of fragmentation on the system by excluding faults that failed due to memcg usage. Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Jeremy Cline <jcline@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Link: http://lkml.kernel.org/r/alpine.DEB.2.21.2003061422070.7412@chino.kir.corp.google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
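Since the total counter still includes charge failures, the fragmentation-related share is the difference between the two counters. A minimal sketch of that arithmetic follows, assuming the caller has already read /proc/vmstat into a string; the helper names `vmstat_value()` and `thp_fault_fallback_alloc()` are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Find "name value" in /proc/vmstat style text; returns -1 if absent. */
static long long vmstat_value(const char *text, const char *name)
{
	size_t len = strlen(name);
	const char *line = text;

	while (line && *line) {
		/* Require a space after the name so prefixes do not match. */
		if (strncmp(line, name, len) == 0 && line[len] == ' ') {
			long long value;

			if (sscanf(line + len, "%lld", &value) == 1)
				return value;
			return -1;
		}
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}

/* Fault fallbacks that were not caused by a failed memcg charge. */
static long long thp_fault_fallback_alloc(const char *text)
{
	long long total = vmstat_value(text, "thp_fault_fallback");
	long long charge = vmstat_value(text, "thp_fault_fallback_charge");

	if (total < 0 || charge < 0)
		return -1;
	return total - charge;
}
```

The same subtraction applies to thp_file_fallback and thp_file_fallback_charge.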
2020-04-07mm, shmem: add vmstat for hugepage fallbackDavid Rientjes
The existing thp_fault_fallback indicates when thp attempts to allocate a hugepage but fails, or if the hugepage cannot be charged to the mem cgroup hierarchy. Extend this to shmem as well. Adds a new thp_file_fallback to complement thp_file_alloc that gets incremented when a hugepage is attempted to be allocated but fails, or if it cannot be charged to the mem cgroup hierarchy. Additionally, remove the check for CONFIG_TRANSPARENT_HUGE_PAGECACHE from shmem_alloc_hugepage() since it is only called with this configuration option. Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Jeremy Cline <jcline@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Link: http://lkml.kernel.org/r/alpine.DEB.2.21.2003061421240.7412@chino.kir.corp.google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07mm/migrate.c: migrate PG_readahead flagYang Shi
Currently the migration code doesn't migrate PG_readahead flag. Theoretically this would incur slight performance loss as the application might have to ramp its readahead back up again. Even though such problem happens, it might be hidden by something else since migration is typically triggered by compaction and NUMA balancing, any of which should be more noticeable. Migrate the flag after end_page_writeback() since it may clear PG_reclaim flag, which is the same bit as PG_readahead, for the new page. [akpm@linux-foundation.org: tweak comment] Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mel Gorman <mgorman@techsingularity.net> Link: http://lkml.kernel.org/r/1581640185-95731-1-git-send-email-yang.shi@linux.alibaba.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07mm/migrate.c: unify "not queued for migration" handling in do_pages_move()Wei Yang
It can currently happen that we store the status of a page twice: * Once we detect that it is already on the target node * Once we moved a bunch of pages, and a page that's already on the target node is contained in the current interval. Let's simplify the code and always call do_move_pages_to_node() in case we did not queue a page for migration. Note that pages that are already on the target node are not added to the pagelist and are, therefore, ignored by do_move_pages_to_node() - there is no functional change. The status of such a page is now only stored once. [david@redhat.com rephrase changelog] Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Link: http://lkml.kernel.org/r/20200214003017.25558-5-richardw.yang@linux.intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07mm/migrate.c: check pagelist in move_pages_and_store_status()Wei Yang
When pagelist is empty, it is not necessary to do the move and store. This also consolidates the empty list check in one place. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Link: http://lkml.kernel.org/r/20200214003017.25558-4-richardw.yang@linux.intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07mm/migrate.c: wrap do_move_pages_to_node() and store_status()Wei Yang
Usually, do_move_pages_to_node() and store_status() are used in combination. We have three similar call sites. Let's provide a wrapper for both function calls - move_pages_and_store_status - to make the calling code easier to maintain and fix (as noted by Yang Shi, the return value handling of do_move_pages_to_node() has a flaw). [david@redhat.com rephrase changelog] Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Link: http://lkml.kernel.org/r/20200214003017.25558-3-richardw.yang@linux.intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07mm/migrate.c: no need to check for i > start in do_pages_move()Wei Yang
Patch series "cleanup on do_pages_move()", v5. The logic in do_pages_move() is somewhat messy to read and has a potential error in handling the return value. In particular, there are three calls to do_move_pages_to_node() and store_status() with almost the same form. This patch set tries to make the code a little friendlier to read by consolidating the calls. This patch (of 4): At this point, we always have i >= start. If i == start, store_status() will return 0. So we can drop the check for i > start. [david@redhat.com rephrase changelog] Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Link: http://lkml.kernel.org/r/20200214003017.25558-2-richardw.yang@linux.intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07mm: make it clear that gfp reclaim modifiers are valid only for sleepable allocationsMichal Hocko
While it might be really clear to MM developers that gfp reclaim modifiers are applicable only to sleepable allocations (those with __GFP_DIRECT_RECLAIM) it seems that actual users of the API are not always sure. Make it explicit that they are not applicable for GFP_NOWAIT or GFP_ATOMIC allocations which are the most commonly used non-sleepable allocation masks. Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Acked-by: Paul E. McKenney <paulmck@kernel.org> Acked-by: David Rientjes <rientjes@google.com> Cc: Neil Brown <neilb@suse.de> Link: http://lkml.kernel.org/r/20200403083543.11552-3-mhocko@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
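The rule stated in this commit can be expressed as a single bit test. The sketch below is a userspace illustration only: the bit values and the reduced flag compositions are assumptions chosen for the example (the real definitions live in include/linux/gfp.h), and `reclaim_modifier_has_effect()` is a hypothetical helper, not a kernel function.

```c
#include <stdbool.h>

typedef unsigned int gfp_t;

/* Illustrative bit values, assumed for this example. */
#define __GFP_HIGH           0x20u
#define __GFP_IO             0x40u
#define __GFP_FS             0x80u
#define __GFP_DIRECT_RECLAIM 0x400u
#define __GFP_KSWAPD_RECLAIM 0x800u
#define __GFP_NORETRY        0x10000u /* one of the reclaim modifiers */

/* Simplified compositions of the common allocation masks. */
#define GFP_KERNEL (__GFP_DIRECT_RECLAIM | __GFP_KSWAPD_RECLAIM | \
		    __GFP_IO | __GFP_FS)
#define GFP_NOWAIT (__GFP_KSWAPD_RECLAIM)
#define GFP_ATOMIC (__GFP_HIGH | __GFP_KSWAPD_RECLAIM)

/*
 * A reclaim modifier only matters when the allocation may sleep,
 * that is, when direct reclaim is allowed.
 */
static bool reclaim_modifier_has_effect(gfp_t gfp)
{
	return (gfp & __GFP_DIRECT_RECLAIM) != 0;
}
```

Under these assumptions, adding `__GFP_NORETRY` to `GFP_KERNEL` changes behaviour, while adding it to `GFP_NOWAIT` or `GFP_ATOMIC` does nothing.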
2020-04-07mm/vmalloc: fix a typo in commentQiujun Huang
There is a typo in comment, fix it. "exeeds" -> "exceeds" Signed-off-by: Qiujun Huang <hqjagain@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20200404060136.10838-1-hqjagain@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07mm/vma: append unlikely() while testing VMA access permissionsAnshuman Khandual
It is unlikely that an inaccessible VMA without required permission flags will get a page fault. Hence let's just append the unlikely() directive to such checks in order to improve performance while also standardizing it across various platforms. Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Guo Ren <guoren@kernel.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Burton <paulburton@kernel.org> Cc: Mike Rapoport <rppt@linux.ibm.com> Link: http://lkml.kernel.org/r/1582525304-32113-1-git-send-email-anshuman.khandual@arm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
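For readers unfamiliar with the annotation, here is a minimal sketch of how unlikely() wraps a permission check. It assumes GCC or Clang (for `__builtin_expect`); the flag values are illustrative and `write_fault_is_bad_area()` is a hypothetical function, not one of the architecture fault handlers touched by the patch.

```c
#include <stdbool.h>

/* Branch hint: tells the compiler the condition is expected to be false. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Illustrative permission bits, assumed for this example. */
#define VM_READ  0x1UL
#define VM_WRITE 0x2UL
#define VM_EXEC  0x4UL

/*
 * Hypothetical check: a write fault on a mapping without VM_WRITE
 * is rejected. The hint changes code layout, not the result.
 */
static bool write_fault_is_bad_area(unsigned long vm_flags)
{
	if (unlikely(!(vm_flags & VM_WRITE)))
		return true;
	return false;
}
```

The hint only affects how the compiler arranges the branch; the function returns the same value with or without it.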
2020-04-07mm/vma: replace all remaining open encodings with vma_is_anonymous()Anshuman Khandual
This replaces all remaining open encodings with vma_is_anonymous(). Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andy Lutomirski <luto@kernel.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Guo Ren <guoren@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nick Piggin <npiggin@gmail.com> Cc: Paul Burton <paulburton@kernel.org> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Rich Felker <dalias@libc.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: http://lkml.kernel.org/r/1582520593-30704-5-git-send-email-anshuman.khandual@arm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07mm/vma: replace all remaining open encodings with is_vm_hugetlb_page()Anshuman Khandual
This replaces all remaining open encodings with is_vm_hugetlb_page(). Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Will Deacon <will@kernel.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Nick Piggin <npiggin@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Guo Ren <guoren@kernel.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Paul Burton <paulburton@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Rich Felker <dalias@libc.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: http://lkml.kernel.org/r/1582520593-30704-4-git-send-email-anshuman.khandual@arm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07mm/vma: make vma_is_accessible() available for general useAnshuman Khandual
Let's move the vma_is_accessible() helper to include/linux/mm.h, which makes it available for general use. While here, this replaces all remaining open encodings for VMA access check with vma_is_accessible(). Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Guo Ren <guoren@kernel.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Guo Ren <guoren@kernel.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Burton <paulburton@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Nick Piggin <npiggin@gmail.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/1582520593-30704-3-git-send-email-anshuman.khandual@arm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
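To show what "open encoding" means here, the sketch below wraps the flag test in a named helper. It is a userspace mock under stated assumptions: the struct has only the one field needed, the flag values are illustrative, and the helper body reflects my understanding of the check (any of read, write, or exec permission set) rather than a verbatim copy of include/linux/mm.h.

```c
#include <stdbool.h>

/* Illustrative permission and sharing bits, assumed for this example. */
#define VM_READ   0x1UL
#define VM_WRITE  0x2UL
#define VM_EXEC   0x4UL
#define VM_SHARED 0x8UL

/* Reduced stand-in for the kernel structure. */
struct vm_area_struct {
	unsigned long vm_flags;
};

/* Named wrapper replacing the open-coded flag test at each call site. */
static bool vma_is_accessible(const struct vm_area_struct *vma)
{
	return vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE);
}
```

A call site that previously tested the three flags by hand would then read `if (!vma_is_accessible(vma))`, which states the intent directly.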
2020-04-07mm/vma: add missing VMA flag readable name for VM_SYNCAnshuman Khandual
Patch series "mm/vma: Use all available wrappers when possible", v2. Apart from adding a VMA flag readable name for trace purpose, this series does some open encoding replacements with available VMA specific wrappers. This skips VM_HUGETLB check in vma_migratable() as it's already being done with another patch (https://patchwork.kernel.org/patch/11347831/) which is yet to be merged. This patch (of 4): This just adds the missing readable name for VM_SYNC. Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andy Lutomirski <luto@kernel.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Guo Ren <guoren@kernel.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nick Piggin <npiggin@gmail.com> Cc: Paul Burton <paulburton@kernel.org> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Rich Felker <dalias@libc.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: http://lkml.kernel.org/r/1582520593-30704-2-git-send-email-anshuman.khandual@arm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07mm: set vm_next and vm_prev to NULL in vm_area_dup()Li Xinhai
Set ->vm_next and ->vm_prev to NULL to prevent potential misuse from the new duplicated vma. Currently, the only misuse is in the fork path, when handling anon_vma. No other bugs have been revealed with this patch applied. Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Rik van Riel <riel@redhat.com> Link: http://lkml.kernel.org/r/1581150928-3214-4-git-send-email-lixinhai.lxh@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
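The idea of the fix can be sketched in a few lines. This is a userspace mock, assuming a reduced `struct vm_area_struct` with only the fields needed and `malloc()` in place of the kernel's slab allocation; it is not the code in kernel/fork.c.

```c
#include <stdlib.h>

/* Reduced stand-in for the kernel structure. */
struct vm_area_struct {
	unsigned long vm_start;
	unsigned long vm_end;
	struct vm_area_struct *vm_next;
	struct vm_area_struct *vm_prev;
};

/*
 * Duplicate a vma by value, then clear the list links so the copy
 * cannot be used to reach the original's neighbours by accident.
 */
static struct vm_area_struct *vm_area_dup(const struct vm_area_struct *orig)
{
	struct vm_area_struct *copy = malloc(sizeof(*copy));

	if (copy) {
		*copy = *orig;
		copy->vm_next = NULL;
		copy->vm_prev = NULL;
	}
	return copy;
}
```

Without the two assignments, the struct copy would carry the parent's neighbour pointers into the child, which is the situation the rest of this series describes.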
2020-04-07Revert "mm/rmap.c: reuse mergeable anon_vma as parent when fork"Li Xinhai
This reverts commit 4e4a9eb921332b9d1 ("mm/rmap.c: reuse mergeable anon_vma as parent when fork"). In dup_mmap(), anon_vma_fork() is called for attaching anon_vma and parameter 'tmp' (i.e., the new vma of child) has the same ->vm_next and ->vm_prev as its parent vma. That causes the anon_vma used by the parent to be mistakenly shared by the child (in anon_vma_clone(), the code added by that commit will do this reuse work). Besides this issue, the design of reusing anon_vma from a vma which has gone through fork should be avoided ([1]). So, this patch reverts that commit and maintains the consistent logic of reusing anon_vma for fork/split/merge vma. Reusing anon_vma within the process is fine. But if a vma has gone through fork(), then that vma's anon_vma should not be shared with its neighbor vma. As explained in [1], when a vma has gone through fork(), the check for list_is_singular(vma->anon_vma_chain) will be false, and the anon_vma is not shared. An example clarifies the current issue. The parent process does the following two steps: 1. p_vma_1 is created and p_anon_vma_1 is prepared; 2. p_vma_2 is created and shares p_anon_vma_1; (this is allowed, because p_vma_1 didn't go through fork()); the parent process then does fork(): 3. c_vma_1 is dup from p_vma_1, and has its own c_anon_vma_1 prepared; at this point, c_vma_1->anon_vma_chain has two items, one for p_anon_vma_1 and one for c_anon_vma_1; 4. c_vma_2 is dup from p_vma_2, it is not allowed to share c_anon_vma_1, because c_vma_1->anon_vma_chain has two items. [1] commit d0e9fe1758f2 ("Simplify and comment on anon_vma re-use for anon_vma_prepare()") explains the test of "list_is_singular()". Fixes: 4e4a9eb92133 ("mm/rmap.c: reuse mergeable anon_vma as parent when fork") Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Rik van Riel <riel@redhat.com> Link: http://lkml.kernel.org/r/1581150928-3214-3-git-send-email-lixinhai.lxh@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07mm: don't prepare anon_vma if vma has VM_WIPEONFORKLi Xinhai
Patch series "mm: Fix misuse of parent anon_vma in dup_mmap path". This patchset fixes the misuse of the parent anon_vma, which is mainly caused by the child vma's vm_next and vm_prev being left the same as its parent's after the vma is duplicated. As a result, code reached the parent vma's neighbor through the child vma's pointers and executed the wrong logic. The first two patches fix the relevant issues, and the third patch sets vm_next and vm_prev to NULL when duplicating a vma to prevent potential misuse in future. The effect of the first bug is that rmap code checks both the parent's and the child's page table, although a page couldn't be mapped by both parent and child, because the child vma has WIPEONFORK so all pages mapped by the child are 'new' and not relevant to the parent. The effect of the second bug is that the relationship between the anon_vma of parent and child is totally convoluted. It would cause 'son', 'grandson', ..., etc, to share the 'parent' anon_vma, which disobeys the design rule of reusing anon_vma (the rule to be followed is that reuse should happen among vmas of the same process, and the vma should not have gone through fork). So, both issues cause unnecessary rmap walking and unexpected complexity. These two issues would not be directly visible. I used debugging code to check the anon_vma pointers of parent and child when inspecting the suspicious implementation of issue #2, and then found the problem. This patch (of 3): In dup_mmap(), anon_vma_prepare() is called for a vma that has VM_WIPEONFORK, and parameter 'tmp' (i.e., the new vma of child) has the same ->vm_next and ->vm_prev as its parent vma. That allows the anon_vma used by the parent to be mistakenly shared by the child (find_mergeable_anon_vma() will do this reuse work). Besides this issue, calling anon_vma_prepare() should be avoided because we don't copy pages for this vma. Preparing anon_vma will be handled during fault. 
Fixes: d2cd9ede6e19 ("mm,fork: introduce MADV_WIPEONFORK") Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Rik van Riel <riel@redhat.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Link: http://lkml.kernel.org/r/1581150928-3214-2-git-send-email-lixinhai.lxh@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07mm, memcg: bypass high reclaim iteration for cgroup hierarchy rootChris Down
The root of the hierarchy cannot have high set, so we will never reclaim based on it. This makes that clearer and avoids another entry. Signed-off-by: Chris Down <chris@chrisdown.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Tejun Heo <tj@kernel.org> Cc: Roman Gushchin <guro@fb.com> Cc: Michal Hocko <mhocko@kernel.org> Link: http://lkml.kernel.org/r/20200312164137.GA1753625@chrisdown.name Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07nvmet-rdma: fix double free of rdma queueIsrael Rukshin
In case rdma accept fails at nvmet_rdma_queue_connect(), release work is scheduled. Later on, a new RDMA CM event may arrive since we didn't destroy the cm-id and call nvmet_rdma_queue_connect_fail(), which schedules another release work. This causes nvmet_rdma_free_queue to be called twice. To fix this we implicitly destroy the cm_id with a non-zero ret code, which guarantees that new rdma_cm events will not arrive afterwards. Also add a qp pointer to the nvmet_rdma_queue structure, so we can use it when the cm_id pointer is NULL or was destroyed. Signed-off-by: Israel Rukshin <israelr@mellanox.com> Suggested-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-04-07io_uring: initialize fixed_file_data lockXiaoguang Wang
syzbot reports below warning: INFO: trying to register non-static key. the code is fine but needs lockdep annotation. turning off the locking correctness validator. CPU: 1 PID: 7099 Comm: syz-executor897 Not tainted 5.6.0-next-20200406-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x188/0x20d lib/dump_stack.c:118 assign_lock_key kernel/locking/lockdep.c:913 [inline] register_lock_class+0x1664/0x1760 kernel/locking/lockdep.c:1225 __lock_acquire+0x104/0x4e00 kernel/locking/lockdep.c:4223 lock_acquire+0x1f2/0x8f0 kernel/locking/lockdep.c:4923 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline] _raw_spin_lock_irqsave+0x8c/0xbf kernel/locking/spinlock.c:159 io_sqe_files_register fs/io_uring.c:6599 [inline] __io_uring_register+0x1fe8/0x2f00 fs/io_uring.c:8001 __do_sys_io_uring_register fs/io_uring.c:8081 [inline] __se_sys_io_uring_register fs/io_uring.c:8063 [inline] __x64_sys_io_uring_register+0x192/0x560 fs/io_uring.c:8063 do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295 entry_SYSCALL_64_after_hwframe+0x49/0xb3 RIP: 0033:0x440289 Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 fb 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007ffff1bbf558 EFLAGS: 00000246 ORIG_RAX: 00000000000001ab RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 0000000000440289 RDX: 0000000020000280 RSI: 0000000000000002 RDI: 0000000000000003 RBP: 00000000006ca018 R08: 0000000000000000 R09: 00000000004002c8 R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000401b10 R13: 0000000000401ba0 R14: 0000000000000000 R15: 0000000000000000 Initialize struct fixed_file_data's lock to fix this issue. 
Reported-by: syzbot+e6eeca4a035da76b3065@syzkaller.appspotmail.com Fixes: 055895537302 ("io_uring: refactor file register/unregister/update handling") Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-04-07io_uring: remove redundant variable pointer nxt and io_wq_assign_next callColin Ian King
An earlier commit "io_uring: remove @nxt from handlers" removed the setting of pointer nxt and now it is always null, hence the non-null check and call to io_wq_assign_next is redundant and can be removed. Addresses-Coverity: ("'Constant' variable guard") Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-04-07ALSA: ice1724: Fix invalid access for enumerated ctl itemsTakashi Iwai
The access to Analog Capture Source control value implemented in prodigy_hifi.c is wrong, as caught by the recently introduced sanity check; it should be accessing value.enumerated.item[] instead of value.integer.value[]. This patch corrects the wrong access pattern. Fixes: 6b8d6e5518e2 ("[ALSA] ICE1724: Added support for Audiotrak Prodigy 7.1 HiFi & HD2, Hercules Fortissimo IV") BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=207139 Reviewed-by: Jaroslav Kysela <perex@perex.cz> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20200407084402.25589-3-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2020-04-07ALSA: hda: Fix potential access overflow in beep helperTakashi Iwai
The beep control helper function blindly stores the values in two stereo channels no matter whether the actual control is mono or stereo. This is practically harmless, but it annoys the recently introduced sanity check, resulting in an error when the checker is enabled. This patch corrects the behavior to store only on the defined array member. Fixes: 0401e8548eac ("ALSA: hda - Move beep helper functions to hda_beep.c") BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=207139 Reviewed-by: Jaroslav Kysela <perex@perex.cz> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20200407084402.25589-2-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
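The "store only on the defined array member" idea can be modelled in userspace as follows. This is a minimal sketch, not the driver code: `struct ctl_value`, `beep_switch_get()` and the `channels` parameter are hypothetical stand-ins for the ALSA control value and helper.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a control value with up to two channels. */
struct ctl_value {
    long value[2];
};

/*
 * Report the beep state only in the channels the control defines.
 * A mono control (channels == 1) must leave value[1] untouched.
 */
static void beep_switch_get(struct ctl_value *v, bool enabled, int channels)
{
    assert(channels == 1 || channels == 2);
    v->value[0] = enabled;
    if (channels == 2)
        v->value[1] = enabled;
}
```

With a mono control the second slot keeps whatever it held before the call, which is what a bounds-aware checker expects.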
2020-04-07ASoC: cs4270: pull reset GPIO low then highMike Willard
Pull the RST line low then high when initializing the driver, in order to force a reset of the chip. Previously, the line was not pulled low, which could result in the chip registers not resetting to their default values on boot. Signed-off-by: Mike Willard <mwillard@izotope.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20200401205454.79792-1-mwillard@izotope.com Signed-off-by: Mark Brown <broonie@kernel.org>
2020-04-07ipmi: kcs: Fix aspeed_kcs_probe_of_v1()Dan Carpenter
This needs to return the newly allocated struct but instead it returns zero which leads to an immediate Oops in the caller. Fixes: 09f5f680707e ("ipmi: kcs: aspeed: Implement v2 bindings") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Message-Id: <20200407122149.GA100026@mwanda> Signed-off-by: Corey Minyard <cminyard@mvista.com>
2020-04-07KVM: VMX: fix crash cleanup when KVM wasn't usedVitaly Kuznetsov
If KVM wasn't used at all before we crash the cleanup procedure fails with BUG: unable to handle page fault for address: ffffffffffffffc8 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 23215067 P4D 23215067 PUD 23217067 PMD 0 Oops: 0000 [#8] SMP PTI CPU: 0 PID: 3542 Comm: bash Kdump: loaded Tainted: G D 5.6.0-rc2+ #823 RIP: 0010:crash_vmclear_local_loaded_vmcss.cold+0x19/0x51 [kvm_intel] The root cause is that the loaded_vmcss_on_cpu list is not yet initialized: we initialize it in hardware_enable(), but this only happens when we start a VM. Previously, we used to have a bitmap with enabled CPUs and that was preventing [masking] the issue. Initialize the loaded_vmcss_on_cpu list earlier, right before we assign the crash_vmclear_loaded_vmcss pointer. The blocked_vcpu_on_cpu list and blocked_vcpu_on_cpu_lock are moved along with it for consistency. Fixes: 31603d4fc2bb ("KVM: VMX: Always VMCLEAR in-use VMCSes during crash with kexec support") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200401081348.1345307-1-vkuznets@redhat.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
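The ordering this commit describes (initialize the list, then publish the crash callback) can be illustrated with a userspace model. This is only a sketch under assumptions: `struct list_head` here is a simplified re-implementation, and `loaded_list`, `crash_hook` and `setup()` are hypothetical names, not the KVM symbols.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-in for a circular doubly linked list. */
struct list_head {
    struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

/* Walk the list; a zero-filled (uninitialized) head would have next == NULL. */
static size_t list_count(const struct list_head *h)
{
    size_t n = 0;

    assert(h->next != NULL); /* head must be initialized before any walk */
    for (const struct list_head *p = h->next; p != h; p = p->next)
        n++;
    return n;
}

static struct list_head loaded_list;                     /* hypothetical */
static size_t (*crash_hook)(const struct list_head *);   /* hypothetical */

static void setup(void)
{
    init_list_head(&loaded_list); /* make the list walkable first ... */
    crash_hook = list_count;      /* ... then expose the callback */
}
```

Once `setup()` has run, the hook can be called at any time, even if nothing was ever added to the list.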
2020-04-07KVM: X86: Filter out the broadcast dest for IPI fastpathWanpeng Li
Besides destination shorthands, a destination value of 0xffffffff is also used to broadcast interrupts, so filter it out of the single-target IPI fastpath as well. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <1585815626-28370-1-git-send-email-wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
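The filter can be pictured as a small predicate. This is an illustrative sketch only: `ipi_fastpath_ok()` and `BROADCAST_DEST` are hypothetical names, and only the value 0xffffffff comes from the commit text.

```c
#include <stdbool.h>
#include <stdint.h>

/* Broadcast destination value named in the commit message. */
#define BROADCAST_DEST 0xffffffffu

/*
 * Hypothetical predicate: an IPI qualifies for the single-target fastpath
 * only if it uses no destination shorthand and is not a broadcast.
 */
static bool ipi_fastpath_ok(bool has_shorthand, uint32_t dest)
{
    return !has_shorthand && dest != BROADCAST_DEST;
}
```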
2020-04-07Merge tag 'kvm-s390-master-5.7-1' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD KVM: s390: Fixes for vsie (nested hypervisors) - Several fixes for corner cases of nesting. Still relevant as it might crash host or first level guest or temporarily leak memory.
2020-04-07KVM: s390: vsie: Fix possible race when shadowing region 3 tablesDavid Hildenbrand
We have to properly retry again by returning -EINVAL immediately in case somebody else instantiated the table concurrently. We forgot to add the goto in this function only. The code now matches the other, similar shadowing functions. We are overwriting an existing region 2 table entry. All allocated pages are added to the crst_list to be freed later, so they are not lost forever. However, when unshadowing the region 2 table, we wouldn't trigger unshadowing of the original shadowed region 3 table that we replaced. It would get unshadowed when the original region 3 table is modified. As it's not connected to the page table hierarchy anymore, it's not going to get used anymore. However, for a limited time, this page table will stick around, so it's in some sense a temporary memory leak. Identified by manual code inspection. I don't think this classifies as stable material. Fixes: 998f637cc4b9 ("s390/mm: avoid races on region/segment/page table shadowing") Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20200403153050.20569-4-david@redhat.com Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2020-04-07KVM: s390: vsie: Fix delivery of addressing exceptionsDavid Hildenbrand
Whenever we get an -EFAULT, we failed to read in guest 2 physical address space. Such addressing exceptions are reported via a program intercept to the nested hypervisor. Once we have faked the intercept, we have to return to guest 2. Instead, right now we would be returning -EFAULT from the intercept handler, eventually crashing the VM. The correct thing to do is to return 1, as rc == 1 is the internal representation of "we have to go back into g2". Addressing exceptions can only happen if the g2->g3 page tables reference invalid g2 addresses (say, either a table or the final page is not accessible), so something that basically never happens in sane environments. Identified by manual code inspection. Fixes: a3508fbe9dc6 ("KVM: s390: vsie: initial support for nested virtualization") Cc: <stable@vger.kernel.org> # v4.8+ Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20200403153050.20569-3-david@redhat.com Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> [borntraeger@de.ibm.com: fix patch description] Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2020-04-07KVM: s390: vsie: Fix region 1 ASCE sanity shadow address checksDavid Hildenbrand
In case we have a region 1 the following calculation (31 + ((gmap->asce & _ASCE_TYPE_MASK) >> 2)*11) results in 64. As shifts beyond the size are undefined the compiler is free to use instructions like sllg. sllg will only use 6 bits of the shift value (here 64) resulting in no shift at all. That means that ALL addresses will be rejected. This can result in endless loops, e.g. when prefix cannot get mapped. Fixes: 4be130a08420 ("s390/mm: add shadow gmap support") Tested-by: Janosch Frank <frankja@linux.ibm.com> Reported-by: Janosch Frank <frankja@linux.ibm.com> Cc: <stable@vger.kernel.org> # v4.8+ Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20200403153050.20569-2-david@redhat.com Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> [borntraeger@de.ibm.com: fix patch description, remove WARN_ON_ONCE] Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
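The arithmetic behind this bug can be checked in isolation. The sketch below is a userspace illustration, not the kernel patch: `ASCE_TYPE_MASK`, `asce_addr_bits()` and `addr_out_of_range()` are hypothetical names, and the mask value 0x0c (type field in bits 2-3, with 3 meaning a region 1 table) is an assumption made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: the table-type field occupies bits 2-3 of the ASCE. */
#define ASCE_TYPE_MASK 0x0cULL

/* Address bits covered by the table type: 31, 42, 53 or 64. */
static unsigned int asce_addr_bits(uint64_t asce)
{
    return 31 + (unsigned int)((asce & ASCE_TYPE_MASK) >> 2) * 11;
}

/* Returns 1 if addr lies beyond what the table type can map, else 0. */
static int addr_out_of_range(uint64_t asce, uint64_t addr)
{
    unsigned int bits = asce_addr_bits(asce);

    assert(bits <= 64);
    /*
     * Shifting a 64-bit value by 64 is undefined behaviour in C, so the
     * region 1 case is handled explicitly: every address is in range.
     */
    if (bits == 64)
        return 0;
    return (addr >> bits) != 0;
}
```

The explicit `bits == 64` branch is the point of the sketch: it avoids ever evaluating the out-of-range shift rather than relying on what a particular instruction does with it.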
2020-04-07xen/blkfront: fix memory allocation flags in blkfront_setup_indirect()Juergen Gross
Commit 1d5c76e664333 ("xen-blkfront: switch kcalloc to kvcalloc for large array allocation") didn't fix the issue it was meant to, as the flags for allocating the memory are GFP_NOIO, which will lead to the memory allocation falling back to kmalloc(). So instead of GFP_NOIO use GFP_KERNEL and do all the memory allocation in blkfront_setup_indirect() in a memalloc_noio_{save,restore} section. Fixes: 1d5c76e664333 ("xen-blkfront: switch kcalloc to kvcalloc for large array allocation") Cc: stable@vger.kernel.org Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Link: https://lore.kernel.org/r/20200403090034.8753-1-jgross@suse.com Signed-off-by: Juergen Gross <jgross@suse.com>
2020-04-07xen: Use evtchn_type_t as a type for event channelsYan Yankovskyi
Make event channel functions pass event channel port using evtchn_port_t type. It eliminates signed <-> unsigned conversion. Signed-off-by: Yan Yankovskyi <yyankovskyi@gmail.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: https://lore.kernel.org/r/20200323152343.GA28422@kbp1-lhp-F74019 Signed-off-by: Juergen Gross <jgross@suse.com>
2020-04-07virtio-balloon: Revert "virtio-balloon: Switch back to OOM handler for ↵Michael S. Tsirkin
VIRTIO_BALLOON_F_DEFLATE_ON_OOM" This reverts commit 5a6b4cc5b7a1892a8d7f63d6cbac6e0ae2a9d031. It has been queued properly in the akpm tree, this version is just creating conflicts. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-04-07Documentation: cpu-idle-cooling: Fix diagram for 33% duty cycleSergey Vidishev
Signed-off-by: Sergey Vidishev <sergeyv@yandex-team.ru> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/2374188.AZIXMmL6Zy@sergeyv-box
2020-04-07thermal: devfreq_cooling: inline all stubs for CONFIG_DEVFREQ_THERMAL=nMartin Blumenstingl
When CONFIG_DEVFREQ_THERMAL is disabled all functions except of_devfreq_cooling_register_power() were already inlined. Also inline the last function to avoid compile errors when multiple drivers call of_devfreq_cooling_register_power() when CONFIG_DEVFREQ_THERMAL is not set. Compilation failed with the following message: multiple definition of `of_devfreq_cooling_register_power' (which then lists all usages of of_devfreq_cooling_register_power()) Thomas Zimmermann reported this problem [0] on a kernel config with CONFIG_DRM_LIMA={m,y}, CONFIG_DRM_PANFROST={m,y} and CONFIG_DEVFREQ_THERMAL=n after both, the lima and panfrost drivers gained devfreq cooling support. [0] https://www.spinics.net/lists/dri-devel/msg252825.html Fixes: a76caf55e5b356 ("thermal: Add devfreq cooling") Cc: stable@vger.kernel.org Reported-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Tested-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20200403205133.1101808-1-martin.blumenstingl@googlemail.com
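The header pattern behind this fix can be sketched as follows. All names here (`CONFIG_EXAMPLE_COOLING`, `example_cooling_register()`, `example_driver_probe()`) are hypothetical, and the stub returns NULL for simplicity, where a real stub would more likely return an error value.

```c
#include <errno.h>
#include <stddef.h>

struct cooling_device; /* opaque for this sketch */

/* Hypothetical switch standing in for a Kconfig option; left undefined. */
#ifdef CONFIG_EXAMPLE_COOLING
struct cooling_device *example_cooling_register(int id);
#else
/*
 * 'static inline' gives every file that includes this header its own
 * private copy. A plain (non-static) definition here would be emitted
 * once per including file and clash at link time.
 */
static inline struct cooling_device *example_cooling_register(int id)
{
    (void)id;
    return NULL;
}
#endif

/* Hypothetical caller: works the same whether the feature is built or not. */
static int example_driver_probe(void)
{
    struct cooling_device *cdev = example_cooling_register(0);

    return cdev ? 0 : -ENODEV;
}
```

Any number of drivers can include such a header and call the stub without producing a duplicate symbol.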
2020-04-07KVM: nVMX: don't clear mtf_pending when nested events are blockedOliver Upton
If nested events are blocked, don't clear the mtf_pending flag to avoid missing later delivery of the MTF VM-exit. Fixes: 5ef8acbdd687c ("KVM: nVMX: Emulate MTF when performing instruction emulation") Signed-off-by: Oliver Upton <oupton@google.com> Message-Id: <20200406201237.178725-1-oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-07KVM: VMX: Remove unnecessary exception trampoline in vmx_vmenterUros Bizjak
The exception trampoline in .fixup section is not needed, the exception handling code can jump directly to the label in the .text section. Changes since v1: - Fix commit message. Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Message-Id: <20200406202108.74300-1-ubizjak@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-07ALSA: hda/realtek - Add HP new mute led supported for ALC236Kailang Yang
The new HP platform has a new mute LED feature. COEF index 0x34 bit 5 controls the playback mute LED. COEF index 0x35 bits 2 and 3 control the mic mute LED. [ corrected typos by tiwai ] Signed-off-by: Kailang Yang <kailang@realtek.com> Link: https://lore.kernel.org/r/6741211598ba499687362ff2aa30626b@realtek.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2020-04-07ALSA: hda/realtek - Add supported new mute Led for HPKailang Yang
HP notebooks support a new mute LED. The hardware pins were not enough to meet the old LED rule. JD2 controls the playback mute LED. GPO3 controls the capture mute LED. (ALC285 doesn't control GPO3 via verb command.) These two pins can only be controlled through COEF registers. [ corrected typos by tiwai ] Signed-off-by: Kailang Yang <kailang@realtek.com> Link: https://lore.kernel.org/r/6741211598ba499687362ff2aa30626b@realtek.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2020-04-07drm/nouveau/kms/nv50-: wait for FIFO space on PIO channelsBen Skeggs
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2020-04-07drm/nouveau/nvif: protect waits against GPU falling off the busBen Skeggs
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2020-04-07drm/nouveau/nvif: access PTIMER through usermode class, if availableBen Skeggs
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2020-04-07drm/nouveau/gr/gp107,gp108: implement workaround for HW hanging during initBen Skeggs
Certain boards with GP107/GP108 chipsets hang (often, but randomly) for unknown reasons during GR initialisation. The first tell-tale symptom of this issue is: nouveau 0000:01:00.0: bus: MMIO read of 00000000 FAULT at 409800 [ TIMEOUT ] appearing in dmesg, likely followed by many other failures being logged. Karol found this WAR for the issue a while back, but efforts to isolate the root cause and proper fix have not yielded success so far. I've modified the original patch to include a few more details, limit it to GP107/GP108 by default, and added a config option to override this choice. Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Reviewed-by: Karol Herbst <kherbst@redhat.com>