summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2019-03-05mm/vmalloc.c: don't dereference possible NULL pointer in __vunmap()Liviu Dudau
find_vmap_area() can return a NULL pointer and we're going to dereference it without checking it first. Use the existing find_vm_area() function which does exactly what we want and checks for the NULL pointer. Link: http://lkml.kernel.org/r/20181228171009.22269-1-liviu@dudau.co.uk Fixes: f3c01d2f3ade ("mm: vmalloc: avoid racy handling of debugobjects in vunmap") Signed-off-by: Liviu Dudau <liviu@dudau.co.uk> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Chintan Pandya <cpandya@codeaurora.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05PM/Hibernate: exclude all PageOffline() pagesDavid Hildenbrand
The content of pages that are marked PG_offline is not of interest (e.g. inflated by a balloon driver), let's skip these pages. In saveable_highmem_page(), move the PageReserved() check to a new check along with the PageOffline() check to separate it from the swsusp checks. [david@redhat.com: v2] Link: http://lkml.kernel.org/r/20181122100627.5189-9-david@redhat.com Link: http://lkml.kernel.org/r/20181119101616.8901-9-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Pavel Machek <pavel@ucw.cz> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Len Brown <len.brown@intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Christian Hansen <chansen3@cisco.com> Cc: Dave Young <dyoung@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Juergen Gross <jgross@suse.com> Cc: Julien Freche <jfreche@vmware.com> Cc: Kairui Song <kasong@redhat.com> Cc: Kazuhito Hagio <k-hagio@ab.jp.nec.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Lianbo Jiang <lijiang@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Miles Chen <miles.chen@mediatek.com> Cc: Nadav Amit <namit@vmware.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Omar Sandoval <osandov@fb.com> Cc: Pankaj gupta <pagupta@redhat.com> Cc: Pavel Tatashin <pasha.tatashin@oracle.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xavier Deguillard <xdeguillard@vmware.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05PM/Hibernate: use pfn_to_online_page()David Hildenbrand
Let's use pfn_to_online_page() instead of pfn_to_page() when checking for saveable pages to not save/restore offline memory sections. Link: http://lkml.kernel.org/r/20181119101616.8901-8-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Suggested-by: Michal Hocko <mhocko@kernel.org> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Pavel Machek <pavel@ucw.cz> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Len Brown <len.brown@intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Christian Hansen <chansen3@cisco.com> Cc: Dave Young <dyoung@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Juergen Gross <jgross@suse.com> Cc: Julien Freche <jfreche@vmware.com> Cc: Kairui Song <kasong@redhat.com> Cc: Kazuhito Hagio <k-hagio@ab.jp.nec.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Lianbo Jiang <lijiang@redhat.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Miles Chen <miles.chen@mediatek.com> Cc: Nadav Amit <namit@vmware.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Omar Sandoval <osandov@fb.com> Cc: Pankaj gupta <pagupta@redhat.com> Cc: Pavel Tatashin <pasha.tatashin@oracle.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xavier Deguillard <xdeguillard@vmware.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05vmw_balloon: mark inflated pages PG_offlineDavid Hildenbrand
Mark inflated and never onlined pages PG_offline, to tell the world that the content is stale and should not be dumped. [david@redhat.com: use vmballoon_page_in_frames more widely] Link: http://lkml.kernel.org/r/20181122100627.5189-7-david@redhat.com Link: http://lkml.kernel.org/r/20181119101616.8901-7-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Nadav Amit <namit@vmware.com> Cc: Xavier Deguillard <xdeguillard@vmware.com> Cc: Nadav Amit <namit@vmware.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Julien Freche <jfreche@vmware.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Christian Hansen <chansen3@cisco.com> Cc: Dave Young <dyoung@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Juergen Gross <jgross@suse.com> Cc: Kairui Song <kasong@redhat.com> Cc: Kazuhito Hagio <k-hagio@ab.jp.nec.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Len Brown <len.brown@intel.com> Cc: Lianbo Jiang <lijiang@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Miles Chen <miles.chen@mediatek.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Omar Sandoval <osandov@fb.com> Cc: Pankaj gupta <pagupta@redhat.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Pavel Tatashin <pasha.tatashin@oracle.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05hv_balloon: mark inflated pages PG_offlineDavid Hildenbrand
Mark inflated and never onlined pages PG_offline, to tell the world that the content is stale and should not be dumped. Link: http://lkml.kernel.org/r/20181119101616.8901-6-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Pankaj gupta <pagupta@redhat.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Kairui Song <kasong@redhat.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Christian Hansen <chansen3@cisco.com> Cc: Dave Young <dyoung@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Juergen Gross <jgross@suse.com> Cc: Julien Freche <jfreche@vmware.com> Cc: Kazuhito Hagio <k-hagio@ab.jp.nec.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Len Brown <len.brown@intel.com> Cc: Lianbo Jiang <lijiang@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Miles Chen <miles.chen@mediatek.com> Cc: Nadav Amit <namit@vmware.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Omar Sandoval <osandov@fb.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Pavel Tatashin <pasha.tatashin@oracle.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xavier Deguillard <xdeguillard@vmware.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05xen/balloon: mark inflated pages PG_offlineDavid Hildenbrand
Mark inflated and never onlined pages PG_offline, to tell the world that the content is stale and should not be dumped. Link: http://lkml.kernel.org/r/20181119101616.8901-5-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Christian Hansen <chansen3@cisco.com> Cc: Dave Young <dyoung@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Julien Freche <jfreche@vmware.com> Cc: Kairui Song <kasong@redhat.com> Cc: Kazuhito Hagio <k-hagio@ab.jp.nec.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Len Brown <len.brown@intel.com> Cc: Lianbo Jiang <lijiang@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Miles Chen <miles.chen@mediatek.com> Cc: Nadav Amit <namit@vmware.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Omar Sandoval <osandov@fb.com> Cc: Pankaj gupta <pagupta@redhat.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Pavel Tatashin <pasha.tatashin@oracle.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xavier Deguillard <xdeguillard@vmware.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05kexec: export PG_offline to VMCOREINFODavid Hildenbrand
Right now, pages inflated as part of a balloon driver will be dumped by dump tools like makedumpfile. While XEN is able to check in the crash kernel whether a certain pfn is actuall backed by memory in the hypervisor (see xen_oldmem_pfn_is_ram) and optimize this case, dumps of other balloon inflated memory will essentially result in zero pages getting allocated by the hypervisor and the dump getting filled with this data. The allocation and reading of zero pages can directly be avoided if a dumping tool could know which pages only contain stale information not to be dumped. We now have PG_offline which can be (and already is by virtio-balloon) used for marking pages as logically offline. Follow up patches will make use of this flag also in other balloon implementations. Let's export PG_offline via PAGE_OFFLINE_MAPCOUNT_VALUE, so makedumpfile can directly skip pages that are logically offline and the content therefore stale. Please note that this is also helpful for a problem we were seeing under Hyper-V: Dumping logically offline memory (pages kept fake offline while onlining a section via online_page_callback) would under some condicions result in a kernel panic when dumping them. Link: http://lkml.kernel.org/r/20181119101616.8901-4-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Dave Young <dyoung@redhat.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Baoquan He <bhe@redhat.com> Cc: Omar Sandoval <osandov@fb.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Lianbo Jiang <lijiang@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Kazuhito Hagio <k-hagio@ab.jp.nec.com> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Christian Hansen <chansen3@cisco.com> Cc: David Rientjes <rientjes@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Juergen Gross <jgross@suse.com> Cc: Julien Freche <jfreche@vmware.com> Cc: Kairui Song <kasong@redhat.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Len Brown <len.brown@intel.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Miles Chen <miles.chen@mediatek.com> Cc: Nadav Amit <namit@vmware.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Pankaj gupta <pagupta@redhat.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Pavel Tatashin <pasha.tatashin@oracle.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xavier Deguillard <xdeguillard@vmware.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05mm: convert PG_balloon to PG_offlineDavid Hildenbrand
PG_balloon was introduced to implement page migration/compaction for pages inflated in virtio-balloon. Nowadays, it is only a marker that a page is part of virtio-balloon and therefore logically offline. We also want to make use of this flag in other balloon drivers - for inflated pages or when onlining a section but keeping some pages offline (e.g. used right now by XEN and Hyper-V via set_online_page_callback()). We are going to expose this flag to dump tools like makedumpfile. But instead of exposing PG_balloon, let's generalize the concept of marking pages as logically offline, so it can be reused for other purposes later on. Rename PG_balloon to PG_offline. This is an indicator that the page is logically offline, the content stale and that it should not be touched (e.g. a hypervisor would have to allocate backing storage in order for the guest to dump an unused page). We can then e.g. exclude such pages from dumps. We replace and reuse KPF_BALLOON (23), as this shouldn't really harm (and for now the semantics stay the same). In following patches, we will make use of this bit also in other balloon drivers. While at it, document PGTABLE. [akpm@linux-foundation.org: fix comment text, per David] Link: http://lkml.kernel.org/r/20181119101616.8901-3-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Konstantin Khlebnikov <koct9i@gmail.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Pankaj gupta <pagupta@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Christian Hansen <chansen3@cisco.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Pavel Tatashin <pasha.tatashin@oracle.com> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Miles Chen <miles.chen@mediatek.com> Cc: David Rientjes <rientjes@google.com> Cc: Kazuhito Hagio <k-hagio@ab.jp.nec.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Dave Young <dyoung@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Juergen Gross <jgross@suse.com> Cc: Julien Freche <jfreche@vmware.com> Cc: Kairui Song <kasong@redhat.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Len Brown <len.brown@intel.com> Cc: Lianbo Jiang <lijiang@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Nadav Amit <namit@vmware.com> Cc: Omar Sandoval <osandov@fb.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Xavier Deguillard <xdeguillard@vmware.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05mm: balloon: update comment about isolation/migration/compactionDavid Hildenbrand
Patch series "mm/kdump: allow to exclude pages that are logically offline" Right now, pages inflated as part of a balloon driver will be dumped by dump tools like makedumpfile. While XEN is able to check in the crash kernel whether a certain pfn is actuall backed by memory in the hypervisor (see xen_oldmem_pfn_is_ram) and optimize this case, dumps of virtio-balloon, hv-balloon and VMWare balloon inflated memory will essentially result in zero pages getting allocated by the hypervisor and the dump getting filled with this data. The allocation and reading of zero pages can directly be avoided if a dumping tool could know which pages only contain stale information not to be dumped. Also for XEN, calling into the kernel and asking the hypervisor if a pfn is backed can be avoided if the duming tool would skip such pages right from the beginning. Dumping tools have no idea whether a given page is part of a balloon driver and shall not be dumped. Esp. PG_reserved cannot be used for that purpose as all memory allocated during early boot is also PG_reserved, see discussion at [1]. So some other way of indication is required and a new page flag is frowned upon. We have PG_balloon (MAPCOUNT value), which is essentially unused now. I suggest renaming it to something more generic (PG_offline) to mark pages as logically offline. This flag can than e.g. also be used by virtio-mem in the future to mark subsections as offline. Or by other code that wants to put pages logically offline (e.g. later maybe poisoned pages that shall no longer be used). This series converts PG_balloon to PG_offline, allows dumping tools to query the value to detect such pages and marks pages in the hv-balloon and XEN balloon properly as PG_offline. Note that virtio-balloon already set pages to PG_balloon (and now PG_offline). Please note that this is also helpful for a problem we were seeing under Hyper-V: Dumping logically offline memory (pages kept fake offline while onlining a section via online_page_callback) would under some condicions result in a kernel panic when dumping them. As I don't have access to neither XEN nor Hyper-V nor VMWare installations, this was only tested with the virtio-balloon and pages were properly skipped when dumping. I'll also attach the makedumpfile patch to this series. [1] https://lkml.org/lkml/2018/7/20/566 This patch (of 8): Commit b1123ea6d3b3 ("mm: balloon: use general non-lru movable page feature") reworked balloon handling to make use of the general non-lru movable page feature. The big comment block in balloon_compaction.h contains quite some outdated information. Let's fix this. Link: http://lkml.kernel.org/r/20181119101616.8901-2-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Christian Hansen <chansen3@cisco.com> Cc: Dave Young <dyoung@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Juergen Gross <jgross@suse.com> Cc: Julien Freche <jfreche@vmware.com> Cc: Kairui Song <kasong@redhat.com> Cc: Kazuhito Hagio <k-hagio@ab.jp.nec.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Len Brown <len.brown@intel.com> Cc: Lianbo Jiang <lijiang@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Miles Chen <miles.chen@mediatek.com> Cc: Nadav Amit <namit@vmware.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Omar Sandoval <osandov@fb.com> Cc: Pankaj gupta <pagupta@redhat.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Pavel Tatashin <pasha.tatashin@oracle.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xavier Deguillard <xdeguillard@vmware.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05mm/page_alloc.c: memory hotplug: free pages as higher orderArun KS
When freeing pages are done with higher order, time spent on coalescing pages by buddy allocator can be reduced. With section size of 256MB, hot add latency of a single section shows improvement from 50-60 ms to less than 1 ms, hence improving the hot add latency by 60 times. Modify external providers of online callback to align with the change. [arunks@codeaurora.org: v11] Link: http://lkml.kernel.org/r/1547792588-18032-1-git-send-email-arunks@codeaurora.org [akpm@linux-foundation.org: remove unused local, per Arun] [akpm@linux-foundation.org: avoid return of void-returning __free_pages_core(), per Oscar] [akpm@linux-foundation.org: fix it for mm-convert-totalram_pages-and-totalhigh_pages-variables-to-atomic.patch] [arunks@codeaurora.org: v8] Link: http://lkml.kernel.org/r/1547032395-24582-1-git-send-email-arunks@codeaurora.org [arunks@codeaurora.org: v9] Link: http://lkml.kernel.org/r/1547098543-26452-1-git-send-email-arunks@codeaurora.org Link: http://lkml.kernel.org/r/1538727006-5727-1-git-send-email-arunks@codeaurora.org Signed-off-by: Arun KS <arunks@codeaurora.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Juergen Gross <jgross@suse.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mathieu Malaterre <malat@debian.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Souptick Joarder <jrdr.linux@gmail.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Aaron Lu <aaron.lu@intel.com> Cc: Srivatsa Vaddagiri <vatsa@codeaurora.org> Cc: Vinayak Menon <vinmenon@codeaurora.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05mm/slub.c: remove an unused addr argumentQian Cai
"addr" function argument is not used in alloc_consistency_checks() at all, so remove it. Link: http://lkml.kernel.org/r/20190211123214.35592-1-cai@lca.pw Fixes: becfda68abca ("slub: convert SLAB_DEBUG_FREE to SLAB_CONSISTENCY_CHECKS") Signed-off-by: Qian Cai <cai@lca.pw> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: David Rientjes <rientjes@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05include/linux/slub_def.h: comment fixesTobin C. Harding
Capitialize comment string, use C89 comment style, correct grammar/punctuation in comments. Link: http://lkml.kernel.org/r/20190204005713.9463-2-tobin@kernel.org Link: http://lkml.kernel.org/r/20190204005713.9463-3-tobin@kernel.org Link: http://lkml.kernel.org/r/20190204005713.9463-4-tobin@kernel.org Signed-off-by: Tobin C. Harding <tobin@kernel.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05mm/slab.c: kmemleak no scan alien cachesQian Cai
Kmemleak throws endless warnings during boot due to in __alloc_alien_cache(), alc = kmalloc_node(memsize, gfp, node); init_arraycache(&alc->ac, entries, batch); kmemleak_no_scan(ac); Kmemleak does not track the array cache (alc->ac) but the alien cache (alc) instead, so let it track the latter by lifting kmemleak_no_scan() out of init_arraycache(). There is another place that calls init_arraycache(), but alloc_kmem_cache_cpus() uses the percpu allocation where will never be considered as a leak. kmemleak: Found object by alias at 0xffff8007b9aa7e38 CPU: 190 PID: 1 Comm: swapper/0 Not tainted 5.0.0-rc2+ #2 Call trace: dump_backtrace+0x0/0x168 show_stack+0x24/0x30 dump_stack+0x88/0xb0 lookup_object+0x84/0xac find_and_get_object+0x84/0xe4 kmemleak_no_scan+0x74/0xf4 setup_kmem_cache_node+0x2b4/0x35c __do_tune_cpucache+0x250/0x2d4 do_tune_cpucache+0x4c/0xe4 enable_cpucache+0xc8/0x110 setup_cpu_cache+0x40/0x1b8 __kmem_cache_create+0x240/0x358 create_cache+0xc0/0x198 kmem_cache_create_usercopy+0x158/0x20c kmem_cache_create+0x50/0x64 fsnotify_init+0x58/0x6c do_one_initcall+0x194/0x388 kernel_init_freeable+0x668/0x688 kernel_init+0x18/0x124 ret_from_fork+0x10/0x18 kmemleak: Object 0xffff8007b9aa7e00 (size 256): kmemleak: comm "swapper/0", pid 1, jiffies 4294697137 kmemleak: min_count = 1 kmemleak: count = 0 kmemleak: flags = 0x1 kmemleak: checksum = 0 kmemleak: backtrace: kmemleak_alloc+0x84/0xb8 kmem_cache_alloc_node_trace+0x31c/0x3a0 __kmalloc_node+0x58/0x78 setup_kmem_cache_node+0x26c/0x35c __do_tune_cpucache+0x250/0x2d4 do_tune_cpucache+0x4c/0xe4 enable_cpucache+0xc8/0x110 setup_cpu_cache+0x40/0x1b8 __kmem_cache_create+0x240/0x358 create_cache+0xc0/0x198 kmem_cache_create_usercopy+0x158/0x20c kmem_cache_create+0x50/0x64 fsnotify_init+0x58/0x6c do_one_initcall+0x194/0x388 kernel_init_freeable+0x668/0x688 kernel_init+0x18/0x124 kmemleak: Not scanning unknown object at 0xffff8007b9aa7e38 CPU: 190 PID: 1 Comm: swapper/0 Not tainted 5.0.0-rc2+ #2 Call trace: dump_backtrace+0x0/0x168 show_stack+0x24/0x30 dump_stack+0x88/0xb0 kmemleak_no_scan+0x90/0xf4 setup_kmem_cache_node+0x2b4/0x35c __do_tune_cpucache+0x250/0x2d4 do_tune_cpucache+0x4c/0xe4 enable_cpucache+0xc8/0x110 setup_cpu_cache+0x40/0x1b8 __kmem_cache_create+0x240/0x358 create_cache+0xc0/0x198 kmem_cache_create_usercopy+0x158/0x20c kmem_cache_create+0x50/0x64 fsnotify_init+0x58/0x6c do_one_initcall+0x194/0x388 kernel_init_freeable+0x668/0x688 kernel_init+0x18/0x124 ret_from_fork+0x10/0x18 Link: http://lkml.kernel.org/r/20190129184518.39808-1-cai@lca.pw Fixes: 1fe00d50a9e8 ("slab: factor out initialization of array cache") Signed-off-by: Qian Cai <cai@lca.pw> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05mm/slub.c: freelist is ensured to be NULL when new_slab() failsPeng Wang
new_slab_objects() will return immediately if freelist is not NULL. if (freelist) return freelist; One more assignment operation could be avoided. Link: http://lkml.kernel.org/r/20181229062512.30469-1-rocking@whu.edu.cn Signed-off-by: Peng Wang <rocking@whu.edu.cn> Reviewed-by: Pekka Enberg <penberg@kernel.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: David Rientjes <rientjes@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05fs/file.c: initialize init_files.resize_waitShuriyc Chu
(Taken from https://bugzilla.kernel.org/show_bug.cgi?id=200647) 'get_unused_fd_flags' in kthread cause kernel crash. It works fine on 4.1, but causes crash after get 64 fds. It also cause crash on ubuntu1404/1604/1804, centos7.5, and the crash messages are almost the same. The crash message on centos7.5 shows below: start fd 61 start fd 62 start fd 63 BUG: unable to handle kernel NULL pointer dereference at (null) IP: __wake_up_common+0x2e/0x90 PGD 0 Oops: 0000 [#1] SMP Modules linked in: test(OE) xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter devlink sunrpc kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd sg ppdev pcspkr virtio_balloon parport_pc parport i2c_piix4 joydev ip_tables xfs libcrc32c sr_mod cdrom sd_mod crc_t10dif crct10dif_generic ata_generic pata_acpi virtio_scsi virtio_console virtio_net cirrus drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm crct10dif_pclmul crct10dif_common crc32c_intel drm ata_piix serio_raw libata virtio_pci virtio_ring i2c_core virtio floppy dm_mirror dm_region_hash dm_log dm_mod CPU: 2 PID: 1820 Comm: test_fd Kdump: loaded Tainted: G OE ------------ 3.10.0-862.3.3.el7.x86_64 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014 task: ffff8e92b9431fa0 ti: ffff8e94247a0000 task.ti: ffff8e94247a0000 RIP: 0010:__wake_up_common+0x2e/0x90 RSP: 0018:ffff8e94247a2d18 EFLAGS: 00010086 RAX: 0000000000000000 RBX: ffffffff9d09daa0 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffffffff9d09daa0 RBP: ffff8e94247a2d50 R08: 0000000000000000 R09: ffff8e92b95dfda8 R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff9d09daa8 R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000003 FS: 0000000000000000(0000) GS:ffff8e9434e80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000017c686000 CR4: 00000000000207e0 Call Trace: __wake_up+0x39/0x50 expand_files+0x131/0x250 __alloc_fd+0x47/0x170 get_unused_fd_flags+0x30/0x40 test_fd+0x12a/0x1c0 [test] kthread+0xd1/0xe0 ret_from_fork_nospec_begin+0x21/0x21 Code: 66 90 55 48 89 e5 41 57 41 89 f7 41 56 41 89 ce 41 55 41 54 49 89 fc 49 83 c4 08 53 48 83 ec 10 48 8b 47 08 89 55 cc 4c 89 45 d0 <48> 8b 08 49 39 c4 48 8d 78 e8 4c 8d 69 e8 75 08 eb 3b 4c 89 ef RIP __wake_up_common+0x2e/0x90 RSP <ffff8e94247a2d18> CR2: 0000000000000000 This issue exists since CentOS 7.5 3.10.0-862 and CentOS 7.4 (3.10.0-693.21.1 ) is ok. Root cause: the item 'resize_wait' is not initialized before being used. Reported-by: Richard Zhang <zhang.zijian@h3c.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05fs/inode.c: inode_set_flags(): replace opencoded set_mask_bits()Vineet Gupta
It seems that commits 5f16f3225b0624 and 00a1a053ebe5, both with same commitlog ("ext4: atomically set inode->i_flags in ext4_set_inode_flags()") introduced the set_mask_bits API, but somehow missed not using it in ext4 in the end. Also, set_mask_bits() is used in fs quite a bit and we can possibly come up with a generic llsc based implementation (w/o the cmpxchg loop) Link: http://lkml.kernel.org/r/1548275584-18096-3-git-send-email-vgupta@synopsys.com Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Reviewed-by: Anthony Yznaga <anthony.yznaga@oracle.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Miklos Szeredi <mszeredi@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05ocfs2: Use zero-sized array and struct_size() in kzalloc()Gustavo A. R. Silva
Update the code to use a zero-sized array instead of a pointer in structure ocfs2_slot_info and use struct_size() in kzalloc(). Notice that one of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory for some number of elements for that array. For example: struct foo { int stuff; void *entry[]; }; instance = kzalloc(sizeof(struct foo) + sizeof(void *) * count, GFP_KERNEL); Instead of leaving these open-coded and prone to type mistakes, we can now use the new struct_size() helper: instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL); This code was detected with the help of Coccinelle. Link: http://lkml.kernel.org/r/20190108191903.GA22056@embeddedor Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Mark Fasheh <mfasheh@versity.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05ocfs2: fix the application IO timeout when fstrim is runningGang He
The user reported this problem, the upper application IO was timeout when fstrim was running on this ocfs2 partition. the application monitoring resource agent considered that this application did not work, then this node was fenced by the cluster brain (e.g. pacemaker). The root cause is that fstrim thread always holds main_bm meta-file related locks until all the cluster groups are trimmed. This patch will make fstrim thread release main_bm meta-file related locks when each cluster group is trimmed, this will let the current application IO has a chance to claim the clusters from main_bm meta-file. Link: http://lkml.kernel.org/r/20190111090014.31645-1-ghe@suse.com Signed-off-by: Gang He <ghe@suse.com> Reviewed-by: Changwei Ge <ge.changwei@h3c.com> Cc: Mark Fasheh <mfasheh@versity.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05ocfs2: fix a panic problem caused by o2cb_ctlJia Guo
In the process of creating a node, it will cause NULL pointer dereference in kernel if o2cb_ctl failed in the interval (mkdir, o2cb_set_node_attribute(node_num)] in function o2cb_add_node. The node num is initialized to 0 in function o2nm_node_group_make_item, o2nm_node_group_drop_item will mistake the node number 0 for a valid node number when we delete the node before the node number is set correctly. If the local node number of the current host happens to be 0, cluster->cl_local_node will be set to O2NM_INVALID_NODE_NUM while o2hb_thread still running. The panic stack is generated as follows: o2hb_thread \-o2hb_do_disk_heartbeat \-o2hb_check_own_slot |-slot = &reg->hr_slots[o2nm_this_node()]; //o2nm_this_node() return O2NM_INVALID_NODE_NUM We need to check whether the node number is set when we delete the node. Link: http://lkml.kernel.org/r/133d8045-72cc-863e-8eae-5013f9f6bc51@huawei.com Signed-off-by: Jia Guo <guojia12@huawei.com> Reviewed-by: Joseph Qi <jiangqi903@gmail.com> Acked-by: Jun Piao <piaojun@huawei.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <ge.changwei@h3c.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05sh: remove nargs from __SYSCALLFiroz Khan
The __SYSCALL macro's arguments are system call number, system call entry name and number of arguments for the system call. Argument- nargs in __SYSCALL(nr, entry, nargs) is neither calculated nor used anywhere. So it would be better to keep the implementation as __SYSCALL(nr, entry). This unifies the implementation with some other architectures too. Link: http://lkml.kernel.org/r/1546443445-21075-2-git-send-email-firoz.khan@linaro.org Signed-off-by: Firoz Khan <firoz.khan@linaro.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Cc: Simon Horman <horms+renesas@verge.net.au> Cc: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Philippe Ombredanne <pombredanne@nexb.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Kate Stewart <kstewart@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05scripts/decode_stacktrace.sh: handle RIP address with segmentKonstantin Khlebnikov
decode line: RIP: 0010:khugepaged+0x2a2/0x2280 into RIP: 0010:khugepaged (mm/khugepaged.c:1885) Link: http://lkml.kernel.org/r/154660071227.52726.15645307951282727605.stgit@buzz Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05kasan: fix coccinelle warnings in kasan_p*_tableAndrey Konovalov
kasan_p4d_table(), kasan_pmd_table() and kasan_pud_table() are declared as returning bool, but return 0 instead of false, which produces a coccinelle warning. Fix it. Link: http://lkml.kernel.org/r/1fa6fadf644859e8a6a8ecce258444b49be8c7ee.1551716733.git.andreyknvl@google.com Fixes: 0207df4fa1a8 ("kernel/memremap, kasan: make ZONE_DEVICE with work with KASAN") Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reported-by: kbuild test robot <lkp@intel.com> Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05kasan: fix kasan_check_read/write definitionsArnd Bergmann
Building little-endian allmodconfig kernels on arm64 started failing with the generated atomic.h implementation, since we now try to call kasan helpers from the EFI stub: aarch64-linux-gnu-ld: drivers/firmware/efi/libstub/arm-stub.stub.o: in function `atomic_set': include/generated/atomic-instrumented.h:44: undefined reference to `__efistub_kasan_check_write' I suspect that we get similar problems in other files that explicitly disable KASAN for some reason but call atomic_t based helper functions. We can fix this by checking the predefined __SANITIZE_ADDRESS__ macro that the compiler sets instead of checking CONFIG_KASAN, but this in turn requires a small hack in mm/kasan/common.c so we do see the extern declaration there instead of the inline function. Link: http://lkml.kernel.org/r/20181211133453.2835077-1-arnd@arndb.de Fixes: b1864b828644 ("locking/atomics: build atomic headers as required") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reported-by: Anders Roxell <anders.roxell@linaro.org> Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au>, Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05page_poison: play nicely with KASANQian Cai
KASAN does not play well with the page poisoning (CONFIG_PAGE_POISONING). It triggers false positives in the allocation path: BUG: KASAN: use-after-free in memchr_inv+0x2ea/0x330 Read of size 8 at addr ffff88881f800000 by task swapper/0 CPU: 0 PID: 0 Comm: swapper Not tainted 5.0.0-rc1+ #54 Call Trace: dump_stack+0xe0/0x19a print_address_description.cold.2+0x9/0x28b kasan_report.cold.3+0x7a/0xb5 __asan_report_load8_noabort+0x19/0x20 memchr_inv+0x2ea/0x330 kernel_poison_pages+0x103/0x3d5 get_page_from_freelist+0x15e7/0x4d90 because KASAN has not yet unpoisoned the shadow page for allocation before it checks memchr_inv() but only found a stale poison pattern. Also, false positives in free path, BUG: KASAN: slab-out-of-bounds in kernel_poison_pages+0x29e/0x3d5 Write of size 4096 at addr ffff8888112cc000 by task swapper/0/1 CPU: 5 PID: 1 Comm: swapper/0 Not tainted 5.0.0-rc1+ #55 Call Trace: dump_stack+0xe0/0x19a print_address_description.cold.2+0x9/0x28b kasan_report.cold.3+0x7a/0xb5 check_memory_region+0x22d/0x250 memset+0x28/0x40 kernel_poison_pages+0x29e/0x3d5 __free_pages_ok+0x75f/0x13e0 due to KASAN adds poisoned redzones around slab objects, but the page poisoning needs to poison the whole page. Link: http://lkml.kernel.org/r/20190114233405.67843-1-cai@lca.pw Signed-off-by: Qian Cai <cai@lca.pw> Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05kasan: remove use after scope bugs detection.Andrey Ryabinin
Use after scope bugs detector seems to be almost entirely useless for the linux kernel. It exists over two years, but I've seen only one valid bug so far [1]. And the bug was fixed before it has been reported. There were some other use-after-scope reports, but they were false-positives due to different reasons like incompatibility with structleak plugin. This feature significantly increases stack usage, especially with GCC < 9 version, and causes a 32K stack overflow. It probably adds performance penalty too. Given all that, let's remove use-after-scope detector entirely. While preparing this patch I've noticed that we mistakenly enable use-after-scope detection for clang compiler regardless of CONFIG_KASAN_EXTRA setting. This is also fixed now. [1] http://lkml.kernel.org/r/<20171129052106.rhgbjhhis53hkgfn@wfg-t540p.sh.intel.com> Link: http://lkml.kernel.org/r/20190111185842.13978-1-aryabinin@virtuozzo.com Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Acked-by: Will Deacon <will.deacon@arm.com> [arm64] Cc: Qian Cai <cai@lca.pw> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05mm: hwpoison: fix thp split handing in soft_offline_in_use_page()zhongjiang
When soft_offline_in_use_page() runs on a thp tail page after pmd is split, we trigger the following VM_BUG_ON_PAGE(): Memory failure: 0x3755ff: non anonymous thp __get_any_page: 0x3755ff: unknown zero refcount page type 2fffff80000000 Soft offlining pfn 0x34d805 at process virtual address 0x20fff000 page:ffffea000d360140 count:0 mapcount:0 mapping:0000000000000000 index:0x1 flags: 0x2fffff80000000() raw: 002fffff80000000 ffffea000d360108 ffffea000d360188 0000000000000000 raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: VM_BUG_ON_PAGE(page_ref_count(page) == 0) ------------[ cut here ]------------ kernel BUG at ./include/linux/mm.h:519! soft_offline_in_use_page() passed refcount and page lock from tail page to head page, which is not needed because we can pass any subpage to split_huge_page(). Naoya had fixed a similar issue in c3901e722b29 ("mm: hwpoison: fix thp split handling in memory_failure()"). But he missed fixing soft offline. Link: http://lkml.kernel.org/r/1551452476-24000-1-git-send-email-zhongjiang@huawei.com Fixes: 61f5d698cc97 ("mm: re-enable THP") Signed-off-by: zhongjiang <zhongjiang@huawei.com> Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Hugh Dickins <hughd@google.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: <stable@vger.kernel.org> [4.5+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05smb3: request more credits on normal (non-large read/write) opsSteve French
We can end up building up credits too slowly to do large operations (reads and writes for example) that require many credits. By comparison most other SMB3 clients request many more (sometimes thousands) of credits on all operations. Increase the number of credits we request on typical (non-large e.g read/write) operations to 10 from 2 so we can build a pool of credits faster. Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2019-03-05CIFS: Mask off signals when sending SMB packetsPavel Shilovsky
We don't want to break SMB sessions if we receive signals when sending packets through the network. Fix it by masking off signals inside __smb_send_rqst() to avoid partial packet sends due to interrupts. Return -EINTR if a signal is pending and only a part of the packet was sent. Return a success status code if the whole packet was sent regardless of signal being pending or not. This keeps a mid entry for the request in the pending queue and allows the demultiplex thread to handle a response from the server properly. Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-03-05CIFS: Return -EAGAIN instead of -ENOTSOCKPavel Shilovsky
When we attempt to send a packet while the demultiplex thread is in the middle of cifs_reconnect() we may end up returning -ENOTSOCK to upper layers. The intent here is to retry the request once the TCP connection is up, so change it to return -EAGAIN instead. The latter error code is retryable and the upper layers will retry the request if needed. Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-03-05CIFS: Only send SMB2_NEGOTIATE command on new TCP connectionsPavel Shilovsky
Do not allow commands other than SMB2_NEGOTIATE to be sent over recently established TCP connections. Return -EAGAIN to let upper layers handle it properly. Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-03-05CIFS: Fix read after write for files with read cachingPavel Shilovsky
When we have a READ lease for a file and have just issued a write operation to the server we need to purge the cache and set oplock/lease level to NONE to avoid reading stale data. Currently we do that only if a write operation succedeed thus not covering cases when a request was sent to the server but a negative error code was returned later for some other reasons (e.g. -EIOCBQUEUED or -EINTR). Fix this by turning off caching regardless of the error code being returned. The patches fixes generic tests 075 and 112 from the xfs-tests. Cc: <stable@vger.kernel.org> Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2019-03-05smb3: for kerberos mounts display the credential uid usedSteve French
For kerberos mounts, the cruid is helpful to display in /proc/mounts in order to tell which uid's krb5 cache we got the ticket for and to tell in the multiuser krb5 case which local users (uids) we have Kerberos authentic sessions for. Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2019-03-05cifs: use correct format charactersLouis Taylor
When compiling with -Wformat, clang emits the following warnings: fs/cifs/smb1ops.c:312:20: warning: format specifies type 'unsigned short' but the argument has type 'unsigned int' [-Wformat] tgt_total_cnt, total_in_tgt); ^~~~~~~~~~~~ fs/cifs/cifs_dfs_ref.c:289:4: warning: format specifies type 'short' but the argument has type 'int' [-Wformat] ref->flags, ref->server_type); ^~~~~~~~~~ fs/cifs/cifs_dfs_ref.c:289:16: warning: format specifies type 'short' but the argument has type 'int' [-Wformat] ref->flags, ref->server_type); ^~~~~~~~~~~~~~~~ fs/cifs/cifs_dfs_ref.c:291:4: warning: format specifies type 'short' but the argument has type 'int' [-Wformat] ref->ref_flag, ref->path_consumed); ^~~~~~~~~~~~~ fs/cifs/cifs_dfs_ref.c:291:19: warning: format specifies type 'short' but the argument has type 'int' [-Wformat] ref->ref_flag, ref->path_consumed); ^~~~~~~~~~~~~~~~~~ The types of these arguments are unconditionally defined, so this patch updates the format character to the correct ones for ints and unsigned ints. Link: https://github.com/ClangBuiltLinux/linux/issues/378 Signed-off-by: Louis Taylor <louis@kragniz.eu> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
2019-03-05smb3: add dynamic trace point for query_info_enter/doneSteve French
Adds dynamic trace points for the query_info_enter and query_info_done (no error) case. We only had one existing trace point related to this which was on query_info errors. Note that these two new tracepoints are for the non-compounded query_info paths. Sample output (from: trace-cmd record -e smb3_query_info*) ls-24140 [001] .... 27811.866068: smb3_query_info_enter: xid=7 sid=0xd2d00587 tid=0xb5441939 fid=0xcf082bac class=18 type=0x1 ls-24140 [001] .... 27811.867656: smb3_query_info_done: xid=7 sid=0xd2d00587 tid=0xb5441939 fid=0xcf082bac class=18 type=0x1 getcifsacl-24149 [005] .... 27854.759873: smb3_query_info_enter: xid=15 sid=0xd2d00587 tid=0xb5441939 fid=0x99896e72 class=0 type=0x3 getcifsacl-24149 [005] .... 27854.761730: smb3_query_info_done: xid=15 sid=0xd2d00587 tid=0xb5441939 fid=0x99896e72 class=0 type=0x3 Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2019-03-05smb3: add dynamic trace point for smb3_cmd_enterSteve French
Add tracepoint before sending an SMB3 command on the wire (ie add an smb3_cmd_enter tracepoint). This allows us to look in much more detail at response times (between request and response). Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2019-03-05smb3: improve dynamic tracing of open and posix mkdirSteve French
Add dynamic trace point for open_enter (and posix mkdir enter) Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2019-03-05smb3: add missing read completion trace pointSteve French
When ENODATA returned we weren't logging the read completion (not an error, but can be indicated by logging length 0) which makes looking at read traces confusing for smb3. Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2019-03-05smb3: Add tracepoints for read, write and query_dir enterSteve French
Allows tracing begin (not just completion) of read, write and query_dir which may be helpful in finding slow requests and other timing information Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2019-03-05smb3: add tracepoints for query dirSteve French
Adds two tracepoints - one for query_dir done (no err) and one for query_dir_err Sanple output: To start the trace in one window: trace-cmd record -e smb3_query_dir* Then in another window after doing an ls /mnt View the trace output by: trace-cmd show Sample output: TASK-PID CPU# |||| TIMESTAMP FUNCTION | | | |||| | | ls-24869 [007] .... 90695.452009: smb3_query_dir_done: xid=7 sid=0x5027d24d tid=0xb95cf25a fid=0xc41a8c3e offset=0x0 len=0x16 ls-24869 [000] .... 90695.452764: smb3_query_dir_done: xid=8 sid=0x5027d24d tid=0xb95cf25a fid=0xc41a8c3e offset=0x0 len=0x0 ls-24874 [003] .... 90701.506342: smb3_query_dir_done: xid=11 sid=0x5027d24d tid=0xb95cf25a fid=0x33ad3601 offset=0x0 len=0x8 ls-24874 [003] .... 90701.506917: smb3_query_dir_done: xid=12 sid=0x5027d24d tid=0xb95cf25a fid=0x33ad3601 offset=0x0 len=0x0 Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-03-05smb3: Update POSIX negotiate context with POSIX ctxt GUIDSteve French
POSIX negotiate context now includes the GUID specifying which POSIX open context we support. Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Jeremy Allison <jra@samba.org>
2019-03-05cifs: update internal module version numberSteve French
To 2.18 Signed-off-by: Steve French <stfrench@microsoft.com>
2019-03-05CIFS: Try to acquire credits at once for compound requestsPavel Shilovsky
Currently we get one credit per compound part of the request individually. This may lead to being stuck on waiting for credits if multiple compounded operations happen in parallel. Try acquire credits for all compound parts at once. Return immediately if not enough credits and too few requests are in flight currently thus narrowing the possibility of infinite waiting for credits. The more advance fix is to return right away if not enough credits for the compound request and do not look at the number of requests in flight. The caller should handle such situations by falling back to sequential execution of SMB commands instead of compounding. Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-03-05CIFS: Return error code when getting file handle for writebackPavel Shilovsky
Now we just return NULL cifsFileInfo pointer in cases we didn't find or couldn't reopen a file. This hides errors from cifs_reopen_file() especially retryable errors which should be handled appropriately. Create new cifs_get_writable_file() routine that returns error codes from cifs_reopen_file() and use it in the writeback codepath. Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-03-05CIFS: Move open file handling to writepagesPavel Shilovsky
Currently we check for an open file existence in wdata_send_pages() which doesn't provide an easy way to handle error codes that will be returned from find_writable_filehandle() once it is changed. Move the check to writepages. Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-03-05CIFS: Move unlocking pages from wdata_send_pages()Pavel Shilovsky
Currently wdata_send_pages() unlocks pages after sending. This complicates further refactoring and doesn't align with the function name. Move unlocking to writepages. Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-03-05CIFS: Find and reopen a file before get MTU credits in writepagesPavel Shilovsky
Reorder finding and reopening a writable handle file and getting MTU credits in writepages because we may be stuck on low credits otherwise. Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-03-05CIFS: Reopen file before get SMB2 MTU credits for async IOPavel Shilovsky
Currently we get MTU credits before we check an open file if it needs to be reopened. Reopening the file in such conditions leads to a possibility of being stuck waiting indefinitely for credits in the transport layer. Fix this by reopening the file first if needed and then getting MTU credits for async IO. Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-03-05CIFS: Remove custom credit adjustments for SMB2 async IOPavel Shilovsky
Currently we do proper accounting for credits in regards to reconnects and error handling, thus we do not need custom credit adjustments when reconnect is detected developed previously. Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-03-05CIFS: Adjust MTU credits before reopening a filePavel Shilovsky
Currently we adjust MTU credits before sending an IO request and after reopening a file. This approach doesn't allow the reopen routine to use existing credits that are not needed for IO. Reorder credit adjustment and reopening a file to use credits available to the client more efficiently. Also unwrap complex if statement into few pieces to improve readability. Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-03-05CIFS: Check for reconnects before sending compound requestsPavel Shilovsky
The reconnect might have happended after we obtained credits and before we acquired srv_mutex. Check for that under the mutex and retry a sync operation if the reconnect is detected. Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>