summaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)Author
2019-07-19x86/hyper-v: Zero out the VP ASSIST PAGE on allocationDexuan Cui
The VP ASSIST PAGE is an "overlay" page (see Hyper-V TLFS's Section 5.2.1 "GPA Overlay Pages" for the details) and here is an excerpt: "The hypervisor defines several special pages that "overlay" the guest's Guest Physical Addresses (GPA) space. Overlays are addressed GPA but are not included in the normal GPA map maintained internally by the hypervisor. Conceptually, they exist in a separate map that overlays the GPA map. If a page within the GPA space is overlaid, any SPA page mapped to the GPA page is effectively "obscured" and generally unreachable by the virtual processor through processor memory accesses. If an overlay page is disabled, the underlying GPA page is "uncovered", and an existing mapping becomes accessible to the guest." SPA = System Physical Address = the final real physical address. When a CPU (e.g. CPU1) is onlined, hv_cpu_init() allocates the VP ASSIST PAGE and enables the EOI optimization for this CPU by writing the MSR HV_X64_MSR_VP_ASSIST_PAGE. From now on, hvp->apic_assist belongs to the special SPA page, and this CPU *always* uses hvp->apic_assist (which is shared with the hypervisor) to decide if it needs to write the EOI MSR. When a CPU is offlined then on the outgoing CPU: 1. hv_cpu_die() disables the EOI optimizaton for this CPU, and from now on hvp->apic_assist belongs to the original "normal" SPA page; 2. the remaining work of stopping this CPU is done 3. this CPU is completely stopped. Between 1 and 3, this CPU can still receive interrupts (e.g. reschedule IPIs from CPU0, and Local APIC timer interrupts), and this CPU *must* write the EOI MSR for every interrupt received, otherwise the hypervisor may not deliver further interrupts, which may be needed to completely stop the CPU. So, after the EOI optimization is disabled in hv_cpu_die(), it's required that the hvp->apic_assist's bit0 is zero, which is not guaranteed by the current allocation mode because it lacks __GFP_ZERO. As a consequence the bit might be set and interrupt handling would not write the EOI MSR causing interrupt delivery to become stuck. Add the missing __GFP_ZERO to the allocation. Note 1: after the "normal" SPA page is allocted and zeroed out, neither the hypervisor nor the guest writes into the page, so the page remains with zeros. Note 2: see Section 10.3.5 "EOI Assist" for the details of the EOI optimization. When the optimization is enabled, the guest can still write the EOI MSR register irrespective of the "No EOI required" value, but that's slower than the optimized assist based variant. Fixes: ba696429d290 ("x86/hyper-v: Implement EOI assist") Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/ <PU1P153MB0169B716A637FABF07433C04BFCB0@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM
2019-07-19csky: Fixup abiv1 memset errorGuo Ren
Current memset implementation in abiv1 is wrong and it'll cause unalign access. Just remove it and use the generic one. This patch will cause performance degradation and we will improve it with a new design in next patchset. Signed-off-by: Guo Ren <ren_guo@c-sky.com> Cc: Arnd Bergmann <arnd@arndb.de>
2019-07-19csky: Improve tlb operation with help of asidGuo Ren
There are two generations of tlb operation instruction for C-SKY. First generation is use mcr register and it need software do more things, second generation is use specific instructions, eg: tlbi.va, tlbi.vas, tlbi.alls We implemented the following functions: - flush_tlb_range (a range of entries) - flush_tlb_page (one entry) Above functions use asid from vma->mm to invalid tlb entries and we could use tlbi.vas instruction for newest generation csky cpu. - flush_tlb_kernel_range - flush_tlb_one Above functions don't care asid and it invalid the tlb entries only with vpn and we could use tlbi.vaas instruction for newest generat- ion csky cpu. Signed-off-by: Guo Ren <ren_guo@c-sky.com> Cc: Arnd Bergmann <arnd@arndb.de>
2019-07-19csky: Use generic asid algorithm to implement switch_mmGuo Ren
Use linux generic asid/vmid algorithm to implement csky switch_mm function. The algorithm is from arm and it could work with SMP system. It'll help reduce tlb flush for switch_mm in task/vm switch. Signed-off-by: Guo Ren <ren_guo@c-sky.com> Cc: Arnd Bergmann <arnd@arndb.de>
2019-07-19csky: Add new asid lib code from armGuo Ren
This patch only contains asid help code from arm for next patch to use. The asid allocator use five level check to reduce the cost of switch_mm. 1. Check if the asid version is the same (it's general) 2. Check reserved_asid which is set in rollover flush_context() and key point is to keep the same bit position with the current asid version instead of input version. 3. Check if the position of bitmap is free then it could be set & used directly. 4. find_next_zero_bit() (a little performance cost) 5. flush_context (this is the worst cost with increase current asid version) Check is level by level and cost is also higher with the next level. The reserved_asid and bitmap mechanism prevent unnecessary find_next_zero_bit(). The atomic 64 bit asid is also suitable for 32-bit system and it won't cost a lot in 1th 2th 3th level check. The operation of set/clear mm_cpumask was removed in arm64 compared to arm32. It seems no side effect on current arm64 system, but from software meaning it's wrong. Although csky also needn't it, we add it back for csky. The asid_per_ctxt is no use for csky and it reserves the lowest bits for other use, maybe: trust zone ? Ok, just keep it in csky copy. Seems it also could be used by other archs and it's worth to move asid code to generic in future. Signed-off-by: Guo Ren <ren_guo@c-sky.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Julien Grall <julien.grall@arm.com>
2019-07-19csky: Revert mmu ASID mechanismGuo Ren
Current C-SKY ASID mechanism is from mips and it doesn't work well with multi-cores. ASID per core mechanism is not suitable for C-SKY SMP tlb maintain operations, eg: tlbi.vas need share the same asid in all processors and it'll invalid the tlb entry in all cores with the same asid. This patch is prepare for new ASID mechanism. Signed-off-by: Guo Ren <ren_guo@c-sky.com> Cc: Arnd Bergmann <arnd@arndb.de>
2019-07-19csky: Fixup some error count in 810 & 860.Guo Ren
CK810 pmu only support event with index 0-8 and 0xd; CK860 only support event 1~4, 0xa~0x1b. So do not register unsupport event to hardware cache event, which may leader to unknown behavior. Signed-off-by: Mao Han <han_mao@c-sky.com> Signed-off-by: Guo Ren <ren_guo@c-sky.com>
2019-07-19csky: Fix perf record in kernel/user spaceMao Han
csky_pmu_event_init is called several times during the perf record initialzation. After configure the event counter in either kernel space or user space, csky_pmu_event_init is called twice with no attr specified. Configuration will be overwritten with sampling in both kernel space and user space. --all-kernel/--all-user is useless without this patch applied. Signed-off-by: Mao Han <han_mao@c-sky.com> Signed-off-by: Guo Ren <guoren@kernel.org>
2019-07-19csky: Add pmu interrupt supportMao Han
This patch add interrupt request and handler for csky pmu. perf can record on hardware event with this patch applied. Signed-off-by: Mao Han <han_mao@c-sky.com> Signed-off-by: Guo Ren <guoren@kernel.org>
2019-07-19csky: Add count-width property for csky pmuMao Han
The csky pmu counter may have different io width. When the counter is smaller then 64 bits and counter value is smaller than the old value, it will result to a extremely large delta value. So the sampled value should be extend to 64 bits to avoid this, the extension bits base on the count-width property from dts. Signed-off-by: Mao Han <han_mao@c-sky.com> Signed-off-by: Guo Ren <guoren@kernel.org>
2019-07-19csky: Init pmu as a deviceMao Han
This patch change the csky pmu initialization from arch init to device init. The pmu can be configued with information from device tree(pmu device name, irq number and etc.). Signed-off-by: Mao Han <han_mao@c-sky.com> Signed-off-by: Guo Ren <guoren@kernel.org>
2019-07-19csky: Fixup no panic in kernel for some trapsGuo Ren
These traps couldn't be hanppen in kernel and we must panic there not send a signal to userspace. Signed-off-by: Guo Ren <ren_guo@c-sky.com> Cc: Arnd Bergmann <arnd@arndb.de>
2019-07-19csky: Select intc & timer driversGuo Ren
Let arch help to select interrupt controller's and timer's drivers instead of people using menuconfig to select. This help the mini system boot up. Signed-off-by: Guo Ren <ren_guo@c-sky.com> Cc: Arnd Bergmann <arnd@arndb.de>
2019-07-18riscv: enable sys_clone3 syscall for rv64Paul Walmsley
Enable the sys_clone3 syscall for RV64. We simply include the generic version. Tested by running the program from https://lore.kernel.org/lkml/20190716130631.tohj4ub54md25dys@brauner.io/ and verifying that it completes successfully. Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com> Acked-by: Christian Brauner <christian@brauner.io> Cc: Christian Brauner <christian@brauner.io>
2019-07-19KVM: PPC: Book3S HV: XIVE: fix rollback when kvmppc_xive_create failsCédric Le Goater
The XIVE device structure is now allocated in kvmppc_xive_get_device() and kfree'd in kvmppc_core_destroy_vm(). In case of an OPAL error when allocating the XIVE VPs, the kfree() call in kvmppc_xive_*create() will result in a double free and corrupt the host memory. Fixes: 5422e95103cf ("KVM: PPC: Book3S HV: XIVE: Replace the 'destroy' method by a 'release' method") Cc: stable@vger.kernel.org # v5.2+ Signed-off-by: Cédric Le Goater <clg@kaod.org> Tested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/6ea6998b-a890-2511-01d1-747d7621eb19@kaod.org
2019-07-18proc/sysctl: add shared variables for range checkMatteo Croce
In the sysctl code the proc_dointvec_minmax() function is often used to validate the user supplied value between an allowed range. This function uses the extra1 and extra2 members from struct ctl_table as minimum and maximum allowed value. On sysctl handler declaration, in every source file there are some readonly variables containing just an integer which address is assigned to the extra1 and extra2 members, so the sysctl range is enforced. The special values 0, 1 and INT_MAX are very often used as range boundary, leading duplication of variables like zero=0, one=1, int_max=INT_MAX in different source files: $ git grep -E '\.extra[12].*&(zero|one|int_max)' |wc -l 248 Add a const int array containing the most commonly used values, some macros to refer more easily to the correct array member, and use them instead of creating a local one for every object file. This is the bloat-o-meter output comparing the old and new binary compiled with the default Fedora config: # scripts/bloat-o-meter -d vmlinux.o.old vmlinux.o add/remove: 2/2 grow/shrink: 0/2 up/down: 24/-188 (-164) Data old new delta sysctl_vals - 12 +12 __kstrtab_sysctl_vals - 12 +12 max 14 10 -4 int_max 16 - -16 one 68 - -68 zero 128 28 -100 Total: Before=20583249, After=20583085, chg -0.00% [mcroce@redhat.com: tipc: remove two unused variables] Link: http://lkml.kernel.org/r/20190530091952.4108-1-mcroce@redhat.com [akpm@linux-foundation.org: fix net/ipv6/sysctl_net_ipv6.c] [arnd@arndb.de: proc/sysctl: make firmware loader table conditional] Link: http://lkml.kernel.org/r/20190617130014.1713870-1-arnd@arndb.de [akpm@linux-foundation.org: fix fs/eventpoll.c] Link: http://lkml.kernel.org/r/20190430180111.10688-1-mcroce@redhat.com Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Kees Cook <keescook@chromium.org> Reviewed-by: Aaron Tomlin <atomlin@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-18mm/sparsemem: convert kmalloc_section_memmap() to populate_section_memmap()Dan Williams
Allow sub-section sized ranges to be added to the memmap. populate_section_memmap() takes an explict pfn range rather than assuming a full section, and those parameters are plumbed all the way through to vmmemap_populate(). There should be no sub-section usage in current deployments. New warnings are added to clarify which memmap allocation paths are sub-section capable. Link: http://lkml.kernel.org/r/156092352058.979959.6551283472062305149.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com> Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> [ppc64] Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Michal Hocko <mhocko@suse.com> Cc: David Hildenbrand <david@redhat.com> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Jane Chu <jane.chu@oracle.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Yang <richardw.yang@linux.intel.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-18mm/memory_hotplug: rename walk_memory_range() and pass start+size instead of ↵David Hildenbrand
pfns walk_memory_range() was once used to iterate over sections. Now, it iterates over memory blocks. Rename the function, fixup the documentation. Also, pass start+size instead of PFNs, which is what most callers already have at hand. (we'll rework link_mem_sections() most probably soon) Follow-up patches will rework, simplify, and move walk_memory_blocks() to drivers/base/memory.c. Note: walk_memory_blocks() only works correctly right now if the start_pfn is aligned to a section start. This is the case right now, but we'll generalize the function in a follow up patch so the semantics match the documentation. [akpm@linux-foundation.org: remove unused variable] Link: http://lkml.kernel.org/r/20190614100114.311-5-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Len Brown <lenb@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: David Hildenbrand <david@redhat.com> Cc: Rashmica Gupta <rashmica.g@gmail.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Michael Neuling <mikey@neuling.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Oscar Salvador <osalvador@suse.de> Cc: Michal Hocko <mhocko@suse.com> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Juergen Gross <jgross@suse.com> Cc: Qian Cai <cai@lca.pw> Cc: Arun KS <arunks@codeaurora.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-18mm/memory_hotplug: allow arch_remove_memory() without CONFIG_MEMORY_HOTREMOVEDavid Hildenbrand
We want to improve error handling while adding memory by allowing to use arch_remove_memory() and __remove_pages() even if CONFIG_MEMORY_HOTREMOVE is not set to e.g., implement something like: arch_add_memory() rc = do_something(); if (rc) { arch_remove_memory(); } We won't get rid of CONFIG_MEMORY_HOTREMOVE for now, as it will require quite some dependencies for memory offlining. Link: http://lkml.kernel.org/r/20190527111152.16324-7-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: David Hildenbrand <david@redhat.com> Cc: Oscar Salvador <osalvador@suse.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Mark Brown <broonie@kernel.org> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Rob Herring <robh@kernel.org> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: "mike.travis@hpe.com" <mike.travis@hpe.com> Cc: Andrew Banman <andrew.banman@hpe.com> Cc: Arun KS <arunks@codeaurora.org> Cc: Qian Cai <cai@lca.pw> Cc: Mathieu Malaterre <malat@debian.org> Cc: Baoquan He <bhe@redhat.com> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chintan Pandya <cpandya@codeaurora.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Jun Yao <yaojun8558363@gmail.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-18arm64/mm: add temporary arch_remove_memory() implementationDavid Hildenbrand
A proper arch_remove_memory() implementation is on its way, which also cleanly removes page tables in arch_add_memory() in case something goes wrong. As we want to use arch_remove_memory() in case something goes wrong during memory hotplug after arch_add_memory() finished, let's add a temporary hack that is sufficient enough until we get a proper implementation that cleans up page table entries. We will remove CONFIG_MEMORY_HOTREMOVE around this code in follow up patches. Link: http://lkml.kernel.org/r/20190527111152.16324-5-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Chintan Pandya <cpandya@codeaurora.org> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Jun Yao <yaojun8558363@gmail.com> Cc: Yu Zhao <yuzhao@google.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Andrew Banman <andrew.banman@hpe.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arun KS <arunks@codeaurora.org> Cc: Baoquan He <bhe@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Mark Brown <broonie@kernel.org> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Mathieu Malaterre <malat@debian.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: "mike.travis@hpe.com" <mike.travis@hpe.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Qian Cai <cai@lca.pw> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Rich Felker <dalias@libc.org> Cc: Rob Herring <robh@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-18s390x/mm: implement arch_remove_memory()David Hildenbrand
Will come in handy when wanting to handle errors after arch_add_memory(). Link: http://lkml.kernel.org/r/20190527111152.16324-4-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Oscar Salvador <osalvador@suse.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Andrew Banman <andrew.banman@hpe.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Arun KS <arunks@codeaurora.org> Cc: Baoquan He <bhe@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chintan Pandya <cpandya@codeaurora.org> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Jun Yao <yaojun8558363@gmail.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Mark Brown <broonie@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Mathieu Malaterre <malat@debian.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: "mike.travis@hpe.com" <mike.travis@hpe.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Qian Cai <cai@lca.pw> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Rich Felker <dalias@libc.org> Cc: Rob Herring <robh@kernel.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-18s390x/mm: fail when an altmap is used for arch_add_memory()David Hildenbrand
ZONE_DEVICE is not yet supported, fail if an altmap is passed, so we don't forget arch_add_memory()/arch_remove_memory() when unlocking support. Link: http://lkml.kernel.org/r/20190527111152.16324-3-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Suggested-by: Dan Williams <dan.j.williams@intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Oscar Salvador <osalvador@suse.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Andrew Banman <andrew.banman@hpe.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Arun KS <arunks@codeaurora.org> Cc: Baoquan He <bhe@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chintan Pandya <cpandya@codeaurora.org> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Jun Yao <yaojun8558363@gmail.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Mark Brown <broonie@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Mathieu Malaterre <malat@debian.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: "mike.travis@hpe.com" <mike.travis@hpe.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Qian Cai <cai@lca.pw> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Rich Felker <dalias@libc.org> Cc: Rob Herring <robh@kernel.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-18sched/rt, Kconfig: Introduce CONFIG_PREEMPT_RTThomas Gleixner
Add a new entry to the preemption menu which enables the real-time support for the kernel. The choice is only enabled when an architecture supports it. It selects PREEMPT as the RT features depend on it. To achieve that the existing PREEMPT choice is renamed to PREEMPT_LL which select PREEMPT as well. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Paul E. McKenney <paulmck@linux.ibm.com> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Acked-by: Clark Williams <williams@redhat.com> Acked-by: Daniel Bristot de Oliveira <bristot@redhat.com> Acked-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Daniel Wagner <wagi@monom.org> Acked-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com> Acked-by: Julia Cartwright <julia@ni.com> Acked-by: Tom Zanussi <tom.zanussi@linux.intel.com> Acked-by: Gratian Crisan <gratian.crisan@ni.com> Acked-by: Sebastian Siewior <bigeasy@linutronix.de> Cc: Andrew Morton <akpm@linuxfoundation.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/alpine.DEB.2.21.1907172200190.1778@nanos.tec.linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-07-18x86, boot: Remove multiple copy of static function sanitize_boot_params()Zhenzhong Duan
Kernel build warns: 'sanitize_boot_params' defined but not used [-Wunused-function] at below files: arch/x86/boot/compressed/cmdline.c arch/x86/boot/compressed/error.c arch/x86/boot/compressed/early_serial_console.c arch/x86/boot/compressed/acpi.c That's becausethey each include misc.h which includes a definition of sanitize_boot_params() via bootparam_utils.h. Remove the inclusion from misc.h and have the c file including bootparam_utils.h directly. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/1563283092-1189-1-git-send-email-zhenzhong.duan@oracle.com
2019-07-18x86/boot/compressed/64: Remove unused variableZhenzhong Duan
Fix gcc warning: arch/x86/boot/compressed/pgtable_64.c: In function 'find_trampoline_placement': arch/x86/boot/compressed/pgtable_64.c:43:16: warning: unused variable 'trampoline_start' [-Wunused-variable] unsigned long trampoline_start; ^ Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Link: https://lkml.kernel.org/r/1563283040-31101-1-git-send-email-zhenzhong.duan@oracle.com
2019-07-18x86/boot/efi: Remove unused variablesZhenzhong Duan
Fix gcc warnings: arch/x86/boot/compressed/eboot.c: In function 'make_boot_params': arch/x86/boot/compressed/eboot.c:394:6: warning: unused variable 'i' [-Wunused-variable] int i; ^ arch/x86/boot/compressed/eboot.c:393:6: warning: unused variable 's1' [-Wunused-variable] u8 *s1; ^ arch/x86/boot/compressed/eboot.c:392:7: warning: unused variable 's2' [-Wunused-variable] u16 *s2; ^ arch/x86/boot/compressed/eboot.c:387:8: warning: unused variable 'options' [-Wunused-variable] void *options, *handle; ^ arch/x86/boot/compressed/eboot.c: In function 'add_e820ext': arch/x86/boot/compressed/eboot.c:498:16: warning: unused variable 'size' [-Wunused-variable] unsigned long size; ^ arch/x86/boot/compressed/eboot.c:497:15: warning: unused variable 'status' [-Wunused-variable] efi_status_t status; ^ arch/x86/boot/compressed/eboot.c: In function 'exit_boot_func': arch/x86/boot/compressed/eboot.c:681:15: warning: unused variable 'status' [-Wunused-variable] efi_status_t status; ^ arch/x86/boot/compressed/eboot.c:680:8: warning: unused variable 'nr_desc' [-Wunused-variable] __u32 nr_desc; ^ arch/x86/boot/compressed/eboot.c: In function 'efi_main': arch/x86/boot/compressed/eboot.c:750:22: warning: unused variable 'image' [-Wunused-variable] efi_loaded_image_t *image; ^ Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/1563282957-26898-1-git-send-email-zhenzhong.duan@oracle.com
2019-07-18Merge tag 'riscv/for-v5.3-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V updates from Paul Walmsley: - Hugepage support - "Image" header support for RISC-V kernel binaries, compatible with the current ARM64 "Image" header - Initial page table setup now split into two stages - CONFIG_SOC support (starting with SiFive SoCs) - Avoid reserving memory between RAM start and the kernel in setup_bootmem() - Enable high-res timers and dynamic tick in the RV64 defconfig - Remove long-deprecated gate area stubs - MAINTAINERS updates to switch to the newly-created shared RISC-V git tree, and to fix a get_maintainers.pl issue for patches involving SiFive E-mail addresses Also, one integration fix to resolve a build problem introduced during in the v5.3-rc1 merge window: - Fix build break after macro-to-function conversion in asm-generic/cacheflush.h * tag 'riscv/for-v5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: fix build break after macro-to-function conversion in generic cacheflush.h RISC-V: Add an Image header that boot loader can parse. RISC-V: Setup initial page tables in two stages riscv: remove free_initrd_mem riscv: ccache: Remove unused variable riscv: Introduce huge page support for 32/64bit kernel x86, arm64: Move ARCH_WANT_HUGE_PMD_SHARE config in arch/Kconfig RISC-V: Fix memory reservation in setup_bootmem() riscv: defconfig: enable SOC_SIFIVE riscv: select SiFive platform drivers with SOC_SIFIVE arch: riscv: add config option for building SiFive's SoC resource riscv: Remove gate area stubs MAINTAINERS: change the arch/riscv git tree to the new shared tree MAINTAINERS: don't automatically patches involving SiFive to the linux-riscv list RISC-V: defconfig: Enable NO_HZ_IDLE and HIGH_RES_TIMERS
2019-07-18Merge branch 'parisc-5.3-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc fixes from Helge Deller: - Prevent kernel panics by adding proper checking of register values injected via the ptrace interface - Wire up the new clone3 syscall * 'parisc-5.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: Wire up clone3 syscall parisc: Avoid kernel panic triggered by invalid kprobe parisc: Ensure userspace privilege for ptraced processes in regset functions parisc: Fix kernel panic due invalid values in IAOQ0 or IAOQ1
2019-07-18Merge tag 'modules-for-v5.3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux Pull module updates from Jessica Yu: "Summary of modules changes for the 5.3 merge window: - Code fixes and cleanups - Fix bug where set_memory_x() wasn't being called when rodata=n - Fix bug where -EEXIST was being returned for going modules - Allow arches to override module_exit_section()" * tag 'modules-for-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux: modules: fix compile error if don't have strict module rwx ARM: module: recognize unwind exit sections module: allow arch overrides for .exit section names modules: fix BUG when load module with rodata=n kernel/module: Fix mem leak in module_add_modinfo_attrs kernel: module: Use struct_size() helper kernel/module.c: Only return -EEXIST for modules that have finished loading
2019-07-18x86/uaccess: Remove redundant CLACs in getuser/putuser error pathsJosh Poimboeuf
The same getuser/putuser error paths are used regardless of whether AC is set. In non-exception failure cases, this results in an unnecessary CLAC. Fixes the following warnings: arch/x86/lib/getuser.o: warning: objtool: .altinstr_replacement+0x18: redundant UACCESS disable arch/x86/lib/putuser.o: warning: objtool: .altinstr_replacement+0x18: redundant UACCESS disable Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/bc14ded2755ae75bd9010c446079e113dbddb74b.1563413318.git.jpoimboe@redhat.com
2019-07-18x86/uaccess: Don't leak AC flag into fentry from mcsafe_handle_tail()Josh Poimboeuf
After adding mcsafe_handle_tail() to the objtool uaccess safe list, objtool reports: arch/x86/lib/usercopy_64.o: warning: objtool: mcsafe_handle_tail()+0x0: call to __fentry__() with UACCESS enabled With SMAP, this function is called with AC=1, so it needs to be careful about which functions it calls. Disable the ftrace entry hook, which can potentially pull in a lot of extra code. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/8e13d6f0da1c8a3f7603903da6cbf6d582bbfe10.1563413318.git.jpoimboe@redhat.com
2019-07-18x86/uaccess: Remove ELF function annotation from copy_user_handle_tail()Josh Poimboeuf
After an objtool improvement, it's complaining about the CLAC in copy_user_handle_tail(): arch/x86/lib/copy_user_64.o: warning: objtool: .altinstr_replacement+0x12: redundant UACCESS disable arch/x86/lib/copy_user_64.o: warning: objtool: copy_user_handle_tail()+0x6: (alt) arch/x86/lib/copy_user_64.o: warning: objtool: copy_user_handle_tail()+0x2: (alt) arch/x86/lib/copy_user_64.o: warning: objtool: copy_user_handle_tail()+0x0: <=== (func) copy_user_handle_tail() is incorrectly marked as a callable function, so objtool is rightfully concerned about the CLAC with no corresponding STAC. Remove the ELF function annotation. The copy_user_handle_tail() code path is already verified by objtool because it's jumped to by other callable asm code (which does the corresponding STAC). Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/6b6e436774678b4b9873811ff023bd29935bee5b.1563413318.git.jpoimboe@redhat.com
2019-07-18x86/head/64: Annotate start_cpu0() as non-callableJosh Poimboeuf
After an objtool improvement, it complains about the fact that start_cpu0() jumps to code which has an LRET instruction. arch/x86/kernel/head_64.o: warning: objtool: .head.text+0xe4: unsupported instruction in callable function Technically, start_cpu0() is callable, but it acts nothing like a callable function. Prevent objtool from treating it like one by removing its ELF function annotation. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/6b1b4505fcb90571a55fa1b52d71fb458ca24454.1563413318.git.jpoimboe@redhat.com
2019-07-18x86/entry: Fix thunk function ELF sizesJosh Poimboeuf
Fix the following warnings: arch/x86/entry/thunk_64.o: warning: objtool: trace_hardirqs_on_thunk() is missing an ELF size annotation arch/x86/entry/thunk_64.o: warning: objtool: trace_hardirqs_off_thunk() is missing an ELF size annotation arch/x86/entry/thunk_64.o: warning: objtool: lockdep_sys_exit_thunk() is missing an ELF size annotation Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/89c97adc9f6cc44a0f5d03cde6d0357662938909.1563413318.git.jpoimboe@redhat.com
2019-07-18x86/kvm: Don't call kvm_spurious_fault() from .fixupJosh Poimboeuf
After making a change to improve objtool's sibling call detection, it started showing the following warning: arch/x86/kvm/vmx/nested.o: warning: objtool: .fixup+0x15: sibling call from callable instruction with modified stack frame The problem is the ____kvm_handle_fault_on_reboot() macro. It does a fake call by pushing a fake RIP and doing a jump. That tricks the unwinder into printing the function which triggered the exception, rather than the .fixup code. Instead of the hack to make it look like the original function made the call, just change the macro so that the original function actually does make the call. This allows removal of the hack, and also makes objtool happy. I triggered a vmx instruction exception and verified that the stack trace is still sane: kernel BUG at arch/x86/kvm/x86.c:358! invalid opcode: 0000 [#1] SMP PTI CPU: 28 PID: 4096 Comm: qemu-kvm Not tainted 5.2.0+ #16 Hardware name: Lenovo THINKSYSTEM SD530 -[7X2106Z000]-/-[7X2106Z000]-, BIOS -[TEE113Z-1.00]- 07/17/2017 RIP: 0010:kvm_spurious_fault+0x5/0x10 Code: 00 00 00 00 00 8b 44 24 10 89 d2 45 89 c9 48 89 44 24 10 8b 44 24 08 48 89 44 24 08 e9 d4 40 22 00 0f 1f 40 00 0f 1f 44 00 00 <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 55 49 89 fd 41 RSP: 0018:ffffbf91c683bd00 EFLAGS: 00010246 RAX: 000061f040000000 RBX: ffff9e159c77bba0 RCX: ffff9e15a5c87000 RDX: 0000000665c87000 RSI: ffff9e15a5c87000 RDI: ffff9e159c77bba0 RBP: 0000000000000000 R08: 0000000000000000 R09: ffff9e15a5c87000 R10: 0000000000000000 R11: fffff8f2d99721c0 R12: ffff9e159c77bba0 R13: ffffbf91c671d960 R14: ffff9e159c778000 R15: 0000000000000000 FS: 00007fa341cbe700(0000) GS:ffff9e15b7400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fdd38356804 CR3: 00000006759de003 CR4: 00000000007606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: loaded_vmcs_init+0x4f/0xe0 alloc_loaded_vmcs+0x38/0xd0 vmx_create_vcpu+0xf7/0x600 kvm_vm_ioctl+0x5e9/0x980 ? __switch_to_asm+0x40/0x70 ? __switch_to_asm+0x34/0x70 ? __switch_to_asm+0x40/0x70 ? __switch_to_asm+0x34/0x70 ? free_one_page+0x13f/0x4e0 do_vfs_ioctl+0xa4/0x630 ksys_ioctl+0x60/0x90 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x55/0x1c0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7fa349b1ee5b Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/64a9b64d127e87b6920a97afde8e96ea76f6524e.1563413318.git.jpoimboe@redhat.com
2019-07-18x86/kvm: Replace vmx_vmenter()'s call to kvm_spurious_fault() with UD2Josh Poimboeuf
Objtool reports the following: arch/x86/kvm/vmx/vmenter.o: warning: objtool: vmx_vmenter()+0x14: call without frame pointer save/setup But frame pointers are necessarily broken anyway, because __vmx_vcpu_run() clobbers RBP with the guest's value before calling vmx_vmenter(). So calling without a frame pointer doesn't make things any worse. Make objtool happy by changing the call to a UD2. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Link: https://lkml.kernel.org/r/9fc2216c9dc972f95bb65ce2966a682c6bda1cb0.1563413318.git.jpoimboe@redhat.com
2019-07-18x86/kvm: Fix fastop function ELF metadataJosh Poimboeuf
Some of the fastop functions, e.g. em_setcc(), are actually just used as global labels which point to blocks of functions. The global labels are incorrectly annotated as functions. Also the functions themselves don't have size annotations. Fixes a bunch of warnings like the following: arch/x86/kvm/emulate.o: warning: objtool: seto() is missing an ELF size annotation arch/x86/kvm/emulate.o: warning: objtool: em_setcc() is missing an ELF size annotation arch/x86/kvm/emulate.o: warning: objtool: setno() is missing an ELF size annotation arch/x86/kvm/emulate.o: warning: objtool: setc() is missing an ELF size annotation Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/c8cc9be60ebbceb3092aa5dd91916039a1f88275.1563413318.git.jpoimboe@redhat.com
2019-07-18x86/paravirt: Fix callee-saved function ELF sizesJosh Poimboeuf
The __raw_callee_save_*() functions have an ELF symbol size of zero, which confuses objtool and other tools. Fixes a bunch of warnings like the following: arch/x86/xen/mmu_pv.o: warning: objtool: __raw_callee_save_xen_pte_val() is missing an ELF size annotation arch/x86/xen/mmu_pv.o: warning: objtool: __raw_callee_save_xen_pgd_val() is missing an ELF size annotation arch/x86/xen/mmu_pv.o: warning: objtool: __raw_callee_save_xen_make_pte() is missing an ELF size annotation arch/x86/xen/mmu_pv.o: warning: objtool: __raw_callee_save_xen_make_pgd() is missing an ELF size annotation Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Juergen Gross <jgross@suse.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/afa6d49bb07497ca62e4fc3b27a2d0cece545b4e.1563413318.git.jpoimboe@redhat.com
2019-07-18Merge tag 'trace-v5.3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "The main changes in this release include: - Add user space specific memory reading for kprobes - Allow kprobes to be executed earlier in boot The rest are mostly just various clean ups and small fixes" * tag 'trace-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (33 commits) tracing: Make trace_get_fields() global tracing: Let filter_assign_type() detect FILTER_PTR_STRING tracing: Pass type into tracing_generic_entry_update() ftrace/selftest: Test if set_event/ftrace_pid exists before writing ftrace/selftests: Return the skip code when tracing directory not configured in kernel tracing/kprobe: Check registered state using kprobe tracing/probe: Add trace_event_call accesses APIs tracing/probe: Add probe event name and group name accesses APIs tracing/probe: Add trace flag access APIs for trace_probe tracing/probe: Add trace_event_file access APIs for trace_probe tracing/probe: Add trace_event_call register API for trace_probe tracing/probe: Add trace_probe init and free functions tracing/uprobe: Set print format when parsing command tracing/kprobe: Set print format right after parsed command kprobes: Fix to init kprobes in subsys_initcall tracepoint: Use struct_size() in kmalloc() ring-buffer: Remove HAVE_64BIT_ALIGNED_ACCESS ftrace: Enable trampoline when rec count returns back to one tracing/kprobe: Do not run kprobe boot tests if kprobe_event is on cmdline tracing: Make a separate config for trace event self tests ...
2019-07-18riscv: fix build break after macro-to-function conversion in generic ↵Paul Walmsley
cacheflush.h Commit c296d4dc13ae ("asm-generic: fix a compilation warning") converted the various flush_*cache_* macros in asm-generic/cacheflush.h to static inline functions. This breaks RISC-V builds, since RISC-V's cacheflush.h includes the generic cacheflush.h and then undefines the macros to be overridden. Fix by copying the subset of the no-op functions that are reused from the generic cacheflush.h into the RISC-V cacheflush.h, and dropping the include of the generic cacheflush.h. Fixes: c296d4dc13ae ("asm-generic: fix a compilation warning") Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com> Cc: Qian Cai <cai@lca.pw> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-18powerpc/xive: Fix loop exit-condition in xive_find_target_in_mask()Gautham R. Shenoy
xive_find_target_in_mask() has the following for(;;) loop which has a bug when @first == cpumask_first(@mask) and condition 1 fails to hold for every CPU in @mask. In this case we loop forever in the for-loop. first = cpu; for (;;) { if (cpu_online(cpu) && xive_try_pick_target(cpu)) // condition 1 return cpu; cpu = cpumask_next(cpu, mask); if (cpu == first) // condition 2 break; if (cpu >= nr_cpu_ids) // condition 3 cpu = cpumask_first(mask); } This is because, when @first == cpumask_first(@mask), we never hit the condition 2 (cpu == first) since prior to this check, we would have executed "cpu = cpumask_next(cpu, mask)" which will set the value of @cpu to a value greater than @first or to nr_cpus_ids. When this is coupled with the fact that condition 1 is not met, we will never exit this loop. This was discovered by the hard-lockup detector while running LTP test concurrently with SMT switch tests. watchdog: CPU 12 detected hard LOCKUP on other CPUs 68 watchdog: CPU 12 TB:85587019220796, last SMP heartbeat TB:85578827223399 (15999ms ago) watchdog: CPU 68 Hard LOCKUP watchdog: CPU 68 TB:85587019361273, last heartbeat TB:85576815065016 (19930ms ago) CPU: 68 PID: 45050 Comm: hxediag Kdump: loaded Not tainted 4.18.0-100.el8.ppc64le #1 NIP: c0000000006f5578 LR: c000000000cba9ec CTR: 0000000000000000 REGS: c000201fff3c7d80 TRAP: 0100 Not tainted (4.18.0-100.el8.ppc64le) MSR: 9000000002883033 <SF,HV,VEC,VSX,FP,ME,IR,DR,RI,LE> CR: 24028424 XER: 00000000 CFAR: c0000000006f558c IRQMASK: 1 GPR00: c0000000000afc58 c000201c01c43400 c0000000015ce500 c000201cae26ec18 GPR04: 0000000000000800 0000000000000540 0000000000000800 00000000000000f8 GPR08: 0000000000000020 00000000000000a8 0000000080000000 c00800001a1beed8 GPR12: c0000000000b1410 c000201fff7f4c00 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000540 0000000000000001 GPR20: 0000000000000048 0000000010110000 c00800001a1e3780 c000201cae26ed18 GPR24: 0000000000000000 c000201cae26ed8c 0000000000000001 c000000001116bc0 GPR28: c000000001601ee8 c000000001602494 c000201cae26ec18 000000000000001f NIP [c0000000006f5578] find_next_bit+0x38/0x90 LR [c000000000cba9ec] cpumask_next+0x2c/0x50 Call Trace: [c000201c01c43400] [c000201cae26ec18] 0xc000201cae26ec18 (unreliable) [c000201c01c43420] [c0000000000afc58] xive_find_target_in_mask+0x1b8/0x240 [c000201c01c43470] [c0000000000b0228] xive_pick_irq_target.isra.3+0x168/0x1f0 [c000201c01c435c0] [c0000000000b1470] xive_irq_startup+0x60/0x260 [c000201c01c43640] [c0000000001d8328] __irq_startup+0x58/0xf0 [c000201c01c43670] [c0000000001d844c] irq_startup+0x8c/0x1a0 [c000201c01c436b0] [c0000000001d57b0] __setup_irq+0x9f0/0xa90 [c000201c01c43760] [c0000000001d5aa0] request_threaded_irq+0x140/0x220 [c000201c01c437d0] [c00800001a17b3d4] bnx2x_nic_load+0x188c/0x3040 [bnx2x] [c000201c01c43950] [c00800001a187c44] bnx2x_self_test+0x1fc/0x1f70 [bnx2x] [c000201c01c43a90] [c000000000adc748] dev_ethtool+0x11d8/0x2cb0 [c000201c01c43b60] [c000000000b0b61c] dev_ioctl+0x5ac/0xa50 [c000201c01c43bf0] [c000000000a8d4ec] sock_do_ioctl+0xbc/0x1b0 [c000201c01c43c60] [c000000000a8dfb8] sock_ioctl+0x258/0x4f0 [c000201c01c43d20] [c0000000004c9704] do_vfs_ioctl+0xd4/0xa70 [c000201c01c43de0] [c0000000004ca274] sys_ioctl+0xc4/0x160 [c000201c01c43e30] [c00000000000b388] system_call+0x5c/0x70 Instruction dump: 78aad182 54a806be 3920ffff 78a50664 794a1f24 7d294036 7d43502a 7d295039 4182001c 48000034 78a9d182 79291f24 <7d23482a> 2fa90000 409e0020 38a50040 To fix this, move the check for condition 2 after the check for condition 3, so that we are able to break out of the loop soon after iterating through all the CPUs in the @mask in the problem case. Use do..while() to achieve this. Fixes: 243e25112d06 ("powerpc/xive: Native exploitation of the XIVE interrupt controller") Cc: stable@vger.kernel.org # v4.12+ Reported-by: Indira P. Joga <indira.priya@in.ibm.com> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1563359724-13931-1-git-send-email-ego@linux.vnet.ibm.com
2019-07-17x86/mm, tracing: Fix CR2 corruptionPeter Zijlstra
Despite the current efforts to read CR2 before tracing happens there still exist a number of possible holes: idtentry page_fault do_page_fault has_error_code=1 call error_entry TRACE_IRQS_OFF call trace_hardirqs_off* #PF // modifies CR2 CALL_enter_from_user_mode __context_tracking_exit() trace_user_exit(0) #PF // modifies CR2 call do_page_fault address = read_cr2(); /* whoopsie */ And similar for i386. Fix it by pulling the CR2 read into the entry code, before any of that stuff gets a chance to run and ruin things. Reported-by: He Zhe <zhe.he@windriver.com> Reported-by: Eiichi Tsukata <devel@etsukata.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Andy Lutomirski <luto@kernel.org> Cc: bp@alien8.de Cc: rostedt@goodmis.org Cc: torvalds@linux-foundation.org Cc: hpa@zytor.com Cc: dave.hansen@linux.intel.com Cc: jgross@suse.com Cc: joel@joelfernandes.org Link: https://lkml.kernel.org/r/20190711114336.116812491@infradead.org Debugged-by: Steven Rostedt <rostedt@goodmis.org>
2019-07-17x86/entry/64: Update comments and sanity tests for create_gapPeter Zijlstra
Commit 2700fefdb2d9 ("x86_64: Add gap to int3 to allow for call emulation") forgot to update the comment, do so now. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: bp@alien8.de Cc: torvalds@linux-foundation.org Cc: hpa@zytor.com Cc: dave.hansen@linux.intel.com Cc: jgross@suse.com Cc: zhe.he@windriver.com Cc: joel@joelfernandes.org Cc: devel@etsukata.com Link: https://lkml.kernel.org/r/20190711114336.059780563@infradead.org
2019-07-17x86/entry/64: Simplify idtentry a littlePeter Zijlstra
There's a bunch of duplication in idtentry, namely the .Lfrom_usermode_switch_stack is a paranoid=0 copy of the normal flow. Make this explicit by creating a idtentry_part helper macro. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: bp@alien8.de Cc: torvalds@linux-foundation.org Cc: hpa@zytor.com Cc: dave.hansen@linux.intel.com Cc: jgross@suse.com Cc: zhe.he@windriver.com Cc: joel@joelfernandes.org Cc: devel@etsukata.com Link: https://lkml.kernel.org/r/20190711114336.002429503@infradead.org
2019-07-17x86/entry/32: Simplify common_exceptionPeter Zijlstra
Adding one more option to SAVE_ALL can be used in common_exception to simplify things. This also saves duplication later where page_fault will no longer use common_exception. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Reviewed-by: Andy Lutomirski <luto@kernel.org> Cc: bp@alien8.de Cc: torvalds@linux-foundation.org Cc: hpa@zytor.com Cc: dave.hansen@linux.intel.com Cc: jgross@suse.com Cc: zhe.he@windriver.com Cc: joel@joelfernandes.org Cc: devel@etsukata.com Link: https://lkml.kernel.org/r/20190711114335.945136187@infradead.org
2019-07-17x86/paravirt: Make read_cr2() CALLEE_SAVEPeter Zijlstra
The one paravirt read_cr2() implementation (Xen) is actually quite trivial and doesn't need to clobber anything other than the return register. Making read_cr2() CALLEE_SAVE avoids all the PUSH/POP nonsense and allows more convenient use from assembly. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: bp@alien8.de Cc: rostedt@goodmis.org Cc: luto@kernel.org Cc: torvalds@linux-foundation.org Cc: hpa@zytor.com Cc: dave.hansen@linux.intel.com Cc: zhe.he@windriver.com Cc: joel@joelfernandes.org Cc: devel@etsukata.com Link: https://lkml.kernel.org/r/20190711114335.887392493@infradead.org
2019-07-17parisc: Wire up clone3 syscallHelge Deller
Signed-off-by: Helge Deller <deller@gmx.de> Tested-by: Sven Schnelle <svens@stackframe.org> Acked-by: Christian Brauner <christian@brauner.io>
2019-07-17parisc: Avoid kernel panic triggered by invalid kprobeHelge Deller
When running gdb I was able to trigger this kernel panic: Kernel Fault: Code=26 (Data memory access rights trap) at addr 0000000000000060 CPU: 0 PID: 1401 Comm: gdb-crash Not tainted 5.2.0-rc7-64bit+ #1053 YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI PSW: 00001000000001000000000000001111 Not tainted r00-03 000000000804000f 0000000040dee1a0 0000000040c78cf0 00000000b8d50160 r04-07 0000000040d2b1a0 000000004360a098 00000000bbbe87b8 0000000000000003 r08-11 00000000fac20a70 00000000fac24160 00000000fac1bbe0 0000000000000000 r12-15 00000000fabfb79a 00000000fac244a4 0000000000010000 0000000000000001 r16-19 00000000bbbe87b8 00000000f8f02910 0000000000010034 0000000000000000 r20-23 00000000fac24630 00000000fac24630 000000006474e552 00000000fac1aa52 r24-27 0000000000000028 00000000bbbe87b8 00000000bbbe87b8 0000000040d2b1a0 r28-31 0000000000000000 00000000b8d501c0 00000000b8d501f0 0000000003424000 sr00-03 0000000000423000 0000000000000000 0000000000000000 0000000000423000 sr04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000 IASQ: 0000000000000000 0000000000000000 IAOQ: 0000000040c78cf0 0000000040c78cf4 IIR: 539f00c0 ISR: 0000000000000000 IOR: 0000000000000060 CPU: 0 CR30: 00000000b8d50000 CR31: 00000000d22345e2 ORIG_R28: 0000000040250798 IAOQ[0]: parisc_kprobe_ss_handler+0x58/0x170 IAOQ[1]: parisc_kprobe_ss_handler+0x5c/0x170 RP(r2): parisc_kprobe_ss_handler+0x58/0x170 Backtrace: [<0000000040206ff8>] handle_interruption+0x178/0xbb8 Kernel panic - not syncing: Kernel Fault Avoid this panic by checking the return value of kprobe_running() and skip kprobe if none is currently active. Cc: <stable@vger.kernel.org> # v5.2 Acked-by: Sven Schnelle <svens@stackframe.org> Tested-by: Rolf Eike Beer <eike-kernel@sf-tec.de> Signed-off-by: Helge Deller <deller@gmx.de>
2019-07-17parisc: Ensure userspace privilege for ptraced processes in regset functionsHelge Deller
On parisc the privilege level of a process is stored in the lowest two bits of the instruction pointers (IAOQ0 and IAOQ1). On Linux we use privilege level 0 for the kernel and privilege level 3 for user-space. So userspace should not be allowed to modify IAOQ0 or IAOQ1 of a ptraced process to change it's privilege level to e.g. 0 to try to gain kernel privileges. This patch prevents such modifications in the regset support functions by always setting the two lowest bits to one (which relates to privilege level 3 for user-space) if IAOQ0 or IAOQ1 are modified via ptrace regset calls. Link: https://bugs.gentoo.org/481768 Cc: <stable@vger.kernel.org> # v4.7+ Tested-by: Rolf Eike Beer <eike-kernel@sf-tec.de> Signed-off-by: Helge Deller <deller@gmx.de>
2019-07-17parisc: Fix kernel panic due invalid values in IAOQ0 or IAOQ1Helge Deller
On parisc the privilege level of a process is stored in the lowest two bits of the instruction pointers (IAOQ0 and IAOQ1). On Linux we use privilege level 0 for the kernel and privilege level 3 for user-space. So userspace should not be allowed to modify IAOQ0 or IAOQ1 of a ptraced process to change it's privilege level to e.g. 0 to try to gain kernel privileges. This patch prevents such modifications by always setting the two lowest bits to one (which relates to privilege level 3 for user-space) if IAOQ0 or IAOQ1 are modified via ptrace calls in the native and compat ptrace paths. Link: https://bugs.gentoo.org/481768 Reported-by: Jeroen Roovers <jer@gentoo.org> Cc: <stable@vger.kernel.org> Tested-by: Rolf Eike Beer <eike-kernel@sf-tec.de> Signed-off-by: Helge Deller <deller@gmx.de>