summaryrefslogtreecommitdiff
path: root/arch/x86/include
AgeCommit message (Collapse)Author
2016-02-18mm/core, x86/mm/pkeys: Differentiate instruction fetchesDave Hansen
As discussed earlier, we attempt to enforce protection keys in software. However, the code checks all faults to ensure that they are not violating protection key permissions. It was assumed that all faults are either write faults where we check PKRU[key].WD (write disable) or read faults where we check the AD (access disable) bit. But, there is a third category of faults for protection keys: instruction faults. Instruction faults never run afoul of protection keys because they do not affect instruction fetches. So, plumb the PF_INSTR bit down in to the arch_vma_access_permitted() function where we do the protection key checks. We also add a new FAULT_FLAG_INSTRUCTION. This is because handle_mm_fault() is not passed the architecture-specific error_code where we keep PF_INSTR, so we need to encode the instruction fetch information in to the arch-generic fault flags. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave@sr71.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20160212210224.96928009@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-18mm/core: Do not enforce PKEY permissions on remote mm accessDave Hansen
We try to enforce protection keys in software the same way that we do in hardware. (See long example below). But, we only want to do this when accessing our *own* process's memory. If GDB set PKRU[6].AD=1 (disable access to PKEY 6), then tried to PTRACE_POKE a target process which just happened to have some mprotect_pkey(pkey=6) memory, we do *not* want to deny the debugger access to that memory. PKRU is fundamentally a thread-local structure and we do not want to enforce it on access to _another_ thread's data. This gets especially tricky when we have workqueues or other delayed-work mechanisms that might run in a random process's context. We can check that we only enforce pkeys when operating on our *own* mm, but delayed work gets performed when a random user context is active. We might end up with a situation where a delayed-work gup fails when running randomly under its "own" task but succeeds when running under another process. We want to avoid that. To avoid that, we use the new GUP flag: FOLL_REMOTE and add a fault flag: FAULT_FLAG_REMOTE. They indicate that we are walking an mm which is not guranteed to be the same as current->mm and should not be subject to protection key enforcement. Thanks to Jerome Glisse for pointing out this scenario. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Alexey Kardashevskiy <aik@ozlabs.ru> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Boaz Harrosh <boaz@plexistor.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Chinner <dchinner@redhat.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Dominik Dingel <dingel@linux.vnet.ibm.com> Cc: Dominik Vogt <vogt@linux.vnet.ibm.com> Cc: Eric B Munson <emunson@akamai.com> Cc: Geliang Tang <geliangtang@163.com> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jan Kara <jack@suse.cz> Cc: Jason Low <jason.low2@hp.com> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Mikulas Patocka <mpatocka@redhat.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Shachar Raindel <raindel@mellanox.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xie XiuQi <xiexiuqi@huawei.com> Cc: iommu@lists.linux-foundation.org Cc: linux-arch@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org Cc: linux-s390@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-18mm/gup, x86/mm/pkeys: Check VMAs and PTEs for protection keysDave Hansen
Today, for normal faults and page table walks, we check the VMA and/or PTE to ensure that it is compatible with the action. For instance, if we get a write fault on a non-writeable VMA, we SIGSEGV. We try to do the same thing for protection keys. Basically, we try to make sure that if a user does this: mprotect(ptr, size, PROT_NONE); *ptr = foo; they see the same effects with protection keys when they do this: mprotect(ptr, size, PROT_READ|PROT_WRITE); set_pkey(ptr, size, 4); wrpkru(0xffffff3f); // access disable pkey 4 *ptr = foo; The state to do that checking is in the VMA, but we also sometimes have to do it on the page tables only, like when doing a get_user_pages_fast() where we have no VMA. We add two functions and expose them to generic code: arch_pte_access_permitted(pte_flags, write) arch_vma_access_permitted(vma, write) These are, of course, backed up in x86 arch code with checks against the PTE or VMA's protection key. But, there are also cases where we do not want to respect protection keys. When we ptrace(), for instance, we do not want to apply the tracer's PKRU permissions to the PTEs from the process being traced. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Alexey Kardashevskiy <aik@ozlabs.ru> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Boaz Harrosh <boaz@plexistor.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave@sr71.net> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: David Hildenbrand <dahi@linux.vnet.ibm.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Dominik Dingel <dingel@linux.vnet.ibm.com> Cc: Dominik Vogt <vogt@linux.vnet.ibm.com> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Low <jason.low2@hp.com> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Mikulas Patocka <mpatocka@redhat.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Shachar Raindel <raindel@mellanox.com> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: linux-arch@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org Cc: linux-s390@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/20160212210219.14D5D715@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-18x86/mm/pkeys: Add functions to fetch PKRUDave Hansen
This adds the raw instruction to access PKRU as well as some accessor functions that correctly handle when the CPU does not support the instruction. We don't use it here, but we will use read_pkru() in the next patch. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave@sr71.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20160212210215.15238D34@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-18x86/mm/pkeys: Fill in pkey field in siginfoDave Hansen
This fills in the new siginfo field: si_pkey to indicate to userspace which protection key was set on the PTE that we faulted on. Note though that *ALL* protection key faults have to be generated by a valid, present PTE at some point. But this code does no PTE lookups which seeds odd. The reason is that we take advantage of the way we generate PTEs from VMAs. All PTEs under a VMA share some attributes. For instance, they are _all_ either PROT_READ *OR* PROT_NONE. They also always share a protection key, so we never have to walk the page tables; we just use the VMA. Note that _pkey is a 64-bit value. The current hardware only supports 4-bit protection keys. We do this because there is _plenty_ of space in _sigfault and it is possible that future processors would support more than 4 bits of protection keys. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave@sr71.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20160212210213.ABC488FA@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-18x86/mm/pkeys: Add arch-specific VMA protection bitsDave Hansen
Lots of things seem to do: vma->vm_page_prot = vm_get_page_prot(flags); and the ptes get created right from things we pull out of ->vm_page_prot. So it is very convenient if we can store the protection key in flags and vm_page_prot, just like the existing permission bits (_PAGE_RW/PRESENT). It greatly reduces the amount of plumbing and arch-specific hacking we have to do in generic code. This also takes the new PROT_PKEY{0,1,2,3} flags and turns *those* in to VM_ flags for vma->vm_flags. The protection key values are stored in 4 places: 1. "prot" argument to system calls 2. vma->vm_flags, filled from the mmap "prot" 3. vma->vm_page prot, filled from vma->vm_flags 4. the PTE itself. The pseudocode for these for steps are as follows: mmap(PROT_PKEY*) vma->vm_flags = ... | arch_calc_vm_prot_bits(mmap_prot); vma->vm_page_prot = ... | arch_vm_get_page_prot(vma->vm_flags); pte = pfn | vma->vm_page_prot Note that this provides a new definitions for x86: arch_vm_get_page_prot() Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave@sr71.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20160212210210.FE483A42@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-18x86/mm/pkeys: Add PTE bits for storing protection keyDave Hansen
Previous documentation has referred to these 4 bits as "ignored". That means that software could have made use of them. But, as far as I know, the kernel never used them. They are still ignored when protection keys is not enabled, so they could theoretically still get used for software purposes. We also implement "empty" versions so that code that references to them can be optimized away by the compiler when the config option is not enabled. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave@sr71.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20160212210205.81E33ED6@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-18x86/cpufeature: Create a new synthetic cpu capability for machine check recoveryTony Luck
The Intel Software Developer Manual describes bit 24 in the MCG_CAP MSR: MCG_SER_P (software error recovery support present) flag, bit 24 — Indicates (when set) that the processor supports software error recovery But only some models with this capability bit set will actually generate recoverable machine checks. Check the model name and set a synthetic capability bit. Provide a command line option to set this bit anyway in case the kernel doesn't recognise the model name. Signed-off-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/2e5bfb23c89800a036fb8a45fa97a74bb16bc362.1455732970.git.tony.luck@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-18Merge branch 'x86/urgent' into x86/asm, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-18x86/mm: Expand the exception table logic to allow new handling optionsTony Luck
Huge amounts of help from Andy Lutomirski and Borislav Petkov to produce this. Andy provided the inspiration to add classes to the exception table with a clever bit-squeezing trick, Boris pointed out how much cleaner it would all be if we just had a new field. Linus Torvalds blessed the expansion with: ' I'd rather not be clever in order to save just a tiny amount of space in the exception table, which isn't really criticial for anybody. ' The third field is another relative function pointer, this one to a handler that executes the actions. We start out with three handlers: 1: Legacy - just jumps the to fixup IP 2: Fault - provide the trap number in %ax to the fixup code 3: Cleaned up legacy for the uaccess error hack Signed-off-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/f6af78fcbd348cf4939875cfda9c19689b5e50b8.1455732970.git.tony.luck@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-18Merge tag 'v4.5-rc4' into ras/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-17Merge branch 'perf/urgent' into perf/core, to queue up dependent patchIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-17x86/msr: Document msr-index.h rule for additionBorislav Petkov
In order to keep this file's size sensible and not cause too much unnecessary churn, make the rule explicit - similar to pci_ids.h - that only MSRs which are used in multiple compilation units, should get added to it. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: alex.williamson@redhat.com Cc: gleb@kernel.org Cc: joro@8bytes.org Cc: kvm@vger.kernel.org Cc: sherry.hurwitz@amd.com Cc: wei@redhat.com Link: http://lkml.kernel.org/r/1455612202-14414-5-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-17x86/signal/64: Re-add support for SS in the 64-bit signal contextAndy Lutomirski
This is a second attempt to make the improvements from c6f2062935c8 ("x86/signal/64: Fix SS handling for signals delivered to 64-bit programs"), which was reverted by 51adbfbba5c6 ("x86/signal/64: Add support for SS in the 64-bit signal context"). This adds two new uc_flags flags. UC_SIGCONTEXT_SS will be set for all 64-bit signals (including x32). It indicates that the saved SS field is valid and that the kernel supports the new behavior. The goal is to fix a problems with signal handling in 64-bit tasks: SS wasn't saved in the 64-bit signal context, making it awkward to determine what SS was at the time of signal delivery and making it impossible to return to a non-flat SS (as calling sigreturn clobbers SS). This also made it extremely difficult for 64-bit tasks to return to fully-defined 16-bit contexts, because only the kernel can easily do espfix64, but sigreturn was unable to set a non-flag SS:ESP. (DOSEMU has a monstrous hack to partially work around this limitation.) If we could go back in time, the correct fix would be to make 64-bit signals work just like 32-bit signals with respect to SS: save it in signal context, reset it when delivering a signal, and restore it in sigreturn. Unfortunately, doing that (as I tried originally) breaks DOSEMU: DOSEMU wouldn't reset the signal context's SS when clearing the LDT and changing the saved CS to 64-bit mode, since it predates the SS context field existing in the first place. This patch is a bit more complicated, and it tries to balance a bunch of goals. It makes most cases of changing ucontext->ss during signal handling work as expected. I do this by special-casing the interesting case. On sigreturn, ucontext->ss will be honored by default, unless the ucontext was created from scratch by an old program and had a 64-bit CS (unfortunately, CRIU can do this) or was the result of changing a 32-bit signal context to 64-bit without resetting SS (as DOSEMU does). For the benefit of new 64-bit software that uses segmentation (new versions of DOSEMU might), the new behavior can be detected with a new ucontext flag UC_SIGCONTEXT_SS. To avoid compilation issues, __pad0 is left as an alias for ss in ucontext. The nitty-gritty details are documented in the header file. This patch also re-enables the sigreturn_64 and ldt_gdt_64 selftests, as the kernel change allows both of them to pass. Tested-by: Stas Sergeev <stsp@list.ru> Signed-off-by: Andy Lutomirski <luto@kernel.org> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/749149cbfc3e75cd7fcdad69a854b399d792cc6f.1455664054.git.luto@kernel.org [ Small readability edit. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-17x86/signal/64: Fix SS if needed when delivering a 64-bit signalAndy Lutomirski
Signals are always delivered to 64-bit tasks with CS set to a long mode segment. In long mode, SS doesn't matter as long as it's a present writable segment. If SS starts out invalid (this can happen if the signal was caused by an IRET fault or was delivered on the way out of set_thread_area or modify_ldt), then IRET to the signal handler can fail, eventually killing the task. The straightforward fix would be to simply reset SS when delivering a signal. That breaks DOSEMU, though: 64-bit builds of DOSEMU rely on SS being set to the faulting SS when signals are delivered. As a compromise, this patch leaves SS alone so long as it's valid. The net effect should be that the behavior of successfully delivered signals is unchanged. Some signals that would previously have failed to be delivered will now be delivered successfully. This has no effect for x32 or 32-bit tasks: their signal handlers were already called with SS == __USER_DS. (On Xen, there's a slight hole: if a task sets SS to a writable *kernel* data segment, then we will fail to identify it as invalid and we'll still kill the task. If anyone cares, this could be fixed with a new paravirt hook.) Signed-off-by: Andy Lutomirski <luto@kernel.org> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stas Sergeev <stsp@list.ru> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/163c6e1eacde41388f3ff4d2fe6769be651d7b6e.1455664054.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-17x86/signal/64: Add a comment about sigcontext->fs and gsAndy Lutomirski
These fields have a strange history. This tries to document it. This borrows from 9a036b93a344 ("x86/signal/64: Remove 'fs' and 'gs' from sigcontext"), which was reverted by ed596cde9425 ("Revert x86 sigcontext cleanups"). Signed-off-by: Andy Lutomirski <luto@kernel.org> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stas Sergeev <stsp@list.ru> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/baa78f3c84106fa5acbc319377b1850602f5deec.1455664054.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-17x86 msr-index: Simplify syntax for HWP fieldsLen Brown
syntax only, no functional change Signed-off-by: Len Brown <len.brown@intel.com>
2016-02-16PCI: Add fwnode_handle to x86 pci_sysdataJake Oshins
Add an fwnode_handle to the x86 struct pci_sysdata, which will be used to locate an IRQ domain associated with a root PCI bus. [bhelgaas: changelog] Signed-off-by: Jake Oshins <jakeo@microsoft.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-02-16drivers/hv: Move VMBus hypercall codes into Hyper-V UAPI headerAndrey Smetanin
VMBus hypercall codes inside Hyper-V UAPI header will be used by QEMU to implement VMBus host devices support. Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Acked-by: K. Y. Srinivasan <kys@microsoft.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Joerg Roedel <joro@8bytes.org> CC: "K. Y. Srinivasan" <kys@microsoft.com> CC: Haiyang Zhang <haiyangz@microsoft.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> CC: qemu-devel@nongnu.org [Do not rename the constant at the same time as moving it, as that would cause semantic conflicts with the Hyper-V tree. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-02-16kvm/x86: Rename Hyper-V long spin wait hypercallAndrey Smetanin
Rename HV_X64_HV_NOTIFY_LONG_SPIN_WAIT by HVCALL_NOTIFY_LONG_SPIN_WAIT, so the name is more consistent with the other hypercalls. Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Joerg Roedel <joro@8bytes.org> CC: "K. Y. Srinivasan" <kys@microsoft.com> CC: Haiyang Zhang <haiyangz@microsoft.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> CC: qemu-devel@nongnu.org [Change name, Andrey used HV_X64_HCALL_NOTIFY_LONG_SPIN_WAIT. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-02-16x86/fpu, x86/mm/pkeys: Add PKRU xsave fields and data structuresDave Hansen
The protection keys register (PKRU) is saved and restored using xsave. Define the data structure that we will use to access it inside the xsave buffer. Note that we also have to widen the printk of the xsave feature masks since this is feature 0x200 and we only did two characters before. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave@sr71.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20160212210204.56DF8F7B@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-16x86/cpu, x86/mm/pkeys: Define new CR4 bitDave Hansen
There is a new bit in CR4 for enabling protection keys. We will actually enable it later in the series. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave@sr71.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20160212210202.3CFC3DB2@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-16x86/cpufeature, x86/mm/pkeys: Add protection keys related CPUID definitionsDave Hansen
There are two CPUID bits for protection keys. One is for whether the CPU contains the feature, and the other will appear set once the OS enables protection keys. Specifically: Bit 04: OSPKE. If 1, OS has set CR4.PKE to enable Protection keys (and the RDPKRU/WRPKRU instructions) This is because userspace can not see CR4 contents, but it can see CPUID contents. X86_FEATURE_PKU is referred to as "PKU" in the hardware documentation: CPUID.(EAX=07H,ECX=0H):ECX.PKU [bit 3] X86_FEATURE_OSPKE is "OSPKU": CPUID.(EAX=07H,ECX=0H):ECX.OSPKE [bit 4] These are the first CPU features which need to look at the ECX word in CPUID leaf 0x7, so this patch also includes fetching that word in to the cpuinfo->x86_capability[] array. Add it to the disabled-features mask when its config option is off. Even though we are not using it here, we also extend the REQUIRED_MASK_BIT_SET() macro to keep it mirroring the DISABLED_MASK_BIT_SET() version. This means that in almost all code, you should use: cpu_has(c, X86_FEATURE_PKU) and *not* the CONFIG option. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave@sr71.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20160212210201.7714C250@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-16x86/fpu: Add placeholder for 'Processor Trace' XSAVE stateDave Hansen
There is an XSAVE state component for Intel Processor Trace (PT). But, we do not currently use it. We add a placeholder in the code for it so it is not a mystery and also so we do not need an explicit enum initialization for Protection Keys in a moment. Why don't we use it? We might end up using this at _some_ point in the future. But, this is a "system" state which requires using the currently unsupported XSAVES feature. Unlike all the other XSAVE states, PT state is also not directly tied to a thread. You might context-switch between threads, but not want to change any of the PT state. Or, you might switch between threads, and *do* want to change PT state, all depending on what is being traced. We currently just manually set some MSRs to do this PT context switching, and it is unclear whether replacing our direct MSR use with XSAVE will be a net win or loss, both in code complexity and performance. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave@sr71.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: fenghua.yu@intel.com Cc: linux-mm@kvack.org Cc: yu-cheng.yu@intel.com Link: http://lkml.kernel.org/r/20160212210158.5E4BCAE2@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-16Merge branches 'x86/fpu', 'x86/mm' and 'x86/asm' into x86/pkeysIngo Molnar
Provide a stable basis for the pkeys patches, which touches various x86 details. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-16x86/cpufeature: Speed up cpu_feature_enabled()Borislav Petkov
When GCC cannot do constant folding for this macro, it falls back to cpu_has(). But static_cpu_has() is optimal and it works at all times now. So use it and speedup the fallback case. Before we had this: mov 0x99d674(%rip),%rdx # ffffffff81b0d9f4 <boot_cpu_data+0x34> shr $0x2e,%rdx and $0x1,%edx jne ffffffff811704e9 <do_munmap+0x3f9> After alternatives patching, it turns into: jmp 0xffffffff81170390 nopl (%rax) ... callq ffffffff81056e00 <mpx_notify_unmap> ffffffff81170390: mov 0x170(%r12),%rdi Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Joerg Roedel <joro@8bytes.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1455578358-28347-1-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-16gpio: Remove unused asm/gpio.h filesBjorn Helgaas
asm/gpio.h is included only by linux/gpio.h, and then only when the arch selects ARCH_HAVE_CUSTOM_GPIO_H. Only the following arches select it: arm avr32 blackfin m68k (COLDFIRE only) sh unicore32. Remove the unused asm/gpio.h files for the arches that do not select ARCH_HAVE_CUSTOM_GPIO_H. This is a follow-on to 7563bbf89d06 ("gpiolib/arches: Centralise bolierplate asm/gpio.h"). Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Alexandre Courbot <acourbot@nvidia.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2016-02-15xen/pcifront: Report the errors better.Konrad Rzeszutek Wilk
The messages should be different depending on the type of error. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2016-02-14x86/mm: Fix INVPCID asm constraintBorislav Petkov
So we want to specify the dependency on both @pcid and @addr so that the compiler doesn't reorder accesses to them *before* the TLB flush. But for that to work, we need to express this properly in the inline asm and deref the whole desc array, not the pointer to it. See clwb() for an example. This fixes the build error on 32-bit: arch/x86/include/asm/tlbflush.h: In function ‘__invpcid’: arch/x86/include/asm/tlbflush.h:26:18: error: memory input 0 is not directly addressable which gcc4.7 caught but 5.x didn't. Which is strange. :-\ Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Michael Matz <matz@suse.de> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hp.com> Cc: linux-mm@kvack.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09x86/fpu: Fix math emulation in eager fpu modeAndy Lutomirski
Systems without an FPU are generally old and therefore use lazy FPU switching. Unsurprisingly, math emulation in eager FPU mode is a bit buggy. Fix it. There were two bugs involving kernel code trying to use the FPU registers in eager mode even if they didn't exist and one BUG_ON() that was incorrect. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Rik van Riel <riel@redhat.com> Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: yu-cheng yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/b4b8d112436bd6fab866e1b4011131507e8d7fbe.1453675014.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09x86/dmi: Switch dmi_remap() from ioremap() [uncached] to ioremap_cache()Andy Lutomirski
DMI cacheability is very confused on x86. dmi_early_remap() uses early_ioremap(), which uses FIXMAP_PAGE_IO, which is __PAGE_KERNEL_IO, which is __PAGE_KERNEL, which is cached. Don't ask me why this makes any sense. dmi_remap() uses ioremap(), which requests an uncached mapping. However, on non-EFI systems, the DMI data generally lives between 0xf0000 and 0x100000, which is in the legacy ISA range, which triggers a special case in the PAT code that overrides the cache mode requested by ioremap() and forces a WB mapping. On a UEFI boot, however, the DMI table can live at any physical address. On my laptop, it's around 0x77dd0000. That's nowhere near the legacy ISA range, so the ioremap() implicit uncached type is honored and we end up with a UC- mapping. UC- is a very, very slow way to read from main memory, so dmi_walk() is likely to take much longer than necessary. Given that, even on UEFI, we do early cached DMI reads, it seems safe to just ask for cached access. Switch to ioremap_cache(). I haven't tried to benchmark this, but I'd guess it saves several milliseconds of boot time. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jean Delvare <jdelvare@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hp.com> Link: http://lkml.kernel.org/r/3147c38e51f439f3c8911db34c7d4ab22d854915.1453791969.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09x86/mm: If INVPCID is available, use it to flush global mappingsAndy Lutomirski
On my Skylake laptop, INVPCID function 2 (flush absolutely everything) takes about 376ns, whereas saving flags, twiddling CR4.PGE to flush global mappings, and restoring flags takes about 539ns. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hp.com> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/ed0ef62581c0ea9c99b9bf6df726015e96d44743.1454096309.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09x86/mm: Add INVPCID helpersAndy Lutomirski
This adds helpers for each of the four currently-specified INVPCID modes. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hp.com> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/8a62b23ad686888cee01da134c91409e22064db9.1454096309.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09KVM: x86: Use vector-hashing to deliver lowest-priority interruptsFeng Wu
Use vector-hashing to deliver lowest-priority interrupts, As an example, modern Intel CPUs in server platform use this method to handle lowest-priority interrupts. Signed-off-by: Feng Wu <feng.wu@intel.com> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-02-09x86/microcode: Untangle from BLK_DEV_INITRDBorislav Petkov
Thomas Voegtle reported that doing oldconfig with a .config which has CONFIG_MICROCODE enabled but BLK_DEV_INITRD disabled prevents the microcode loading mechanism from being built. So untangle it from the BLK_DEV_INITRD dependency so that oldconfig doesn't turn it off and add an explanatory text to its Kconfig help what the supported methods for supplying microcode are. Reported-by: Thomas Voegtle <tv@lio96.de> Tested-by: Thomas Voegtle <tv@lio96.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> # 4.4 Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1454499225-21544-2-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09x86/asm/bitops: Force inlining of test_and_set_bit and friendsDenys Vlasenko
Sometimes GCC mysteriously doesn't inline very small functions we expect to be inlined, see: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66122 Arguably, GCC should do better, but GCC people aren't willing to invest time into it and are asking to use __always_inline instead. With this .config: http://busybox.net/~vda/kernel_config_OPTIMIZE_INLINING_and_Os here's an example of functions getting deinlined many times: test_and_set_bit (166 copies, ~1260 calls) 55 push %rbp 48 89 e5 mov %rsp,%rbp f0 48 0f ab 3e lock bts %rdi,(%rsi) 72 04 jb <test_and_set_bit+0xf> 31 c0 xor %eax,%eax eb 05 jmp <test_and_set_bit+0x14> b8 01 00 00 00 mov $0x1,%eax 5d pop %rbp c3 retq test_and_clear_bit (124 copies, ~1000 calls) 55 push %rbp 48 89 e5 mov %rsp,%rbp f0 48 0f b3 3e lock btr %rdi,(%rsi) 72 04 jb <test_and_clear_bit+0xf> 31 c0 xor %eax,%eax eb 05 jmp <test_and_clear_bit+0x14> b8 01 00 00 00 mov $0x1,%eax 5d pop %rbp c3 retq change_bit (3 copies, 8 calls) 55 push %rbp 48 89 e5 mov %rsp,%rbp f0 48 0f bb 3e lock btc %rdi,(%rsi) 5d pop %rbp c3 retq clear_bit_unlock (2 copies, 11 calls) 55 push %rbp 48 89 e5 mov %rsp,%rbp f0 48 0f b3 3e lock btr %rdi,(%rsi) 5d pop %rbp c3 retq This patch works it around via s/inline/__always_inline/. Code size decrease by ~13.5k after the patch: text data bss dec filename 92110727 20826144 36417536 149354407 vmlinux.before 92097234 20826176 36417536 149340946 vmlinux.after Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Rientjes <rientjes@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Graf <tgraf@suug.ch> Link: http://lkml.kernel.org/r/1454881887-1367-1-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09Merge tag 'v4.5-rc3' into locking/core, to refresh the treeIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-08Merge branch 'x86/urgent' into x86/mm, to pick up dependent fixIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-05x86: Fix KASAN false positives in thread_saved_pc()Dmitry Vyukov
thread_saved_pc() reads stack of a potentially running task. This can cause false KASAN stack-out-of-bounds reports, because the running task concurrently poisons and unpoisons own stack. The same happens in get_wchan(), and get get_wchan() was fixed by using READ_ONCE_NOCHECK(). Do the same here. Example KASAN report triggered by sysrq-t: BUG: KASAN: out-of-bounds in sched_show_task+0x306/0x3b0 at addr ffff880043c97c18 Read of size 8 by task syz-executor/23839 [...] page dumped because: kasan: bad access detected [...] Call Trace: [<ffffffff8175ea0e>] __asan_report_load8_noabort+0x3e/0x40 [<ffffffff813e7a26>] sched_show_task+0x306/0x3b0 [<ffffffff813e7bf4>] show_state_filter+0x124/0x1a0 [<ffffffff82d2ca00>] fn_show_state+0x10/0x20 [<ffffffff82d2cf98>] k_spec+0xa8/0xe0 [<ffffffff82d3354f>] kbd_event+0xb9f/0x4000 [<ffffffff843ca8a7>] input_to_handler+0x3a7/0x4b0 [<ffffffff843d1954>] input_pass_values.part.5+0x554/0x6b0 [<ffffffff843d29bc>] input_handle_event+0x2ac/0x1070 [<ffffffff843d3a47>] input_inject_event+0x237/0x280 [<ffffffff843e8c28>] evdev_write+0x478/0x680 [<ffffffff817ac653>] __vfs_write+0x113/0x480 [<ffffffff817ae0e7>] vfs_write+0x167/0x4a0 [<ffffffff817b13d1>] SyS_write+0x111/0x220 Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: glider@google.com Cc: kasan-dev@googlegroups.com Cc: kcc@google.com Cc: linux-kernel@vger.kernel.org Cc: ryabinin.a.a@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-03Merge branch 'linus' into efi/core, to refresh the branch and to pick up ↵Ingo Molnar
recent fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-01x86/mce/AMD: Set MCAX Enable bitAravind Gopalakrishnan
It is required for the OS to acknowledge that it is using the MCAX register set and its associated fields by setting the 'McaXEnable' bit in each bank's MCi_CONFIG register. If it is not set, then all UC errors will cause a system panic. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1453750913-4781-9-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-01x86/cpufeature: Use enum cpuid_leafs instead of magic numbersHuaitong Han
Most of the magic numbers in x86_capability[] have been converted to 'enum cpuid_leafs', and this patch updates the remaining part. Signed-off-by: Huaitong Han <huaitong.han@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Alexander Kuleshov <kuleshovmail@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hector Marco-Gisbert <hecmargi@upv.es> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: lguest@lists.ozlabs.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1453750913-4781-3-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-31Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "A bit on the largish side due to a series of fixes for a regression in the x86 vector management which was introduced in 4.3. This work was started in December already, but it took some time to fix all corner cases and a couple of older bugs in that area which were detected while at it Aside of that a few platform updates for intel-mid, quark and UV and two fixes for in the mm code: - Use proper types for pgprot values to avoid truncation - Prevent a size truncation in the pageattr code when setting page attributes for large mappings" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits) x86/mm/pat: Avoid truncation when converting cpa->numpages to address x86/mm: Fix types used in pgprot cacheability flags translations x86/platform/quark: Print boundaries correctly x86/platform/UV: Remove EFI memmap quirk for UV2+ x86/platform/intel-mid: Join string and fix SoC name x86/platform/intel-mid: Enable 64-bit build x86/irq: Plug vector cleanup race x86/irq: Call irq_force_move_complete with irq descriptor x86/irq: Remove outgoing CPU from vector cleanup mask x86/irq: Remove the cpumask allocation from send_cleanup_vector() x86/irq: Clear move_in_progress before sending cleanup IPI x86/irq: Remove offline cpus from vector cleanup x86/irq: Get rid of code duplication x86/irq: Copy vectormask instead of an AND operation x86/irq: Check vector allocation early x86/irq: Reorganize the search in assign_irq_vector x86/irq: Reorganize the return path in assign_irq_vector x86/irq: Do not use apic_chip_data.old_domain as temporary buffer x86/irq: Validate that irq descriptor is still active x86/irq: Fix a race in x86_vector_free_irqs() ...
2016-01-30x86/alternatives: Discard dynamic check after initBrian Gerst
Move the code to do the dynamic check to the altinstr_aux section so that it is discarded after alternatives have run and a static branch has been chosen. This way we're changing the dynamic branch from C code to assembly, which makes it *substantially* smaller while avoiding a completely unnecessary call to an out of line function. Signed-off-by: Brian Gerst <brgerst@gmail.com> [ Changed it to do TESTB, as hpa suggested. ] Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Young <dyoung@redhat.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kristen Carlson Accardi <kristen@linux.intel.com> Cc: Laura Abbott <labbott@fedoraproject.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1452972124-7380-1-git-send-email-brgerst@gmail.com Link: http://lkml.kernel.org/r/20160127084525.GC30712@pd.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-30x86/cpufeature: Get rid of the non-asm goto variantBorislav Petkov
I can simply quote hpa from the mail: "Get rid of the non-asm goto variant and just fall back to dynamic if asm goto is unavailable. It doesn't make any sense, really, if it is supposed to be safe, and by now the asm goto-capable gcc is in more wide use. (Originally the gcc 3.x fallback to pure dynamic didn't exist, either.)" Booy, am I lazy. Cleanup the whole CC_HAVE_ASM_GOTO ifdeffery too, while at it. Suggested-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20160127084325.GB30712@pd.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-30x86/cpufeature: Replace the old static_cpu_has() with safe variantBorislav Petkov
So the old one didn't work properly before alternatives had run. And it was supposed to provide an optimized JMP because the assumption was that the offset it is jumping to is within a signed byte and thus a two-byte JMP. So I did an x86_64 allyesconfig build and dumped all possible sites where static_cpu_has() was used. The optimization amounted to all in all 12(!) places where static_cpu_has() had generated a 2-byte JMP. Which has saved us a whopping 36 bytes! This clearly is not worth the trouble so we can remove it. The only place where the optimization might count - in __switch_to() - we will handle differently. But that's not subject of this patch. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1453842730-28463-6-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-30x86/cpufeature: Carve out X86_FEATURE_*Borislav Petkov
Move them to a separate header and have the following dependency: x86/cpufeatures.h <- x86/processor.h <- x86/cpufeature.h This makes it easier to use the header in asm code and not include the whole cpufeature.h and add guards for asm. Suggested-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1453842730-28463-5-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-30Merge branch 'x86/cpu' into x86/asm, to avoid conflictIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-29Merge tag 'v4.5-rc1' into x86/asm, to refresh the branch before merging new ↵Ingo Molnar
changes Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-29locking/x86: Tweak the comment about use of wmb() for IOMichael S. Tsirkin
On x86, we *do* still use the non-NOP rmb()/wmb() for IO barriers, but even that is generally questionable. Leave them around as historial unless somebody can point to a case where they care about the performance, but tweak the comment so people don't think they are strictly required in all cases. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: virtualization <virtualization@lists.linux-foundation.org> Link: http://lkml.kernel.org/r/1453921746-16178-4-git-send-email-mst@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>