summaryrefslogtreecommitdiff
path: root/arch/x86/include
AgeCommit message (Collapse)Author
2017-06-13x86/boot/64: Rename init_level4_pgt and early_level4_pgtKirill A. Shutemov
With CONFIG_X86_5LEVEL=y, level 4 is no longer top level of page tables. Let's give these variable more generic names: init_top_pgt and early_top_pgt. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170606113133.22974-9-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-13x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementationKirill A. Shutemov
This patch provides all required callbacks required by the generic get_user_pages_fast() code and switches x86 over - and removes the platform specific implementation. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170606113133.22974-2-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-13x86/mm: Split read_cr3() into read_cr3_pa() and __read_cr3()Andy Lutomirski
The kernel has several code paths that read CR3. Most of them assume that CR3 contains the PGD's physical address, whereas some of them awkwardly use PHYSICAL_PAGE_MASK to mask off low bits. Add explicit mask macros for CR3 and convert all of the CR3 readers. This will keep them from breaking when PCID is enabled. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: xen-devel <xen-devel@lists.xen.org> Link: http://lkml.kernel.org/r/883f8fb121f4616c1c1427ad87350bb2f5ffeca1.1497288170.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-13x86/time: Make setup_default_timer_irq() staticDou Liyang
This function isn't used outside of time.c, so let's mark it static. Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1497321029-29049-1-git-send-email-douly.fnst@cn.fujitsu.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-12x86/debug: Handle early WARN_ONs properPeter Zijlstra
Hans managed to trigger a WARN very early in the boot which killed his (Virtual) box. The reason is that the recent rework of WARN() to use UD0 forgot to add the fixup_bug() call to early_fixup_exception(). As a result the kernel does not handle the WARN_ON injected UD0 exception and panics. Add the missing fixup call, so early UD's injected by WARN() get handled. Fixes: 9a93848fe787 ("x86/debug: Implement __WARN() using UD0") Reported-and-tested-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Frank Mehnert <frank.mehnert@oracle.com> Cc: Hans de Goede <hdegoede@redhat.com> Cc: Michael Thayer <michael.thayer@oracle.com> Link: http://lkml.kernel.org/r/20170612180108.w4vgu2ckucmllf3a@hirez.programming.kicks-ass.net
2017-06-12Merge 4.12-rc5 into char-misc-nextGreg Kroah-Hartman
We want the char/misc driver fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-09x86, uaccess: introduce copy_from_iter_flushcache for pmem / cache-bypass ↵Dan Williams
operations The pmem driver has a need to transfer data with a persistent memory destination and be able to rely on the fact that the destination writes are not cached. It is sufficient for the writes to be flushed to a cpu-store-buffer (non-temporal / "movnt" in x86 terms), as we expect userspace to call fsync() to ensure data-writes have reached a power-fail-safe zone in the platform. The fsync() triggers a REQ_FUA or REQ_FLUSH to the pmem driver which will turn around and fence previous writes with an "sfence". Implement a __copy_from_user_inatomic_flushcache, memcpy_page_flushcache, and memcpy_flushcache, that guarantee that the destination buffer is not dirty in the cpu cache on completion. The new copy_from_iter_flushcache and sub-routines will be used to replace the "pmem api" (include/linux/pmem.h + arch/x86/include/asm/pmem.h). The availability of copy_from_iter_flushcache() and memcpy_flushcache() are gated by the CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE config symbol, and fallback to copy_from_iter_nocache() and plain memcpy() otherwise. This is meant to satisfy the concern from Linus that if a driver wants to do something beyond the normal nocache semantics it should be something private to that driver [1], and Al's concern that anything uaccess related belongs with the rest of the uaccess code [2]. The first consumer of this interface is a new 'copy_from_iter' dax operation so that pmem can inject cache maintenance operations without imposing this overhead on other dax-capable drivers. [1]: https://lists.01.org/pipermail/linux-nvdimm/2017-January/008364.html [2]: https://lists.01.org/pipermail/linux-nvdimm/2017-April/009942.html Cc: <x86@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Matthew Wilcox <mawilcox@microsoft.com> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-06-08xen: fix HYPERVISOR_dm_op() prototypeSergey Dyasli
Change the third parameter to be the required struct xen_dm_op_buf * instead of a generic void * (which blindly accepts any pointer). Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com> Reviewed-by: Juergen Gross <jgross@suse.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Signed-off-by: Juergen Gross <jgross@suse.com>
2017-06-08locking/x86: Remove the unused atomic_inc_short() methdDmitry Vyukov
It is completely unused and implemented only on x86. Remove it. Suggested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170526172900.91058-1-dvyukov@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-08Merge branch 'perf/urgent' into perf/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-08x86/ldt: Rename ldt_struct::size to ::nr_entriesBorislav Petkov
... because this is exactly what it is: the number of entries in the LDT. Calling it "size" is simply confusing and it is actually begging to be called "nr_entries" or somesuch, especially if you see constructs like: alloc_size = size * LDT_ENTRY_SIZE; since LDT_ENTRY_SIZE is the size of a single entry. There should be no functionality change resulting from this patch, as the before/after output from tools/testing/selftests/x86/ldt_gdt.c shows. Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Andy Lutomirski <luto@amacapital.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170606173116.13977-1-bp@alien8.de [ Renamed 'n_entries' to 'nr_entries' ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-07kvm: vmx: Check value written to IA32_BNDCFGSJim Mattson
Bits 11:2 must be zero and the linear addess in bits 63:12 must be canonical. Otherwise, WRMSR(BNDCFGS) should raise #GP. Fixes: 0dd376e709975779 ("KVM: x86: add MSR_IA32_BNDCFGS to msrs_to_save") Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-06-05x86/mm, KVM: Teach KVM's VMX code that CR3 isn't a constantAndy Lutomirski
When PCID is enabled, CR3's PCID bits can change during context switches, so KVM won't be able to treat CR3 as a per-mm constant any more. I structured this like the existing CR4 handling. Under ordinary circumstances (PCID disabled or if the current PCID and the value that's already in the VMCS match), then we won't do an extra VMCS write, and we'll never do an extra direct CR3 read. The overhead should be minimal. I disallowed using the new helper in non-atomic context because PCID support will cause CR3 to stop being constant in non-atomic process context. (Frankly, it also scares me a bit that KVM ever treated CR3 as constant, but it looks like it was okay before.) Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bpetkov@suse.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Hocko <mhocko@suse.com> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Nadav Amit <namit@vmware.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kvm@vger.kernel.org Cc: linux-mm@kvack.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-05x86/mm: Rework lazy TLB to track the actual loaded mmAndy Lutomirski
Lazy TLB state is currently managed in a rather baroque manner. AFAICT, there are three possible states: - Non-lazy. This means that we're running a user thread or a kernel thread that has called use_mm(). current->mm == current->active_mm == cpu_tlbstate.active_mm and cpu_tlbstate.state == TLBSTATE_OK. - Lazy with user mm. We're running a kernel thread without an mm and we're borrowing an mm_struct. We have current->mm == NULL, current->active_mm == cpu_tlbstate.active_mm, cpu_tlbstate.state != TLBSTATE_OK (i.e. TLBSTATE_LAZY or 0). The current cpu is set in mm_cpumask(current->active_mm). CR3 points to current->active_mm->pgd. The TLB is up to date. - Lazy with init_mm. This happens when we call leave_mm(). We have current->mm == NULL, current->active_mm == cpu_tlbstate.active_mm, but that mm is only relelvant insofar as the scheduler is tracking it for refcounting. cpu_tlbstate.state != TLBSTATE_OK. The current cpu is clear in mm_cpumask(current->active_mm). CR3 points to swapper_pg_dir, i.e. init_mm->pgd. This patch simplifies the situation. Other than perf, x86 stops caring about current->active_mm at all. We have cpu_tlbstate.loaded_mm pointing to the mm that CR3 references. The TLB is always up to date for that mm. leave_mm() just switches us to init_mm. There are no longer any special cases for mm_cpumask, and switch_mm() switches mms without worrying about laziness. After this patch, cpu_tlbstate.state serves only to tell the TLB flush code whether it may switch to init_mm instead of doing a normal flush. This makes fairly extensive changes to xen_exit_mmap(), which used to look a bit like black magic. Perf is unchanged. With or without this change, perf may behave a bit erratically if it tries to read user memory in kernel thread context. We should build on this patch to teach perf to never look at user memory when cpu_tlbstate.loaded_mm != current->mm. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bpetkov@suse.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Hocko <mhocko@suse.com> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Nadav Amit <namit@vmware.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-05x86/mm: Remove the UP asm/tlbflush.h code, always use the (formerly) SMP codeAndy Lutomirski
The UP asm/tlbflush.h generates somewhat nicer code than the SMP version. Aside from that, it's fallen quite a bit behind the SMP code: - flush_tlb_mm_range() didn't flush individual pages if the range was small. - The lazy TLB code was much weaker. This usually wouldn't matter, but, if a kernel thread flushed its lazy "active_mm" more than once (due to reclaim or similar), it wouldn't be unlazied and would instead pointlessly flush repeatedly. - Tracepoints were missing. Aside from that, simply having the UP code around was a maintanence burden, since it means that any change to the TLB flush code had to make sure not to break it. Simplify everything by deleting the UP code. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bpetkov@suse.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Hocko <mhocko@suse.com> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Nadav Amit <namit@vmware.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-05x86/mm: Refactor flush_tlb_mm_range() to merge local and remote casesAndy Lutomirski
The local flush path is very similar to the remote flush path. Merge them. This is intended to make no difference to behavior whatsoever. It removes some code and will make future changes to the flushing mechanics simpler. This patch does remove one small optimization: flush_tlb_mm_range() now has an unconditional smp_mb() instead of using MOV to CR3 or INVLPG as a full barrier when applicable. I think this is okay for a few reasons. First, smp_mb() is quite cheap compared to the cost of a TLB flush. Second, this rearrangement makes a bigger optimization available: with some work on the SMP function call code, we could do the local and remote flushes in parallel. Third, I'm planning a rework of the TLB flush algorithm that will require an atomic operation at the beginning of each flush, and that operation will replace the smp_mb(). Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bpetkov@suse.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Hocko <mhocko@suse.com> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Nadav Amit <namit@vmware.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-05x86/mm: Pass flush_tlb_info to flush_tlb_others() etcAndy Lutomirski
Rather than passing all the contents of flush_tlb_info to flush_tlb_others(), pass a pointer to the structure directly. For consistency, this also removes the unnecessary cpu parameter from uv_flush_tlb_others() to make its signature match the other *flush_tlb_others() functions. This serves two purposes: - It will dramatically simplify future patches that change struct flush_tlb_info, which I'm planning to do. - struct flush_tlb_info is an adequate description of what to do for a local flush, too, so by reusing it we can remove duplicated code between local and remove flushes in a future patch. Signed-off-by: Andy Lutomirski <luto@kernel.org> Acked-by: Rik van Riel <riel@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bpetkov@suse.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Hocko <mhocko@suse.com> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Nadav Amit <namit@vmware.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org [ Fix build warning. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-05Merge tag 'v4.12-rc4' into x86/mm, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-04KVM: improve arch vcpu request definingAndrew Jones
Marc Zyngier suggested that we define the arch specific VCPU request base, rather than requiring each arch to remember to start from 8. That suggestion, along with Radim Krcmar's recent VCPU request flag addition, snowballed into defining something of an arch VCPU request defining API. No functional change. (Looks like x86 is running out of arch VCPU request bits. Maybe someday we'll need to extend to 64.) Signed-off-by: Andrew Jones <drjones@redhat.com> Acked-by: Christoffer Dall <cdall@linaro.org> Signed-off-by: Christoffer Dall <cdall@linaro.org>
2017-05-27take compat_sys_old_getrlimit() to native syscallAl Viro
... and sanitize the ifdefs in there Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-05-27Merge branch 'ras-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS fixes from Thomas Gleixner: "Two fixlets for RAS: - Export memory_error() so the NFIT module can utilize it - Handle memory errors in NFIT correctly" * 'ras-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: acpi, nfit: Fix the memory error check in nfit_handle_mce() x86/MCE: Export memory_error()
2017-05-25Drivers: hv: vmbus: Get the current time from the current clocksourceK. Y. Srinivasan
The current code uses the MSR based mechanism to get the current tick. Use the current clock source as that might be more optimal. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-24mm, x86/mm: Make the batched unmap TLB flush API more genericAndy Lutomirski
try_to_unmap_flush() used to open-code a rather x86-centric flush sequence: local_flush_tlb() + flush_tlb_others(). Rearrange the code so that the arch (only x86 for now) provides arch_tlbbatch_add_mm() and arch_tlbbatch_flush() and the core code calls those functions instead. I'll want this for x86 because, to enable address space ids, I can't support the flush_tlb_others() mode used by exising try_to_unmap_flush() implementation with good performance. I can support the new API fairly easily, though. I imagine that other architectures may be in a similar position. Architectures with strong remote flush primitives (arm64?) may have even worse performance problems with flush_tlb_others() the way that try_to_unmap_flush() uses it. Signed-off-by: Andy Lutomirski <luto@kernel.org> Acked-by: Kees Cook <keescook@chromium.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bpetkov@suse.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Hocko <mhocko@suse.com> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Nadav Amit <namit@vmware.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/19f25a8581f9fb77876b7ff3b001f89835e34ea3.1495492063.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-05-24x86/mm: Reimplement flush_tlb_page() using flush_tlb_mm_range()Andy Lutomirski
flush_tlb_page() was very similar to flush_tlb_mm_range() except that it had a couple of issues: - It was missing an smp_mb() in the case where current->active_mm != mm. (This is a longstanding bug reported by Nadav Amit) - It was missing tracepoints and vm counter updates. The only reason that I can see for keeping it at as a separate function is that it could avoid a few branches that flush_tlb_mm_range() needs to decide to flush just one page. This hardly seems worthwhile. If we decide we want to get rid of those branches again, a better way would be to introduce an __flush_tlb_mm_range() helper and make both flush_tlb_page() and flush_tlb_mm_range() use it. Signed-off-by: Andy Lutomirski <luto@kernel.org> Acked-by: Kees Cook <keescook@chromium.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bpetkov@suse.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Hocko <mhocko@suse.com> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Nadav Amit <namit@vmware.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/3cc3847cf888d8907577569b8bac3f01992ef8f9.1495492063.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-05-23Merge branch 'linus' into sched/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-05-23perf/x86: Add sysfs entry to freeze counters on SMIKan Liang
Currently, the SMIs are visible to all performance counters, because many users want to measure everything including SMIs. But in some cases, the SMI cycles should not be counted - for example, to calculate the cost of an SMI itself. So a knob is needed. When setting FREEZE_WHILE_SMM bit in IA32_DEBUGCTL, all performance counters will be effected. There is no way to do per-counter freeze on SMI. So it should not use the per-event interface (e.g. ioctl or event attribute) to set FREEZE_WHILE_SMM bit. Adds sysfs entry /sys/device/cpu/freeze_on_smi to set FREEZE_WHILE_SMM bit in IA32_DEBUGCTL. When set, freezes perfmon and trace messages while in SMM. Value has to be 0 or 1. It will be applied to all processors. Also serialize the entire setting so we don't get multiple concurrent threads trying to update to different values. Signed-off-by: Kan Liang <Kan.liang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: acme@kernel.org Cc: bp@alien8.de Cc: jolsa@kernel.org Link: http://lkml.kernel.org/r/1494600673-244667-1-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-05-21x86: fix 32-bit case of __get_user_asm_u64()Linus Torvalds
The code to fetch a 64-bit value from user space was entirely buggered, and has been since the code was merged in early 2016 in commit b2f680380ddf ("x86/mm/32: Add support for 64-bit __get_user() on 32-bit kernels"). Happily the buggered routine is almost certainly entirely unused, since the normal way to access user space memory is just with the non-inlined "get_user()", and the inlined version didn't even historically exist. The normal "get_user()" case is handled by external hand-written asm in arch/x86/lib/getuser.S that doesn't have either of these issues. There were two independent bugs in __get_user_asm_u64(): - it still did the STAC/CLAC user space access marking, even though that is now done by the wrapper macros, see commit 11f1a4b9755f ("x86: reorganize SMAP handling in user space accesses"). This didn't result in a semantic error, it just means that the inlined optimized version was hugely less efficient than the allegedly slower standard version, since the CLAC/STAC overhead is quite high on modern Intel CPU's. - the double register %eax/%edx was marked as an output, but the %eax part of it was touched early in the asm, and could thus clobber other inputs to the asm that gcc didn't expect it to touch. In particular, that meant that the generated code could look like this: mov (%eax),%eax mov 0x4(%eax),%edx where the load of %edx obviously was _supposed_ to be from the 32-bit word that followed the source of %eax, but because %eax was overwritten by the first instruction, the source of %edx was basically random garbage. The fixes are trivial: remove the extraneous STAC/CLAC entries, and mark the 64-bit output as early-clobber to let gcc know that no inputs should alias with the output register. Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: stable@kernel.org # v4.8+ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-21Clean up x86 unsafe_get/put_user() type handlingLinus Torvalds
Al noticed that unsafe_put_user() had type problems, and fixed them in commit a7cc722fff0b ("fix unsafe_put_user()"), which made me look more at those functions. It turns out that unsafe_get_user() had a type issue too: it limited the largest size of the type it could handle to "unsigned long". Which is fine with the current users, but doesn't match our existing normal get_user() semantics, which can also handle "u64" even when that does not fit in a long. While at it, also clean up the type cast in unsafe_put_user(). We actually want to just make it an assignment to the expected type of the pointer, because we actually do want warnings from types that don't convert silently. And it makes the code more readable by not having that one very long and complex line. [ This patch might become stable material if we ever end up back-porting any new users of the unsafe uaccess code, but as things stand now this doesn't matter for any current existing uses. ] Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-21x86/mce: Convert threshold_bank.cpus from atomic_t to refcount_tElena Reshetova
The refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Suggested-by: Kees Cook <keescook@chromium.org> Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com> Reviewed-by: David Windsor <dwindsor@gmail.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1492695536-5947-1-git-send-email-elena.reshetova@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-05-21x86/MCE: Export memory_error()Borislav Petkov
Export the function which checks whether an MCE is a memory error to other users so that we can reuse the logic. Drop the boot_cpu_data use, while at it, as mce.cpuvendor already has the CPU vendor in there. Integrate a piece from a patch from Vishal Verma <vishal.l.verma@intel.com> to export it for modules (nfit). The main reason we're exporting it is that the nfit handler nfit_handle_mce() needs to detect a memory error properly before doing its recovery actions. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20170519093915.15413-2-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-05-21Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc uaccess fixes from Al Viro: "Fix for unsafe_put_user() (no callers currently in mainline, but anyone starting to use it will step into that) + alpha osf_wait4() infoleak fix" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: osf_wait4(): fix infoleak fix unsafe_put_user()
2017-05-21fix unsafe_put_user()Al Viro
__put_user_size() relies upon its first argument having the same type as what the second one points to; the only other user makes sure of that and unsafe_put_user() should do the same. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-05-16KVM: x86: lower default for halt_poll_nsPaolo Bonzini
In some fio benchmarks, halt_poll_ns=400000 caused CPU utilization to increase heavily even in cases where the performance improvement was small. In particular, bandwidth divided by CPU usage was as much as 60% lower. To some extent this is the expected effect of the patch, and the additional CPU utilization is only visible when running the benchmarks. However, halving the threshold also halves the extra CPU utilization (from +30-130% to +20-70%) and has no negative effect on performance. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-05-15kill strlen_user()Al Viro
no callers, no consistent semantics, no sane way to use it... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-05-16Merge branch 'utilities' of ↵Rafael J. Wysocki
git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux Pull power management utilities updates from Len Brown. * 'utilities' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: intel_pstate: use updated msr-index.h HWP.EPP values tools/power x86_energy_perf_policy: support HWP.EPP x86: msr-index.h: fix shifts to ULL results in HWP macros. x86: msr-index.h: define HWP.EPP values x86: msr-index.h: define EPB mid-points
2017-05-15x86/tsc: Remodel cyc2ns to use seqcount_latch()Peter Zijlstra
Replace the custom multi-value scheme with the more regular seqcount_latch() scheme. Along with scrapping a lot of lines, the latch scheme is better documented and used in more places. The immediate benefit however is not being limited on the update side. The current code has a limit where the writers block which is hit by future changes. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-05-12Merge branch 'libnvdimm-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm fixes from Dan Williams: "Incremental fixes and a small feature addition on top of the main libnvdimm 4.12 pull request: - Geert noticed that tinyconfig was bloated by BLOCK selecting DAX. The size regression is fixed by moving all dax helpers into the dax-core and only specifying "select DAX" for FS_DAX and dax-capable drivers. He also asked for clarification of the NR_DEV_DAX config option which, on closer look, does not need to be a config option at all. Mike also throws in a DEV_DAX_PMEM fixup for good measure. - Ben's attention to detail on -stable patch submissions caught a case where the recent fixes to arch_copy_from_iter_pmem() missed a condition where we strand dirty data in the cache. This is tagged for -stable and will also be included in the rework of the pmem api to a proposed {memcpy,copy_user}_flushcache() interface for 4.13. - Vishal adds a feature that missed the initial pull due to pending review feedback. It allows the kernel to clear media errors when initializing a BTT (atomic sector update driver) instance on a pmem namespace. - Ross noticed that the dax_device + dax_operations conversion broke __dax_zero_page_range(). The nvdimm unit tests fail to check this path, but xfstests immediately trips over it. No excuse for missing this before submitting the 4.12 pull request. These all pass the nvdimm unit tests and an xfstests spot check. The set has received a build success notification from the kbuild robot" * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: filesystem-dax: fix broken __dax_zero_page_range() conversion libnvdimm, btt: ensure that initializing metadata clears poison libnvdimm: add an atomic vs process context flag to rw_bytes x86, pmem: Fix cache flushing for iovec write < 8 bytes device-dax: kill NR_DEV_DAX block, dax: move "select DAX" from BLOCK to FS_DAX device-dax: Tell kbuild DEV_DAX_PMEM depends on DEV_DAX
2017-05-12Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Misc fixes: - two boot crash fixes - unwinder fixes - kexec related kernel direct mappings enhancements/fixes - more Clang support quirks - minor cleanups - Documentation fixes" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/intel_rdt: Fix a typo in Documentation x86/build: Don't add -maccumulate-outgoing-args w/o compiler support x86/boot/32: Fix UP boot on Quark and possibly other platforms x86/mm/32: Set the '__vmalloc_start_set' flag in initmem_init() x86/kexec/64: Use gbpages for identity mappings if available x86/mm: Add support for gbpages to kernel_ident_mapping_init() x86/boot: Declare error() as noreturn x86/mm/kaslr: Use the _ASM_MUL macro for multiplication to work around Clang incompatibility x86/mm: Fix boot crash caused by incorrect loop count calculation in sync_global_pgds() x86/asm: Don't use RBP as a temporary register in csum_partial_copy_generic() x86/microcode/AMD: Remove redundant NULL check on mc
2017-05-10Merge tag 'kbuild-uapi-v4.12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild UAPI updates from Masahiro Yamada: "Improvement of headers_install by Nicolas Dichtel. It has been long since the introduction of uapi directories, but the de-coupling of exported headers has not been completed. Headers listed in header-y are exported whether they exist in uapi directories or not. His work fixes this inconsistency. All (and only) headers under uapi directories are now exported. The asm-generic wrappers are still exceptions, but this is a big step forward" * tag 'kbuild-uapi-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: arch/include: remove empty Kbuild files uapi: export all arch specifics directories uapi: export all headers under uapi directories smc_diag.h: fix include from userland btrfs_tree.h: fix include from userland uapi: includes linux/types.h before exporting files Makefile.headersinst: remove destination-y option Makefile.headersinst: cleanup input files x86: stop exporting msr-index.h to userland nios2: put setup.h in uapi h8300: put bitsperlong.h in uapi
2017-05-11uapi: export all headers under uapi directoriesNicolas Dichtel
Regularly, when a new header is created in include/uapi/, the developer forgets to add it in the corresponding Kbuild file. This error is usually detected after the release is out. In fact, all headers under uapi directories should be exported, thus it's useless to have an exhaustive list. After this patch, the following files, which were not exported, are now exported (with make headers_install_all): asm-arc/kvm_para.h asm-arc/ucontext.h asm-blackfin/shmparam.h asm-blackfin/ucontext.h asm-c6x/shmparam.h asm-c6x/ucontext.h asm-cris/kvm_para.h asm-h8300/shmparam.h asm-h8300/ucontext.h asm-hexagon/shmparam.h asm-m32r/kvm_para.h asm-m68k/kvm_para.h asm-m68k/shmparam.h asm-metag/kvm_para.h asm-metag/shmparam.h asm-metag/ucontext.h asm-mips/hwcap.h asm-mips/reg.h asm-mips/ucontext.h asm-nios2/kvm_para.h asm-nios2/ucontext.h asm-openrisc/shmparam.h asm-parisc/kvm_para.h asm-powerpc/perf_regs.h asm-sh/kvm_para.h asm-sh/ucontext.h asm-tile/shmparam.h asm-unicore32/shmparam.h asm-unicore32/ucontext.h asm-x86/hwcap2.h asm-xtensa/kvm_para.h drm/armada_drm.h drm/etnaviv_drm.h drm/vgem_drm.h linux/aspeed-lpc-ctrl.h linux/auto_dev-ioctl.h linux/bcache.h linux/btrfs_tree.h linux/can/vxcan.h linux/cifs/cifs_mount.h linux/coresight-stm.h linux/cryptouser.h linux/fsmap.h linux/genwqe/genwqe_card.h linux/hash_info.h linux/kcm.h linux/kcov.h linux/kfd_ioctl.h linux/lightnvm.h linux/module.h linux/nbd-netlink.h linux/nilfs2_api.h linux/nilfs2_ondisk.h linux/nsfs.h linux/pr.h linux/qrtr.h linux/rpmsg.h linux/sched/types.h linux/sed-opal.h linux/smc.h linux/smc_diag.h linux/stm.h linux/switchtec_ioctl.h linux/vfio_ccw.h linux/wil6210_uapi.h rdma/bnxt_re-abi.h Note that I have removed from this list the files which are generated in every exported directories (like .install or .install.cmd). Thanks to Julien Floret <julien.floret@6wind.com> for the tip to get all subdirs with a pure makefile command. For the record, note that exported files for asm directories are a mix of files listed by: - include/uapi/asm-generic/Kbuild.asm; - arch/<arch>/include/uapi/asm/Kbuild; - arch/<arch>/include/asm/Kbuild. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Acked-by: Russell King <rmk+kernel@armlinux.org.uk> Acked-by: Mark Salter <msalter@redhat.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2017-05-11x86: stop exporting msr-index.h to userlandNicolas Dichtel
Even if this file was not in an uapi directory, it was exported because it was listed in the Kbuild file. Fixes: b72e7464e4cf ("x86/uapi: Do not export <asm/msr-index.h> as part of the user API headers") Suggested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2017-05-09x86, pmem: Fix cache flushing for iovec write < 8 bytesBen Hutchings
Commit 11e63f6d920d added cache flushing for unaligned writes from an iovec, covering the first and last cache line of a >= 8 byte write and the first cache line of a < 8 byte write. But an unaligned write of 2-7 bytes can still cover two cache lines, so make sure we flush both in that case. Cc: <stable@vger.kernel.org> Fixes: 11e63f6d920d ("x86, pmem: fix broken __copy_user_nocache ...") Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-05-09kvm: x86: Add a hook for arch specific dirty logging emulationBandan Das
When KVM updates accessed/dirty bits, this hook can be used to invoke an arch specific function that implements/emulates dirty logging such as PML. Signed-off-by: Bandan Das <bsd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-05-08Merge tag 'pci-v4.12-changes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: - add framework for supporting PCIe devices in Endpoint mode (Kishon Vijay Abraham I) - use non-postable PCI config space mappings when possible (Lorenzo Pieralisi) - clean up and unify mmap of PCI BARs (David Woodhouse) - export and unify Function Level Reset support (Christoph Hellwig) - avoid FLR for Intel 82579 NICs (Sasha Neftin) - add pci_request_irq() and pci_free_irq() helpers (Christoph Hellwig) - short-circuit config access failures for disconnected devices (Keith Busch) - remove D3 sleep delay when possible (Adrian Hunter) - freeze PME scan before suspending devices (Lukas Wunner) - stop disabling MSI/MSI-X in pci_device_shutdown() (Prarit Bhargava) - disable boot interrupt quirk for ASUS M2N-LR (Stefan Assmann) - add arch-specific alignment control to improve device passthrough by avoiding multiple BARs in a page (Yongji Xie) - add sysfs sriov_drivers_autoprobe to control VF driver binding (Bodong Wang) - allow slots below PCI-to-PCIe "reverse bridges" (Bjorn Helgaas) - fix crashes when unbinding host controllers that don't support removal (Brian Norris) - add driver for MicroSemi Switchtec management interface (Logan Gunthorpe) - add driver for Faraday Technology FTPCI100 host bridge (Linus Walleij) - add i.MX7D support (Andrey Smirnov) - use generic MSI support for Aardvark (Thomas Petazzoni) - make Rockchip driver modular (Brian Norris) - advertise 128-byte Read Completion Boundary support for Rockchip (Shawn Lin) - advertise PCI_EXP_LNKSTA_SLC for Rockchip root port (Shawn Lin) - convert atomic_t to refcount_t in HV driver (Elena Reshetova) - add CPU IRQ affinity in HV driver (K. Y. Srinivasan) - fix PCI bus removal in HV driver (Long Li) - add support for ThunderX2 DMA alias topology (Jayachandran C) - add ThunderX pass2.x 2nd node MCFG quirk (Tomasz Nowicki) - add ITE 8893 bridge DMA alias quirk (Jarod Wilson) - restrict Cavium ACS quirk only to CN81xx/CN83xx/CN88xx devices (Manish Jaggi) * tag 'pci-v4.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (146 commits) PCI: Don't allow unbinding host controllers that aren't prepared ARM: DRA7: clockdomain: Change the CLKTRCTRL of CM_PCIE_CLKSTCTRL to SW_WKUP MAINTAINERS: Add PCI Endpoint maintainer Documentation: PCI: Add userguide for PCI endpoint test function tools: PCI: Add sample test script to invoke pcitest tools: PCI: Add a userspace tool to test PCI endpoint Documentation: misc-devices: Add Documentation for pci-endpoint-test driver misc: Add host side PCI driver for PCI test function device PCI: Add device IDs for DRA74x and DRA72x dt-bindings: PCI: dra7xx: Add DT bindings to enable unaligned access PCI: dwc: dra7xx: Workaround for errata id i870 dt-bindings: PCI: dra7xx: Add DT bindings for PCI dra7xx EP mode PCI: dwc: dra7xx: Add EP mode support PCI: dwc: dra7xx: Facilitate wrapper and MSI interrupts to be enabled independently dt-bindings: PCI: Add DT bindings for PCI designware EP mode PCI: dwc: designware: Add EP mode support Documentation: PCI: Add binding documentation for pci-test endpoint function ixgbe: Use pcie_flr() instead of duplicating it IB/hfi1: Use pcie_flr() instead of duplicating it PCI: imx6: Fix spelling mistake: "contol" -> "control" ...
2017-05-08Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge more updates from Andrew Morton: - the rest of MM - various misc things - procfs updates - lib/ updates - checkpatch updates - kdump/kexec updates - add kvmalloc helpers, use them - time helper updates for Y2038 issues. We're almost ready to remove current_fs_time() but that awaits a btrfs merge. - add tracepoints to DAX * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (114 commits) drivers/staging/ccree/ssi_hash.c: fix build with gcc-4.4.4 selftests/vm: add a test for virtual address range mapping dax: add tracepoint to dax_insert_mapping() dax: add tracepoint to dax_writeback_one() dax: add tracepoints to dax_writeback_mapping_range() dax: add tracepoints to dax_load_hole() dax: add tracepoints to dax_pfn_mkwrite() dax: add tracepoints to dax_iomap_pte_fault() mtd: nand: nandsim: convert to memalloc_noreclaim_*() treewide: convert PF_MEMALLOC manipulations to new helpers mm: introduce memalloc_noreclaim_{save,restore} mm: prevent potential recursive reclaim due to clearing PF_MEMALLOC mm/huge_memory.c: deposit a pgtable for DAX PMD faults when required mm/huge_memory.c: use zap_deposited_table() more time: delete CURRENT_TIME_SEC and CURRENT_TIME gfs2: replace CURRENT_TIME with current_time apparmorfs: replace CURRENT_TIME with current_time() lustre: replace CURRENT_TIME macro fs: ubifs: replace CURRENT_TIME_SEC with current_time fs: ufs: use ktime_get_real_ts64() for birthtime ...
2017-05-08treewide: decouple cacheflush.h and set_memory.hLaura Abbott
Now that all call sites, completely decouple cacheflush.h and set_memory.h [sfr@canb.auug.org.au: kprobes/x86: merge fix for set_memory.h decoupling] Link: http://lkml.kernel.org/r/20170418180903.10300fd3@canb.auug.org.au Link: http://lkml.kernel.org/r/1488920133-27229-17-git-send-email-labbott@redhat.com Signed-off-by: Laura Abbott <labbott@redhat.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08treewide: move set_memory_* functions away from cacheflush.hLaura Abbott
Patch series "set_memory_* functions header refactor", v3. The set_memory_* APIs came out of a desire to have a better way to change memory attributes. Many of these attributes were linked to cache functionality so the prototypes were put in cacheflush.h. These days, the APIs have grown and have a much wider use than just cache APIs. To support this growth, split off set_memory_* and friends into a separate header file to avoid growing cacheflush.h for APIs that have nothing to do with caches. Link: http://lkml.kernel.org/r/1488920133-27229-2-git-send-email-labbott@redhat.com Signed-off-by: Laura Abbott <labbott@redhat.com> Acked-by: Russell King <rmk+kernel@armlinux.org.uk> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Paolo Bonzini: "ARM: - HYP mode stub supports kexec/kdump on 32-bit - improved PMU support - virtual interrupt controller performance improvements - support for userspace virtual interrupt controller (slower, but necessary for KVM on the weird Broadcom SoCs used by the Raspberry Pi 3) MIPS: - basic support for hardware virtualization (ImgTec P5600/P6600/I6400 and Cavium Octeon III) PPC: - in-kernel acceleration for VFIO s390: - support for guests without storage keys - adapter interruption suppression x86: - usual range of nVMX improvements, notably nested EPT support for accessed and dirty bits - emulation of CPL3 CPUID faulting generic: - first part of VCPU thread request API - kvm_stat improvements" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (227 commits) kvm: nVMX: Don't validate disabled secondary controls KVM: put back #ifndef CONFIG_S390 around kvm_vcpu_kick Revert "KVM: Support vCPU-based gfn->hva cache" tools/kvm: fix top level makefile KVM: x86: don't hold kvm->lock in KVM_SET_GSI_ROUTING KVM: Documentation: remove VM mmap documentation kvm: nVMX: Remove superfluous VMX instruction fault checks KVM: x86: fix emulation of RSM and IRET instructions KVM: mark requests that need synchronization KVM: return if kvm_vcpu_wake_up() did wake up the VCPU KVM: add explicit barrier to kvm_vcpu_kick KVM: perform a wake_up in kvm_make_all_cpus_request KVM: mark requests that do not need a wakeup KVM: remove #ifndef CONFIG_S390 around kvm_vcpu_wake_up KVM: x86: always use kvm_make_request instead of set_bit KVM: add kvm_{test,clear}_request to replace {test,clear}_bit s390: kvm: Cpu model support for msa6, msa7 and msa8 KVM: x86: remove irq disablement around KVM_SET_CLOCK/KVM_GET_CLOCK kvm: better MWAIT emulation for guests KVM: x86: virtualize cpuid faulting ...
2017-05-08x86/mm: Add support for gbpages to kernel_ident_mapping_init()Xunlei Pang
Kernel identity mappings on x86-64 kernels are created in two ways: by the early x86 boot code, or by kernel_ident_mapping_init(). Native kernels (which is the dominant usecase) use the former, but the kexec and the hibernation code uses kernel_ident_mapping_init(). There's a subtle difference between these two ways of how identity mappings are created, the current kernel_ident_mapping_init() code creates identity mappings always using 2MB page(PMD level) - while the native kernel boot path also utilizes gbpages where available. This difference is suboptimal both for performance and for memory usage: kernel_ident_mapping_init() needs to allocate pages for the page tables when creating the new identity mappings. This patch adds 1GB page(PUD level) support to kernel_ident_mapping_init() to address these concerns. The primary advantage would be better TLB coverage/performance, because we'd utilize 1GB TLBs instead of 2MB ones. It is also useful for machines with large number of memory to save paging structure allocations(around 4MB/TB using 2MB page) when setting identity mappings for all the memory, after using 1GB page it will consume only 8KB/TB. ( Note that this change alone does not activate gbpages in kexec, we are doing that in a separate patch. ) Signed-off-by: Xunlei Pang <xlpang@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Young <dyoung@redhat.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yinghai Lu <yinghai@kernel.org> Cc: akpm@linux-foundation.org Cc: kexec@lists.infradead.org Link: http://lkml.kernel.org/r/1493862171-8799-1-git-send-email-xlpang@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-05-05Merge tag 'libnvdimm-for-4.12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Dan Williams: "The bulk of this has been in multiple -next releases. There were a few late breaking fixes and small features that got added in the last couple days, but the whole set has received a build success notification from the kbuild robot. Change summary: - Region media error reporting: A libnvdimm region device is the parent to one or more namespaces. To date, media errors have been reported via the "badblocks" attribute attached to pmem block devices for namespaces in "raw" or "memory" mode. Given that namespaces can be in "device-dax" or "btt-sector" mode this new interface reports media errors generically, i.e. independent of namespace modes or state. This subsequently allows userspace tooling to craft "ACPI 6.1 Section 9.20.7.6 Function Index 4 - Clear Uncorrectable Error" requests and submit them via the ioctl path for NVDIMM root bus devices. - Introduce 'struct dax_device' and 'struct dax_operations': Prompted by a request from Linus and feedback from Christoph this allows for dax capable drivers to publish their own custom dax operations. This fixes the broken assumption that all dax operations are related to a persistent memory device, and makes it easier for other architectures and platforms to add customized persistent memory support. - 'libnvdimm' core updates: A new "deep_flush" sysfs attribute is available for storage appliance applications to manually trigger memory controllers to drain write-pending buffers that would otherwise be flushed automatically by the platform ADR (asynchronous-DRAM-refresh) mechanism at a power loss event. Support for "locked" DIMMs is included to prevent namespaces from surfacing when the namespace label data area is locked. Finally, fixes for various reported deadlocks and crashes, also tagged for -stable. - ACPI / nfit driver updates: General updates of the nfit driver to add DSM command overrides, ACPI 6.1 health state flags support, DSM payload debug available by default, and various fixes. Acknowledgements that came after the branch was pushed: - commmit 565851c972b5 "device-dax: fix sysfs attribute deadlock": Tested-by: Yi Zhang <yizhan@redhat.com> - commit 23f498448362 "libnvdimm: rework region badblocks clearing" Tested-by: Toshi Kani <toshi.kani@hpe.com>" * tag 'libnvdimm-for-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (52 commits) libnvdimm, pfn: fix 'npfns' vs section alignment libnvdimm: handle locked label storage areas libnvdimm: convert NDD_ flags to use bitops, introduce NDD_LOCKED brd: fix uninitialized use of brd->dax_dev block, dax: use correct format string in bdev_dax_supported device-dax: fix sysfs attribute deadlock libnvdimm: restore "libnvdimm: band aid btt vs clear poison locking" libnvdimm: fix nvdimm_bus_lock() vs device_lock() ordering libnvdimm: rework region badblocks clearing acpi, nfit: kill ACPI_NFIT_DEBUG libnvdimm: fix clear length of nvdimm_forget_poison() libnvdimm, pmem: fix a NULL pointer BUG in nd_pmem_notify libnvdimm, region: sysfs trigger for nvdimm_flush() libnvdimm: fix phys_addr for nvdimm_clear_poison x86, dax, pmem: remove indirection around memcpy_from_pmem() block: remove block_device_operations ->direct_access() block, dax: convert bdev_dax_supported() to dax_direct_access() filesystem-dax: convert to dax_direct_access() Revert "block: use DAX for partition table reads" ext2, ext4, xfs: retrieve dax_device for iomap operations ...