summaryrefslogtreecommitdiff
path: root/arch/x86/include
AgeCommit message (Collapse)Author
2015-09-21KVM: x86: trap AMD MSRs for the TSeg base and maskPaolo Bonzini
These have roughly the same purpose as the SMRR, which we do not need to implement in KVM. However, Linux accesses MSR_K8_TSEG_ADDR at boot, which causes problems when running a Xen dom0 under KVM. Just return 0, meaning that processor protection of SMRAM is not in effect. Reported-by: M A Young <m.a.young@durham.ac.uk> Cc: stable@vger.kernel.org Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-09-18Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM fixes from Paolo Bonzini: "Mostly stable material, a lot of ARM fixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (22 commits) sched: access local runqueue directly in single_task_running arm/arm64: KVM: Remove 'config KVM_ARM_MAX_VCPUS' arm64: KVM: Remove all traces of the ThumbEE registers arm: KVM: Disable virtual timer even if the guest is not using it arm64: KVM: Disable virtual timer even if the guest is not using it arm/arm64: KVM: vgic: Check for !irqchip_in_kernel() when mapping resources KVM: s390: Replace incorrect atomic_or with atomic_andnot arm: KVM: Fix incorrect device to IPA mapping arm64: KVM: Fix user access for debug registers KVM: vmx: fix VPID is 0000H in non-root operation KVM: add halt_attempted_poll to VCPU stats kvm: fix zero length mmio searching kvm: fix double free for fast mmio eventfd kvm: factor out core eventfd assign/deassign logic kvm: don't try to register to KVM_FAST_MMIO_BUS for non mmio eventfd KVM: make the declaration of functions within 80 characters KVM: arm64: add workaround for Cortex-A57 erratum #852523 KVM: fix polling for guest halt continued even if disable it arm/arm64: KVM: Fix PSCI affinity info return value for non valid cores arm64: KVM: set {v,}TCR_EL2 RES1 bits ...
2015-09-18perf/x86/intel/pebs: Add PEBS frontend profiling for SkylakeAndi Kleen
Skylake has a new FRONTEND_LATENCY PEBS event to accurately profile frontend problems (like ITLB or decoding issues). The new event is configured through a separate MSR, which selects a range of sub events. Define the extra MSR as a extra reg and export support for it through sysfs. To avoid duplicating the existing tables use a new function to add new entries to existing tables. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1435707205-6676-4-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-17Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: - misc fixes all around the map - block non-root vm86(old) if mmap_min_addr != 0 - two small debuggability improvements - removal of obsolete paravirt op * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/platform: Fix Geode LX timekeeping in the generic x86 build x86/apic: Serialize LVTT and TSC_DEADLINE writes x86/ioapic: Force affinity setting in setup_ioapic_dest() x86/paravirt: Remove the unused pv_time_ops::get_tsc_khz method x86/ldt: Fix small LDT allocation for Xen x86/vm86: Fix the misleading CONFIG_VM86 Kconfig help text x86/cpu: Print family/model/stepping in hex x86/vm86: Block non-root vm86(old) if mmap_min_addr != 0 x86/alternatives: Make optimize_nops() interrupt safe and synced x86/mm/srat: Print non-volatile flag in SRAT x86/cpufeatures: Enable cpuid for Intel SHA extensions
2015-09-17Merge branch 'locking-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Ingo Molnar: "Spinlock performance regression fix, plus documentation fixes" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/static_keys: Fix up the static keys documentation locking/qspinlock/x86: Only emit the test-and-set fallback when building guest support locking/qspinlock/x86: Fix performance regression under unaccelerated VMs locking/static_keys: Fix a silly typo
2015-09-16KVM: add halt_attempted_poll to VCPU statsPaolo Bonzini
This new statistic can help diagnosing VCPUs that, for any reason, trigger bad behavior of halt_poll_ns autotuning. For example, say halt_poll_ns = 480000, and wakeups are spaced exactly like 479us, 481us, 479us, 481us. Then KVM always fails polling and wastes 10+20+40+80+160+320+480 = 1110 microseconds out of every 479+481+479+481+479+481+479 = 3359 microseconds. The VCPU then is consuming about 30% more CPU than it would use without polling. This would show as an abnormally high number of attempted polling compared to the successful polls. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com< Reviewed-by: David Matlack <dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-09-14x86/paravirt: Remove the unused pv_time_ops::get_tsc_khz methodJuergen Gross
It's not used anywhere. Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: akataria@vmware.com Cc: chrisw@sous-sol.org Cc: jeremy@goop.org Cc: virtualization@lists.linux-foundation.org Link: http://lkml.kernel.org/r/1442227343-403-1-git-send-email-jgross@suse.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-14x86/fpu: Add C structures for AVX-512 state componentsDave Hansen
AVX-512 has 3 separate state components: 1. opmask registers 2. zmm upper half of registers 0-15 3. new zmm registers (16-31) This patch adds C structures for the three components along with a few comments mostly lifted from the SDM to explain what they do. This will allow us to check our structures against what the hardware tells us about the sizes of the components. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: dave@sr71.net Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20150902233129.F2433B98@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-14x86/fpu: Rework YMM definitionDave Hansen
We are about to rework all of the "extended state" definitions. This makes the 'ymm' naming consistent with the AVX-512 types we will introduce later. We also add a convenience type: "reg_128_bit" so that we do not have to spell out our arithmetic. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: dave@sr71.net Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20150902233129.B4EB045F@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-14x86/fpu/mpx: Rework MPX 'xstate' typesDave Hansen
MPX includes two separate "extended state components". There is no real need to have an 'mpx_struct' because we never really manage the states together. We also separate out the actual data in 'mpx_bndcsr_state' from the padding. We will shortly be checking the state sizes against our structures and need them to match. For consistency, we also ensure to prefix these types with 'mpx_'. Lastly, we add some comments to mirror some of the descriptions in the Intel documents (SDM) of the various state components. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: dave@sr71.net Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20150902233129.384B73EB@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-14x86/fpu: Rework XSTATE_* macros to remove magic '2'Dave Hansen
The 'xstate.c' code has a bunch of references to '2'. This is because we have a lot more work to do for the "extended" xstates than the "legacy" ones and state component 2 is the first "extended" state. This patch replaces all of the instances of '2' with FIRST_EXTENDED_XFEATURE, which clearly explains what is going on. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: dave@sr71.net Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20150902233128.A8C0BF51@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-14x86/fpu: Rename XFEATURES_NR_MAXDave Hansen
This is a logcal followon to the last patch. It makes the XFEATURE_MAX naming consistent with the other enum values. This is what Ingo suggested. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: dave@sr71.net Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20150902233127.A541448F@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-14x86/fpu: Rename XSAVE macrosDave Hansen
There are two concepts that have some confusing naming: 1. Extended State Component numbers (currently called XFEATURE_BIT_*) 2. Extended State Component masks (currently called XSTATE_*) The numbers are (currently) from 0-9. State component 3 is the bounds registers for MPX, for instance. But when we want to enable "state component 3", we go set a bit in XCR0. The bit we set is 1<<3. We can check to see if a state component feature is enabled by looking at its bit. The current 'xfeature_bit's are at best xfeature bit _numbers_. Calling them bits is at best inconsistent with ending the enum list with 'XFEATURES_NR_MAX'. This patch renames the enum to be 'xfeature'. These also happen to be what the Intel documentation calls a "state component". We also want to differentiate these from the "XSTATE_*" macros. The "XSTATE_*" macros are a mask, and we rename them to match. These macros are reasonably widely used so this patch is a wee bit big, but this really is just a rename. The only non-mechanical part of this is the s/XSTATE_EXTEND_MASK/XFEATURE_MASK_EXTEND/ We need a better name for it, but that's another patch. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: dave@sr71.net Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20150902233126.38653250@viggo.jf.intel.com [ Ported to v4.3-rc1. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-14x86/fpu: Remove partial LWP support definitionsDave Hansen
LightWeight Profiling was evidently an AMD profiling feature that we never got around to implementing. Remove the references to it. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: dave@sr71.net Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20150902233125.7E602284@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-14x86/fpu: Remove XSTATE_RESERVEDave Hansen
The original purpose of XSTATE_RESERVE was to carve out space to store all of the possible extended state components that get saved with the XSAVE instruction(s). However, we are now almost entirely dynamically allocating the buffers we use for XSAVE by placing them at the end of the task_struct and them sizing them at boot. The one exception for that is the init_task. The maximum extended state component size that we have today is on systems with space for AVX-512 and Memory Protection Keys: 2696 bytes. We have reserved a PAGE_SIZE buffer in the init_task via fpregs_state->__padding. This check ensures that even if the component sizes or layout were changed (which we do not expect), that we will still not overflow the init_task's buffer. In the case that we detect we might overflow the buffer, we completely disable XSAVE support in the kernel and try to boot as if we had 'legacy x87 FPU' support in place. This is a crippled state without any of the XSAVE-enabled features (MPX, AVX, etc...). But, it at least let us boot safely. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: dave@sr71.net Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20150902233125.D948D475@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-14x86/fpu: Move XSAVE-disabling code to a helperDave Hansen
When we want to _completely_ disable XSAVE support as far as the kernel is concerned, we have a big set of feature flags to clear. We currently only do this in cases where the user asks for it to be disabled, but we are about to expand the places where we do it to handle errors too. Move the code in to xstate.c, and put it in the xstate.h header. We will use it in the next patch too. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: dave@sr71.net Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20150902233124.EA9A70E5@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-14x86/headers: Clean up too long linesPeter Zijlstra
Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bp@alien8.de Cc: brgerst@gmail.com Cc: dvlasenk@redhat.com Cc: luto@amacapital.net Cc: mikko.rapeli@iki.fi Cc: oleg@redhat.com Link: http://lkml.kernel.org/r/20150909071244.GM3644@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-13x86/platform/uv: Insert per_cpu accessor function on uv_hub_nmiGeorge Beshers
UV: NMI: insert this_cpu_read accessor function on uv_hub_nmi. On SGI UV systems a 'power nmi' command from the CMC causes all processors to drop into uv_handle_nmi(). With the 4.0 kernel this results in BUG: unable to handle kernel paging request The bug is caused by the current code trying to use the PER_CPU variable uv_cpu_nmi.hub without an appropriate accessor function. That oversight occurred in commit e16321709c82 ("uv: Replace __get_cpu_var") Author: Christoph Lameter <cl@linux.com> Date: Sun Aug 17 12:30:41 2014 -0500 This patch inserts this_cpu_read() in the uv_hub_nmi macro restoring the intended functionality. Signed-off-by: George Beshers <gbeshers@sgi.com> Acked-by: Mike Travis <travis@sgi.com> Cc: Alex Thorlton <athorlton@sgi.com> Cc: Dimitri Sivanich <sivanich@sgi.com> Cc: Hedi Berriche <hedi@sgi.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russ Anderson <rja@sgi.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-11locking/qspinlock/x86: Only emit the test-and-set fallback when building ↵Peter Zijlstra
guest support Only emit the test-and-set fallback for Hypervisors lacking PARAVIRT_SPINLOCKS support when building for guests. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Cc: stable@vger.kernel.org # 4.2 Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-11locking/qspinlock/x86: Fix performance regression under unaccelerated VMsPeter Zijlstra
Dave ran into horrible performance on a VM without PARAVIRT_SPINLOCKS set and Linus noted that the test-and-set implementation was retarded. One should spin on the variable with a load, not a RMW. While there, remove 'queued' from the name, as the lock isn't queued at all, but a simple test-and-set. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Reported-by: Dave Chinner <david@fromorbit.com> Tested-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <Waiman.Long@hp.com> Cc: stable@vger.kernel.org # v4.2+ Link: http://lkml.kernel.org/r/20150904152523.GR18673@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-10Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge third patch-bomb from Andrew Morton: - even more of the rest of MM - lib/ updates - checkpatch updates - small changes to a few scruffy filesystems - kmod fixes/cleanups - kexec updates - a dma-mapping cleanup series from hch * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (81 commits) dma-mapping: consolidate dma_set_mask dma-mapping: consolidate dma_supported dma-mapping: cosolidate dma_mapping_error dma-mapping: consolidate dma_{alloc,free}_noncoherent dma-mapping: consolidate dma_{alloc,free}_{attrs,coherent} mm: use vma_is_anonymous() in create_huge_pmd() and wp_huge_pmd() mm: make sure all file VMAs have ->vm_ops set mm, mpx: add "vm_flags_t vm_flags" arg to do_mmap_pgoff() mm: mark most vm_operations_struct const namei: fix warning while make xmldocs caused by namei.c ipc: convert invalid scenarios to use WARN_ON zlib_deflate/deftree: remove bi_reverse() lib/decompress_unlzma: Do a NULL check for pointer lib/decompressors: use real out buf size for gunzip with kernel fs/affs: make root lookup from blkdev logical size sysctl: fix int -> unsigned long assignments in INT_MIN case kexec: export KERNEL_IMAGE_SIZE to vmcoreinfo kexec: align crash_notes allocation to make it be inside one physical page kexec: remove unnecessary test in kimage_alloc_crash_control_pages() kexec: split kexec_load syscall from kexec core code ...
2015-09-10Merge tag 'for-linus-4.3-rc0b-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen terminology fixes from David Vrabel: "Use the correct GFN/BFN terms more consistently" * tag 'for-linus-4.3-rc0b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/xenbus: Rename the variable xen_store_mfn to xen_store_gfn xen/privcmd: Further s/MFN/GFN/ clean-up hvc/xen: Further s/MFN/GFN clean-up video/xen-fbfront: Further s/MFN/GFN clean-up xen/tmem: Use xen_page_to_gfn rather than pfn_to_gfn xen: Use correctly the Xen memory terminologies arm/xen: implement correctly pfn_to_mfn xen: Make clear that swiotlb and biomerge are dealing with DMA address
2015-09-10dma-mapping: consolidate dma_set_maskChristoph Hellwig
Almost everyone implements dma_set_mask the same way, although some time that's hidden in ->set_dma_mask methods. This patch consolidates those into a common implementation that either calls ->set_dma_mask if present or otherwise uses the default implementation. Some architectures used to only call ->set_dma_mask after the initial checks, and those instance have been fixed to do the full work. h8300 implemented dma_set_mask bogusly as a no-ops and has been fixed. Unfortunately some architectures overload unrelated semantics like changing the dma_ops into it so we still need to allow for an architecture override for now. [jcmvbkbc@gmail.com: fix xtensa] Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Russell King <linux@arm.linux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Michal Simek <monstr@monstr.eu> Cc: Jonas Bonn <jonas@southpole.se> Cc: Chris Metcalf <cmetcalf@ezchip.com> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-10dma-mapping: consolidate dma_supportedChristoph Hellwig
Most architectures just call into ->dma_supported, but some also return 1 if the method is not present, or 0 if no dma ops are present (although that should never happeb). Consolidate this more broad version into common code. Also fix h8300 which inorrectly always returned 0, which would have been a problem if it's dma_set_mask implementation wasn't a similarly buggy noop. As a few architectures have much more elaborate implementations, we still allow for arch overrides. [jcmvbkbc@gmail.com: fix xtensa] Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Russell King <linux@arm.linux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Michal Simek <monstr@monstr.eu> Cc: Jonas Bonn <jonas@southpole.se> Cc: Chris Metcalf <cmetcalf@ezchip.com> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-10dma-mapping: cosolidate dma_mapping_errorChristoph Hellwig
Currently there are three valid implementations of dma_mapping_error: (1) call ->mapping_error (2) check for a hardcoded error code (3) always return 0 This patch provides a common implementation that calls ->mapping_error if present, then checks for DMA_ERROR_CODE if defined or otherwise returns 0. [jcmvbkbc@gmail.com: fix xtensa] Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Russell King <linux@arm.linux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Michal Simek <monstr@monstr.eu> Cc: Jonas Bonn <jonas@southpole.se> Cc: Chris Metcalf <cmetcalf@ezchip.com> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-10dma-mapping: consolidate dma_{alloc,free}_noncoherentChristoph Hellwig
Most architectures do not support non-coherent allocations and either define dma_{alloc,free}_noncoherent to their coherent versions or stub them out. Openrisc uses dma_{alloc,free}_attrs to implement them, and only Mips implements them directly. This patch moves the Openrisc version to common code, and handles the DMA_ATTR_NON_CONSISTENT case in the mips dma_map_ops instance. Note that actual non-coherent allocations require a dma_cache_sync implementation, so if non-coherent allocations didn't work on an architecture before this patch they still won't work after it. [jcmvbkbc@gmail.com: fix xtensa] Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Russell King <linux@arm.linux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Michal Simek <monstr@monstr.eu> Cc: Jonas Bonn <jonas@southpole.se> Cc: Chris Metcalf <cmetcalf@ezchip.com> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-10dma-mapping: consolidate dma_{alloc,free}_{attrs,coherent}Christoph Hellwig
Since 2009 we have a nice asm-generic header implementing lots of DMA API functions for architectures using struct dma_map_ops, but unfortunately it's still missing a lot of APIs that all architectures still have to duplicate. This series consolidates the remaining functions, although we still need arch opt outs for two of them as a few architectures have very non-standard implementations. This patch (of 5): The coherent DMA allocator works the same over all architectures supporting dma_map operations. This patch consolidates them and converges the minor differences: - the debug_dma helpers are now called from all architectures, including those that were previously missing them - dma_alloc_from_coherent and dma_release_from_coherent are now always called from the generic alloc/free routines instead of the ops dma-mapping-common.h always includes dma-coherent.h to get the defintions for them, or the stubs if the architecture doesn't support this feature - checks for ->alloc / ->free presence are removed. There is only one magic instead of dma_map_ops without them (mic_dma_ops) and that one is x86 only anyway. Besides that only x86 needs special treatment to replace a default devices if none is passed and tweak the gfp_flags. An optional arch hook is provided for that. [linux@roeck-us.net: fix build] [jcmvbkbc@gmail.com: fix xtensa] Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Russell King <linux@arm.linux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Michal Simek <monstr@monstr.eu> Cc: Jonas Bonn <jonas@southpole.se> Cc: Chris Metcalf <cmetcalf@ezchip.com> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-10kexec: split kexec_load syscall from kexec core codeDave Young
There are two kexec load syscalls, kexec_load another and kexec_file_load. kexec_file_load has been splited as kernel/kexec_file.c. In this patch I split kexec_load syscall code to kernel/kexec.c. And add a new kconfig option KEXEC_CORE, so we can disable kexec_load and use kexec_file_load only, or vice verse. The original requirement is from Ted Ts'o, he want kexec kernel signature being checked with CONFIG_KEXEC_VERIFY_SIG enabled. But kexec-tools use kexec_load syscall can bypass the checking. Vivek Goyal proposed to create a common kconfig option so user can compile in only one syscall for loading kexec kernel. KEXEC/KEXEC_FILE selects KEXEC_CORE so that old config files still work. Because there's general code need CONFIG_KEXEC_CORE, so I updated all the architecture Kconfig with a new option KEXEC_CORE, and let KEXEC selects KEXEC_CORE in arch Kconfig. Also updated general kernel code with to kexec_load syscall. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Dave Young <dyoung@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Petr Tesarik <ptesarik@suse.cz> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Josh Boyer <jwboyer@fedoraproject.org> Cc: David Howells <dhowells@redhat.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-08Merge tag 'libnvdimm-for-4.3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Dan Williams: "This update has successfully completed a 0day-kbuild run and has appeared in a linux-next release. The changes outside of the typical drivers/nvdimm/ and drivers/acpi/nfit.[ch] paths are related to the removal of IORESOURCE_CACHEABLE, the introduction of memremap(), and the introduction of ZONE_DEVICE + devm_memremap_pages(). Summary: - Introduce ZONE_DEVICE and devm_memremap_pages() as a generic mechanism for adding device-driver-discovered memory regions to the kernel's direct map. This facility is used by the pmem driver to enable pfn_to_page() operations on the page frames returned by DAX ('direct_access' in 'struct block_device_operations'). For now, the 'memmap' allocation for these "device" pages comes from "System RAM". Support for allocating the memmap from device memory will arrive in a later kernel. - Introduce memremap() to replace usages of ioremap_cache() and ioremap_wt(). memremap() drops the __iomem annotation for these mappings to memory that do not have i/o side effects. The replacement of ioremap_cache() with memremap() is limited to the pmem driver to ease merging the api change in v4.3. Completion of the conversion is targeted for v4.4. - Similar to the usage of memcpy_to_pmem() + wmb_pmem() in the pmem driver, update the VFS DAX implementation and PMEM api to provide persistence guarantees for kernel operations on a DAX mapping. - Convert the ACPI NFIT 'BLK' driver to map the block apertures as cacheable to improve performance. - Miscellaneous updates and fixes to libnvdimm including support for issuing "address range scrub" commands, clarifying the optimal 'sector size' of pmem devices, a clarification of the usage of the ACPI '_STA' (status) property for DIMM devices, and other minor fixes" * tag 'libnvdimm-for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (34 commits) libnvdimm, pmem: direct map legacy pmem by default libnvdimm, pmem: 'struct page' for pmem libnvdimm, pfn: 'struct page' provider infrastructure x86, pmem: clarify that ARCH_HAS_PMEM_API implies PMEM mapped WB add devm_memremap_pages mm: ZONE_DEVICE for "device memory" mm: move __phys_to_pfn and __pfn_to_phys to asm/generic/memory_model.h dax: drop size parameter to ->direct_access() nd_blk: change aperture mapping from WC to WB nvdimm: change to use generic kvfree() pmem, dax: have direct_access use __pmem annotation dax: update I/O path to do proper PMEM flushing pmem: add copy_from_iter_pmem() and clear_pmem() pmem, x86: clean up conditional pmem includes pmem: remove layer when calling arch_has_wmb_pmem() pmem, x86: move x86 PMEM API to new pmem.h header libnvdimm, e820: make CONFIG_X86_PMEM_LEGACY a tristate option pmem: switch to devm_ allocations devres: add devm_memremap libnvdimm, btt: write and validate parent_uuid ...
2015-09-08Merge tag 'trace-v4.3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing update from Steven Rostedt: "Mostly this is just clean ups and micro optimizations. The changes with more meat are: - Allowing the trace event filters to filter on CPU number and process ids - Two new markers for trace output latency were added (10 and 100 msec latencies) - Have tracing_thresh filter function profiling time I also worked on modifying the ring buffer code for some future work, and moved the adding of the timestamp around. One of my changes caused a regression, and since other changes were built on top of it and already tested, I had to operate a revert of that change. Instead of rebasing, this change set has the code that caused a regression as well as the code to revert that change without touching the other changes that were made on top of it" * tag 'trace-v4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ring-buffer: Revert "ring-buffer: Get timestamp after event is allocated" tracing: Don't make assumptions about length of string on task rename tracing: Allow triggers to filter for CPU ids and process names ftrace: Format MCOUNT_ADDR address as type unsigned long tracing: Introduce two additional marks for delay ftrace: Fix function_graph duration spacing with 7-digits ftrace: add tracing_thresh to function profile tracing: Clean up stack tracing and fix fentry updates ring-buffer: Reorganize function locations ring-buffer: Make sure event has enough room for extend and padding ring-buffer: Get timestamp after event is allocated ring-buffer: Move the adding of the extended timestamp out of line ring-buffer: Add event descriptor to simplify passing data ftrace: correct the counter increment for trace_buffer data tracing: Fix for non-continuous cpu ids tracing: Prefer kcalloc over kzalloc with multiply
2015-09-08Merge tag 'for-linus-4.3-rc0-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen updates from David Vrabel: "Xen features and fixes for 4.3: - Convert xen-blkfront to the multiqueue API - [arm] Support binding event channels to different VCPUs. - [x86] Support > 512 GiB in a PV guests (off by default as such a guest cannot be migrated with the current toolstack). - [x86] PMU support for PV dom0 (limited support for using perf with Xen and other guests)" * tag 'for-linus-4.3-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (33 commits) xen: switch extra memory accounting to use pfns xen: limit memory to architectural maximum xen: avoid another early crash of memory limited dom0 xen: avoid early crash of memory limited dom0 arm/xen: Remove helpers which are PV specific xen/x86: Don't try to set PCE bit in CR4 xen/PMU: PMU emulation code xen/PMU: Intercept PMU-related MSR and APIC accesses xen/PMU: Describe vendor-specific PMU registers xen/PMU: Initialization code for Xen PMU xen/PMU: Sysfs interface for setting Xen PMU mode xen: xensyms support xen: remove no longer needed p2m.h xen: allow more than 512 GB of RAM for 64 bit pv-domains xen: move p2m list if conflicting with e820 map xen: add explicit memblock_reserve() calls for special pages mm: provide early_memremap_ro to establish read-only mapping xen: check for initrd conflicting with e820 map xen: check pre-allocated page tables for conflict with memory map xen: check for kernel memory conflicting with memory layout ...
2015-09-08xen: Use correctly the Xen memory terminologiesJulien Grall
Based on include/xen/mm.h [1], Linux is mistakenly using MFN when GFN is meant, I suspect this is because the first support for Xen was for PV. This resulted in some misimplementation of helpers on ARM and confused developers about the expected behavior. For instance, with pfn_to_mfn, we expect to get an MFN based on the name. Although, if we look at the implementation on x86, it's returning a GFN. For clarity and avoid new confusion, replace any reference to mfn with gfn in any helpers used by PV drivers. The x86 code will still keep some reference of pfn_to_mfn which may be used by all kind of guests No changes as been made in the hypercall field, even though they may be invalid, in order to keep the same as the defintion in xen repo. Note that page_to_mfn has been renamed to xen_page_to_gfn to avoid a name to close to the KVM function gfn_to_page. Take also the opportunity to simplify simple construction such as pfn_to_mfn(page_to_pfn(page)) into xen_page_to_gfn. More complex clean up will come in follow-up patches. [1] http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=e758ed14f390342513405dd766e874934573e6cb Signed-off-by: Julien Grall <julien.grall@citrix.com> Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-09-08xen: Make clear that swiotlb and biomerge are dealing with DMA addressJulien Grall
The swiotlb is required when programming a DMA address on ARM when a device is not protected by an IOMMU. In this case, the DMA address should always be equal to the machine address. For DOM0 memory, Xen ensure it by have an identity mapping between the guest address and host address. However, when mapping a foreign grant reference, the 1:1 model doesn't work. For ARM guest, most of the callers of pfn_to_mfn expects to get a GFN (Guest Frame Number), i.e a PFN (Page Frame Number) from the Linux point of view given that all ARM guest are auto-translated. Even though the name pfn_to_mfn is misleading, we need to ensure that those caller get a GFN and not by mistake a MFN. In pratical, I haven't seen error related to this but we should fix it for the sake of correctness. In order to fix the implementation of pfn_to_mfn on ARM in a follow-up patch, we have to introduce new helpers to return the DMA from a PFN and the invert. On x86, the new helpers will be an alias of pfn_to_mfn and mfn_to_pfn. The helpers will be used in swiotlb and xen_biovec_phys_mergeable. This is necessary in the latter because we have to ensure that the biovec code will not try to merge a biovec using foreign page and another using Linux memory. Lastly, the helper mfn_to_local_pfn has been renamed to bfn_to_local_pfn given that the only usage was in swiotlb. Signed-off-by: Julien Grall <julien.grall@citrix.com> Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-09-08x86/headers: Remove <asm/sigcontext.h> references on the kernel sideIngo Molnar
Now that all type definitions are in the UAPI header, include it directly, instead of through <asm/sigcontext.h>. [ We still keep asm/sigcontext.h, so that uapi/asm/sigcontext32.h can include <asm/sigcontext.h>. ] Acked-by: Mikko Rapeli <mikko.rapeli@iki.fi> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/1441438363-9999-16-git-send-email-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-08x86/headers: Remove direct sigcontext32.h usesIngo Molnar
Now that all sigcontext types are defined in asm/sigcontext.h, remove the various sigcontext32.h uses in the kernel. We still keep the header itself, which includes sigcontext.h, in case user-space relies on it. Acked-by: Mikko Rapeli <mikko.rapeli@iki.fi> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/1441438363-9999-15-git-send-email-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-08x86/headers: Convert sigcontext_ia32 uses to sigcontext_32Ingo Molnar
Use the new name in kernel code, and move the old name to the user-space-only legacy section of the UAPI header. Acked-by: Mikko Rapeli <mikko.rapeli@iki.fi> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/1441438363-9999-14-git-send-email-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-08x86/headers: Unify 'struct sigcontext_ia32' and 'struct sigcontext_32'Ingo Molnar
The two structures are identical - merge them and keep the legacy name as a define. Acked-by: Mikko Rapeli <mikko.rapeli@iki.fi> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/1441438363-9999-13-git-send-email-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-08x86/headers: Make sigcontext pointers bit independentIngo Molnar
Before we can eliminate the duplication between 'struct sigcontext_32' and 'struct sigcontext_ia32', make the 'fpstate' pointer field in 'struct sigcontext_32' bit independent. Acked-by: Mikko Rapeli <mikko.rapeli@iki.fi> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/1441438363-9999-12-git-send-email-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-08x86/headers: Move the 'struct sigcontext' definitions into the UAPI headerIngo Molnar
Our goal is to eliminate the duplicate struct sigcontext_ia32 definition, so move the kernel's primary sigcontext type into the UAPI header, defining these two variants: struct sigcontext_32 struct sigcontext_64 ... and map them to 'struct sigcontext'. Acked-by: Mikko Rapeli <mikko.rapeli@iki.fi> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/1441438363-9999-11-git-send-email-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-08x86/headers: Clean up the kernel's struct sigcontext types to be ABI-cleanIngo Molnar
Use the __u16/32/64 types we standardized on in ABI definitions and which other sigcontext related types are already using. This will help unify struct sigcontext types between native 32-bit, compat and 64-bit kernels. Acked-by: Mikko Rapeli <mikko.rapeli@iki.fi> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/1441438363-9999-10-git-send-email-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-08x86/headers: Convert uses of _fpstate_ia32 to _fpstate_32Ingo Molnar
Remove uses of _fpstate_ia32 from the kernel, and move the legacy _fpstate_ia32 definition to the user-space only portion of the header. Acked-by: Mikko Rapeli <mikko.rapeli@iki.fi> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/1441438363-9999-9-git-send-email-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-08x86/headers: Unify 'struct _fpstate_ia32' and i386 struct _fpstateIngo Molnar
'struct _fpstate_ia32' and 'struct _fpstate' on i386 are identical in all fields, except 'padding1' being named 'padding'. We unify the two structures and add a union that is both named 'padding1' and 'padding', in the (unlikely) case there's user-space code that relies on the padding field name. We rename the two main types to be: struct _fpstate_32 struct _fpstate_64 for the 32-bit and 64-bit frame, and map them to the main and compat structure names (_fpstate) depending on whether we are on 32-bit or on 64-bit kernels. We also keep the old _fpstate_ia32 name as a legacy name. Acked-by: Mikko Rapeli <mikko.rapeli@iki.fi> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/1441438363-9999-8-git-send-email-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-08x86/headers: Unify register type definitions between 32-bit compat and i386Ingo Molnar
The following sigcontext related types were duplicated across native 32-bit and compat 32-bit headers: struct _fpreg; struct _fpxreg; struct _xmmreg; X86_FXSR_MAGIC Unify them. Acked-by: Mikko Rapeli <mikko.rapeli@iki.fi> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/1441438363-9999-7-git-send-email-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-08x86/headers: Use ABI types consistently in sigcontext*.hIngo Molnar
Use the __u16/32/64 types we standardized on in ABI definitions - and which most of this header was already using. This will allow us to more obviously unify the compat header into the main header. No change in functionality. Acked-by: Mikko Rapeli <mikko.rapeli@iki.fi> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/1441438363-9999-6-git-send-email-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-08x86/headers: Separate out legacy user-space structure definitionsIngo Molnar
Better separate the user-space struct sigcontext definitions from the kernel definitions, so that we can unify the kernel definitions with sigcontext32.h. Acked-by: Mikko Rapeli <mikko.rapeli@iki.fi> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/1441438363-9999-5-git-send-email-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-08x86/headers: Clean up and better document uapi/asm/sigcontext.hIngo Molnar
Clean up sigcontext.h: - the explanations were full of typos and were hard to read in general - use consistent and readable vertical spacing - fix, harmonize and extend comments No field name has been changed, user-space might be relying on them. Acked-by: Mikko Rapeli <mikko.rapeli@iki.fi> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/1441438363-9999-4-git-send-email-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-08x86/headers: Clean up uapi/asm/sigcontext32.hIngo Molnar
Clean up sigcontext32.h a bit: - use consistent and readable vertical spacing - fix, harmonize and extend comments No field name has been changed, user-space might be relying on them. Acked-by: Mikko Rapeli <mikko.rapeli@iki.fi> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/1441438363-9999-3-git-send-email-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-08x86/headers: Fix (old) header file dependency bug in uapi/asm/sigcontext32.hIngo Molnar
Mikko Rapeli reported that the following standalone user-space header does not compile: #include <asm/sigcontext32.h> Due to undefined 'struct __fpx_sw_bytes' which is defined in asm/sigcontext.h. The following header order works: #include <asm/sigcontext.h> #include <asm/sigcontext32.h> and that's probably how everyone's been using these headers for the past decade or so, but it's a legit header file dependency bug, so include asm/sigcontext.h in sigcontext32.h to allow it to be built standlone. Reported-by: Mikko Rapeli <mikko.rapeli@iki.fi> Acked-by: Mikko Rapeli <mikko.rapeli@iki.fi> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/1441438363-9999-2-git-send-email-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-05Merge branch 'linus' into x86/urgent, to be able to merge a dependent fixIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-04mm: send one IPI per CPU to TLB flush all entries after unmapping pagesMel Gorman
An IPI is sent to flush remote TLBs when a page is unmapped that was potentially accesssed by other CPUs. There are many circumstances where this happens but the obvious one is kswapd reclaiming pages belonging to a running process as kswapd and the task are likely running on separate CPUs. On small machines, this is not a significant problem but as machine gets larger with more cores and more memory, the cost of these IPIs can be high. This patch uses a simple structure that tracks CPUs that potentially have TLB entries for pages being unmapped. When the unmapping is complete, the full TLB is flushed on the assumption that a refill cost is lower than flushing individual entries. Architectures wishing to do this must give the following guarantee. If a clean page is unmapped and not immediately flushed, the architecture must guarantee that a write to that linear address from a CPU with a cached TLB entry will trap a page fault. This is essentially what the kernel already depends on but the window is much larger with this patch applied and is worth highlighting. The architecture should consider whether the cost of the full TLB flush is higher than sending an IPI to flush each individual entry. An additional architecture helper called flush_tlb_local is required. It's a trivial wrapper with some accounting in the x86 case. The impact of this patch depends on the workload as measuring any benefit requires both mapped pages co-located on the LRU and memory pressure. The case with the biggest impact is multiple processes reading mapped pages taken from the vm-scalability test suite. The test case uses NR_CPU readers of mapped files that consume 10*RAM. Linear mapped reader on a 4-node machine with 64G RAM and 48 CPUs 4.2.0-rc1 4.2.0-rc1 vanilla flushfull-v7 Ops lru-file-mmap-read-elapsed 159.62 ( 0.00%) 120.68 ( 24.40%) Ops lru-file-mmap-read-time_range 30.59 ( 0.00%) 2.80 ( 90.85%) Ops lru-file-mmap-read-time_stddv 6.70 ( 0.00%) 0.64 ( 90.38%) 4.2.0-rc1 4.2.0-rc1 vanilla flushfull-v7 User 581.00 611.43 System 5804.93 4111.76 Elapsed 161.03 122.12 This is showing that the readers completed 24.40% faster with 29% less system CPU time. From vmstats, it is known that the vanilla kernel was interrupted roughly 900K times per second during the steady phase of the test and the patched kernel was interrupts 180K times per second. The impact is lower on a single socket machine. 4.2.0-rc1 4.2.0-rc1 vanilla flushfull-v7 Ops lru-file-mmap-read-elapsed 25.33 ( 0.00%) 20.38 ( 19.54%) Ops lru-file-mmap-read-time_range 0.91 ( 0.00%) 1.44 (-58.24%) Ops lru-file-mmap-read-time_stddv 0.28 ( 0.00%) 0.47 (-65.34%) 4.2.0-rc1 4.2.0-rc1 vanilla flushfull-v7 User 58.09 57.64 System 111.82 76.56 Elapsed 27.29 22.55 It's still a noticeable improvement with vmstat showing interrupts went from roughly 500K per second to 45K per second. The patch will have no impact on workloads with no memory pressure or have relatively few mapped pages. It will have an unpredictable impact on the workload running on the CPU being flushed as it'll depend on how many TLB entries need to be refilled and how long that takes. Worst case, the TLB will be completely cleared of active entries when the target PFNs were not resident at all. [sasha.levin@oracle.com: trace tlb flush after disabling preemption in try_to_unmap_flush] Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>