linux.git - Linus' kernel tree

Age	Commit message	Author
2013-08-02	Merge tag 'please-pull-fix-mce-regression' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras	Linus Torvalds
	Pull MCE fix from Tony Luck: "Fix a regression in mce-severity.c" * tag 'please-pull-fix-mce-regression' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: x86/mce: Fix mce regression from recent cleanup
2013-08-01	sched/x86: Optimize switch_mm() for multi-threaded workloads	Rik van Riel
	Dick Fowles, Don Zickus and Joe Mario have been working on improvements to perf, and noticed heavy cache line contention on the mm_cpumask, running linpack on a 60 core / 120 thread system. The cause turned out to be unnecessary atomic accesses to the mm_cpumask. When in lazy TLB mode, the CPU is only removed from the mm_cpumask if there is a TLB flush event. Most of the time, no such TLB flush happens, and the kernel skips the TLB reload. It can also skip the atomic memory set & test. Here is a summary of Joe's test results: * The __schedule function dropped from 24% of all program cycles down to 5.5%. * The cacheline contention/hotness for accesses to that bitmask went from being the 1st/2nd hottest - down to the 84th hottest (0.3% of all shared misses which is now quite cold) * The average load latency for the bit-test-n-set instruction in __schedule dropped from 10k-15k cycles down to an average of 600 cycles. * The linpack program results improved from 133 GFlops to 144 GFlops. Peak GFlops rose from 133 to 153. Reported-by: Don Zickus <dzickus@redhat.com> Reported-by: Joe Mario <jmario@redhat.com> Tested-by: Joe Mario <jmario@redhat.com> Signed-off-by: Rik van Riel <riel@redhat.com> Reviewed-by: Paul Turner <pjt@google.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20130731221421.616d3d20@annuminas.surriel.com [ Made the comments consistent around the modified code. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
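The idea described above, testing a shared bit with a plain read before falling back to an atomic read-modify-write, can be sketched in userspace C11. This is only an illustration of the general technique, not the kernel's switch_mm() code; the type cpumask_word and both function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for one word of a CPU mask shared by many threads. */
typedef struct {
    atomic_ulong bits;
} cpumask_word;

/*
 * Unconditional variant: always issues an atomic read-modify-write, which
 * needs exclusive ownership of the cache line even if the bit is already set.
 * Returns true if the bit was set before the call.
 */
static bool set_cpu_always(cpumask_word *m, unsigned int cpu)
{
    unsigned long bit = 1UL << cpu;
    unsigned long old = atomic_fetch_or(&m->bits, bit);

    return (old & bit) != 0;
}

/*
 * Test-first variant: a relaxed load lets the cache line stay shared in the
 * common case where the bit is already set; the atomic OR is only issued
 * when the bit is observed clear. Returns true if the bit was already set.
 */
static bool set_cpu_if_clear(cpumask_word *m, unsigned int cpu)
{
    unsigned long bit = 1UL << cpu;
    unsigned long old;

    if (atomic_load_explicit(&m->bits, memory_order_relaxed) & bit)
        return true;

    old = atomic_fetch_or(&m->bits, bit);
    return (old & bit) != 0;
}
```

Both variants leave the mask in the same state; the second simply avoids the write traffic when nothing needs to change, which is the effect the measurements above attribute the improvement to.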
2013-07-31	arch/x86/platform/ce4100/ce4100.c: include reboot.h	Andrew Morton
	Fix the build: arch/x86/platform/ce4100/ce4100.c: In function 'x86_ce4100_early_setup': arch/x86/platform/ce4100/ce4100.c:165:2: error: 'reboot_type' undeclared (first use in this function) Reported-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-31	x86, amd, microcode: Fix error path in apply_microcode_amd()	Torsten Kaiser
	Return -1 (like Intel's apply_microcode) when the loading fails, also do not set the active microcode level on failure. Signed-off-by: Torsten Kaiser <just.for.lkml@googlemail.com> Link: http://lkml.kernel.org/r/20130723225823.2e4e7588@googlemail.com Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-07-31	x86 / tboot / ACPI: Fail extended mode reduced hardware sleep	Ben Guthro
	Register for the extended sleep callback from ACPI. As tboot currently does not support the reduced hardware sleep interface, fail this extended sleep call. Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Ben Guthro <benjamin.guthro@citrix.com> Cc: tboot-devel@lists.sourceforge.net Cc: Gang Wei <gang.wei@intel.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-07-30	Merge tag 'efi-urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/urgent	Ingo Molnar
	Pull EFI fix from Matt Fleming: * The size of memory that gets freed by free_pages() needs to be specified in pages, not bytes - by Roy Franz. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-07-29	x86 / cpu topology: remove the stale macro arch_provides_topology_pointers	Hanjun Guo
	Macro arch_provides_topology_pointers is pointless now, remove it. Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-29	x86/mce: Fix mce regression from recent cleanup	Tony Luck
	In commit 33d7885b594e169256daef652e8d3527b2298e75 x86/mce: Update MCE severity condition check We simplified the rules to recognise each classification of recoverable machine check combining the instruction and data fetch rules into a single entry based on clarifications in the June 2013 SDM that all recoverable events would be reported on the unaffected processor with MCG_STATUS.EIPV=0 and MCG_STATUS.RIPV=1. Unfortunately the simplified rule has a couple of bugs. Fix them here. Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-07-29	nVMX: reset rflags register cache during nested vmentry.	Gleb Natapov
	During nested vmentry into vm86 mode a vcpu state is found to be incorrect because rflags does not have VM flag set since it is read from the cache and has L1's value instead of L2's. If emulate_invalid_guest_state=1 L0 KVM tries to emulate it, but emulation does not work for nVMX and it never should happen anyway. Fix that by using vmx_set_rflags() to set rflags during nested vmentry which takes care of updating register cache. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-07-29	KVM: x86: handle singlestep during emulation	Paolo Bonzini
	This lets debugging work better during emulation of invalid guest state. This time the check is done after emulation, but before writeback of the flags; we need to check the flags before execution of the instruction, we cannot check singlestep_rip because the CS base may have already been modified. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Conflicts: arch/x86/kvm/x86.c
2013-07-29	KVM: x86: handle hardware breakpoints during emulation	Paolo Bonzini
	This lets debugging work better during emulation of invalid guest state. The check is done before emulating the instruction, and (in the case of guest debugging) reuses EMULATE_DO_MMIO to exit with KVM_EXIT_DEBUG. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-07-29	KVM: x86: rename EMULATE_DO_MMIO	Paolo Bonzini
	The next patch will reuse it for other userspace exits than MMIO, namely debug events. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-07-26	x86/acpi: Correct out-of-date comment of __acpi_map_table()	Zhang Yanfei
	The implementation of function __acpi_map_table() was changed a long time ago, and now it directly invokes early_ioremap() to set up the temporary ACPI table mappings. So correct its out-of-date comment. Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: len.brown@intel.com Cc: pavel@ucw.cz Cc: rjw@sisk.pl Link: http://lkml.kernel.org/r/51EE7F1C.9020506@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-07-26	x86/PCI: MMCONFIG: Check earlier for MMCONFIG region at address zero	ethan.zhao
	We can check for addr being zero earlier and thus avoid the mutex_unlock() cleanup path. [bhelgaas: drop warning printk] Signed-off-by: ethan.zhao <ethan.zhao@oracle.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Yinghai Lu <yinghai@kernel.org>
2013-07-26	x86, fpu: correct the asm constraints for fxsave, unbreak mxcsr.daz	H.J. Lu
	GCC will optimize mxcsr_feature_mask_init in arch/x86/kernel/i387.c: memset(&fx_scratch, 0, sizeof(struct i387_fxsave_struct)); asm volatile("fxsave %0" : : "m" (fx_scratch)); mask = fx_scratch.mxcsr_mask; if (mask == 0) mask = 0x0000ffbf; to memset(&fx_scratch, 0, sizeof(struct i387_fxsave_struct)); asm volatile("fxsave %0" : : "m" (fx_scratch)); mask = 0x0000ffbf; since asm statement doesn’t say it will update fx_scratch. As the result, the DAZ bit will be cleared. This patch fixes it. This bug dates back to at least kernel 2.6.12. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: <stable@vger.kernel.org>
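The constraint issue described above can be reproduced in a small userspace sketch. It assumes GCC or Clang inline-asm syntax and an x86 target; the struct and function names are hypothetical, and the "+m" constraint is one way to tell the compiler that the asm writes the buffer, not necessarily the exact form the kernel patch uses. On other architectures the sketch falls back to the default mask.

```c
#include <stdint.h>
#include <string.h>

/* FXSAVE needs a 512-byte, 16-byte-aligned memory operand. */
struct fxsave_area {
    uint8_t bytes[512];
} __attribute__((aligned(16)));

/*
 * Returns the MXCSR feature mask. With an input-only "m" constraint the
 * compiler may assume the zeroed buffer is unchanged and fold the result
 * to the default; "+m" declares the operand as both read and written.
 */
static uint32_t read_mxcsr_mask(void)
{
    struct fxsave_area area;
    uint32_t mask = 0;

    memset(&area, 0, sizeof(area));
#if defined(__x86_64__) || defined(__i386__)
    __asm__ volatile("fxsave %0" : "+m"(area));
    /* MXCSR_MASK is stored at byte offset 28 of the FXSAVE image. */
    memcpy(&mask, area.bytes + 28, sizeof(mask));
#endif
    /* A zero mask means the CPU uses the architectural default. */
    if (mask == 0)
        mask = 0x0000ffbf;
    return mask;
}
```

The default value 0x0000ffbf has bit 6 (DAZ) clear, which is why folding the result to the default silently disables DAZ on CPUs that support it.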
2013-07-26	x86, efi: correct call to free_pages	Roy Franz
	Specify memory size in pages, not bytes. Signed-off-by: Roy Franz <roy.franz@linaro.org> Signed-off-by: Matt Fleming <matt.fleming@intel.com>
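The fix above is a units correction: the free routine expects a page count where a byte count was being passed. A minimal sketch of the conversion, assuming 4 KiB pages; the macro and function names are hypothetical and not taken from the patch.

```c
#include <stddef.h>

/* Assumption for this sketch: 4 KiB pages. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  ((size_t)1 << SKETCH_PAGE_SHIFT)

/*
 * Round a byte count up to whole pages. Written as shift plus remainder
 * check so that it cannot overflow for byte counts close to SIZE_MAX.
 */
static size_t bytes_to_pages(size_t nbytes)
{
    size_t pages = nbytes >> SKETCH_PAGE_SHIFT;

    if (nbytes & (SKETCH_PAGE_SIZE - 1))
        pages++;
    return pages;
}
```

Passing the raw byte count instead of bytes_to_pages(nbytes) would ask the callee to release 4096 times more pages than were allocated.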
2013-07-26	cpufreq: Remove unused APERF/MPERF support	Stratos Karafotis
	The target frequency calculation method in the ondemand governor has changed and it is now independent of the measured average frequency. Consequently, the APERF/MPERF support in cpufreq is not used any more, so drop it. [rjw: Changelog] Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-07-25	x86/pci/mrst: Cleanup checkpatch.pl warnings	Valentina Manea
	This patch fixes warnings and errors found by checkpatch.pl: * replace asm/acpi.h, asm/io.h and asm/smp.h with linux/acpi.h, linux/io.h and linux/smp.h respectively * remove explicit initialization to 0 of a static global variable * replace printk(KERN_INFO ...) with pr_info * use tabs instead of spaces for indentation * arrange comments so that they adhere to Documentation/CodingStyle [bhelgaas: capitalize "PCI", "Langwell", "Lincroft" consistently] Signed-off-by: Valentina Manea <valentina.manea.m@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Ingo Molnar <mingo@kernel.org>
2013-07-25	KVM: x86: Drop some unused functions from lapic	Jan Kiszka
	Both have no users anymore. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-07-25	KVM: x86: Simplify __apic_accept_irq	Jan Kiszka
	If posted interrupts are enabled, we can no longer track if an IRQ was coalesced based on IRR. So drop this logic also from the classic software path and simplify apic_test_and_set_irr to apic_set_irr. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-07-24	Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6	Linus Torvalds
	Pull crypto fixes from Herbert Xu: "This push fixes a memory corruption issue in caam, as well as reverting the new optimised crct10dif implementation as it breaks boot on initrd systems. Hopefully crct10dif will be reinstated once the supporting code is added so that it doesn't break boot" * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: Revert "crypto: crct10dif - Wrap crc_t10dif function all to use crypto transform framework" crypto: caam - Fixed the memory out of bound overwrite issue
2013-07-24	of: Specify initrd location using 64-bit	Santosh Shilimkar
	On some PAE architectures, the entire range of physical memory could reside outside the 32-bit limit. These systems need the ability to specify the initrd location using 64-bit numbers. This patch globally modifies the early_init_dt_setup_initrd_arch() function to use 64-bit numbers instead of the current unsigned long. There has been quite a bit of debate about whether to use u64 or phys_addr_t. It was concluded to stick to u64 to be consistent with rest of the device tree code. As summarized by Geert, "The address to load the initrd is decided by the bootloader/user and set at that point later in time. The dtb should not be tied to the kernel you are booting" More details on the discussion can be found here: https://lkml.org/lkml/2013/6/20/690 https://lkml.org/lkml/2012/9/13/544 Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Acked-by: Rob Herring <rob.herring@calxeda.com> Acked-by: Vineet Gupta <vgupta@synopsys.com> Acked-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com> Signed-off-by: Grant Likely <grant.likely@linaro.org>
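Why a 32-bit unsigned long is not enough here can be shown with a short sketch. The address below is a made-up example of an initrd placed above the 4 GiB boundary, and the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* What a 32-bit 'unsigned long' parameter would keep of a physical address. */
static uint32_t truncate_to_ulong32(uint64_t phys)
{
    return (uint32_t)phys;
}

/* True if the address survives the trip through a 32-bit parameter. */
static bool fits_in_ulong32(uint64_t phys)
{
    return (uint64_t)truncate_to_ulong32(phys) == phys;
}
```

For a hypothetical initrd start of 0x880000000, truncation yields 0x80000000, a different and unrelated physical address, so the start and end must be carried as 64-bit values end to end.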
2013-07-24	Revert "crypto: crct10dif - Wrap crc_t10dif function all to use crypto transform framework"	Herbert Xu
	This reverts commits 67822649d7305caf3dd50ed46c27b99c94eff996 39761214eefc6b070f29402aa1165f24d789b3f7 0b95a7f85718adcbba36407ef88bba0a7379ed03 31d939625a9a20b1badd2d4e6bf6fd39fa523405 2d31e518a42828df7877bca23a958627d60408bc Unfortunately this change broke boot on some systems that used an initrd which does not include the newly created crct10dif modules. As these modules are required by sd_mod under certain configurations this is a serious problem. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-07-23	perf/x86: Add ability to calculate TSC from perf sample timestamps	Adrian Hunter
	For modern CPUs, perf clock is directly related to TSC. TSC can be calculated from perf clock and vice versa using a simple calculation. Two of the three components of that calculation are already exported in struct perf_event_mmap_page. This patch exports the third. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: "H. Peter Anvin" <hpa@zytor.com> Link: http://lkml.kernel.org/r/1372425741-1676-3-git-send-email-adrian.hunter@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
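The "simple calculation" can be sketched as follows, assuming the three components are a shift, a multiplier and a zero offset, as in the time_shift, time_mult and time_zero fields of struct perf_event_mmap_page. The struct sketch_time_conv and both function names are hypothetical; real code would read the fields from the mmap page under its sequence lock.

```c
#include <stdint.h>

/* Hypothetical container for the three conversion parameters. */
struct sketch_time_conv {
    uint16_t time_shift;
    uint32_t time_mult;
    uint64_t time_zero;
};

/* TSC cycles -> perf clock (ns), split to avoid overflowing 64 bits. */
static uint64_t tsc_to_perf_time(uint64_t cyc, const struct sketch_time_conv *c)
{
    uint64_t quot = cyc >> c->time_shift;
    uint64_t rem = cyc & (((uint64_t)1 << c->time_shift) - 1);

    return c->time_zero + quot * c->time_mult +
           ((rem * c->time_mult) >> c->time_shift);
}

/* perf clock (ns) -> TSC cycles; time_mult is assumed to be non-zero. */
static uint64_t perf_time_to_tsc(uint64_t ns, const struct sketch_time_conv *c)
{
    uint64_t t = ns - c->time_zero;
    uint64_t quot = t / c->time_mult;
    uint64_t rem = t % c->time_mult;

    return (quot << c->time_shift) + (rem << c->time_shift) / c->time_mult;
}
```

With time_shift = 10, time_mult = 400 and time_zero = 5000, one million cycles map to a timestamp of 395625 and back to exactly one million cycles; in general the round trip can lose a few cycles to integer rounding.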
2013-07-23	x86/iommu/vt-d: Expand interrupt remapping quirk to cover x58 chipset	Neil Horman
	Recently we added an early quirk to detect 5500/5520 chipsets with early revisions that had problems with irq draining with interrupt remapping enabled: commit 03bbcb2e7e292838bb0244f5a7816d194c911d62 Author: Neil Horman <nhorman@tuxdriver.com> Date: Tue Apr 16 16:38:32 2013 -0400 iommu/vt-d: add quirk for broken interrupt remapping on 55XX chipsets It turns out this same problem is present in the intel X58 chipset as well. See errata 69 here: http://www.intel.com/content/www/us/en/chipsets/x58-express-specification-update.html This patch extends the pci early quirk so that the chip devices/revisions specified in the above update are also covered in the same way: Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Donald Dutile <ddutile@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Malcolm Crossley <malcolm.crossley@citrix.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Don Zickus <dzickus@redhat.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1374059639-8631-1-git-send-email-nhorman@tuxdriver.com [ Small edits. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-07-23	x86/ia32/asm: Remove unused argument in macro	Ramkumar Ramachandra
	Commit 3fe26fa ("x86: get rid of pt_regs argument in sigreturn variants", from 2012-11-12) changed the body of PTREGSCALL to drop arg, and updated the callsites; unfortunately, it forgot to update the macro argument list, leaving an unused argument. Fix this. Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Link: http://lkml.kernel.org/r/1373479468-7175-1-git-send-email-artagnon@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-07-23	kprobes/x86: Call out into INT3 handler directly instead of using notifier	Jiri Kosina
	In fd4363fff3d96 ("x86: Introduce int3 (breakpoint)-based instruction patching"), the mechanism that was introduced for notifying alternatives code from int3 exception handler that an exception occurred was die_notifier. This is however problematic, as early code might be using jump labels even before the notifier registration has been performed, which will then lead to an oops due to unhandled exception. One such occurrence has been encountered by Fengguang: int3: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC Modules linked in: CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.11.0-rc1-01429-g04bf576 #8 task: ffff88000da1b040 ti: ffff88000da1c000 task.ti: ffff88000da1c000 RIP: 0010:[<ffffffff811098cc>] [<ffffffff811098cc>] ttwu_do_wakeup+0x28/0x225 RSP: 0000:ffff88000dd03f10 EFLAGS: 00000006 RAX: 0000000000000000 RBX: ffff88000dd12940 RCX: ffffffff81769c40 RDX: 0000000000000002 RSI: 0000000000000000 RDI: 0000000000000001 RBP: ffff88000dd03f28 R08: ffffffff8176a8c0 R09: 0000000000000002 R10: ffffffff810ff484 R11: ffff88000dd129e8 R12: ffff88000dbc90c0 R13: ffff88000dbc90c0 R14: ffff88000da1dfd8 R15: ffff88000da1dfd8 FS: 0000000000000000(0000) GS:ffff88000dd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00000000ffffffff CR3: 0000000001c88000 CR4: 00000000000006e0 Stack: ffff88000dd12940 ffff88000dbc90c0 ffff88000da1dfd8 ffff88000dd03f48 ffffffff81109e2b ffff88000dd12940 0000000000000000 ffff88000dd03f68 ffffffff81109e9e 0000000000000000 0000000000012940 ffff88000dd03f98 Call Trace: <IRQ> [<ffffffff81109e2b>] ttwu_do_activate.constprop.56+0x6d/0x79 [<ffffffff81109e9e>] sched_ttwu_pending+0x67/0x84 [<ffffffff8110c845>] scheduler_ipi+0x15a/0x2b0 [<ffffffff8104dfb4>] smp_reschedule_interrupt+0x38/0x41 [<ffffffff8173bf5d>] reschedule_interrupt+0x6d/0x80 <EOI> [<ffffffff810ff484>] ? __atomic_notifier_call_chain+0x5/0xc1 [<ffffffff8105cc30>] ? native_safe_halt+0xd/0x16 [<ffffffff81015f10>] default_idle+0x147/0x282 [<ffffffff81017026>] arch_cpu_idle+0x3d/0x5d [<ffffffff81127d6a>] cpu_idle_loop+0x46d/0x5db [<ffffffff81127f5c>] cpu_startup_entry+0x84/0x84 [<ffffffff8104f4f8>] start_secondary+0x3c8/0x3d5 [...] Fix this by directly calling poke_int3_handler() from the int3 exception handler (analogously to what ftrace has been doing already), instead of relying on notifier, registration of which might not have yet been finalized by the time of the first trap. Reported-and-tested-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/alpine.LNX.2.00.1307231007490.14024@pobox.suse.cz Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-07-23	x86/acpi: Fix incorrect sanity check in acpi_register_lapic()	Tang Chen
	We wanted to check if the APIC ID is out of range. It should be: if (id >= MAX_LOCAL_APIC) There's no known bad effect of this bug. Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Reviewed-by: Len Brown <len.brown@intel.com> Cc: pavel@ucw.cz Cc: rjw@sisk.pl Link: http://lkml.kernel.org/r/1374566419-21120-1-git-send-email-tangchen@cn.fujitsu.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
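The corrected condition follows the usual rule for an ID that indexes a table of MAX entries: valid values run from 0 to MAX - 1, so the rejection test is id >= MAX. A minimal sketch; the constant value and function name are hypothetical and chosen only for illustration.

```c
#include <stdbool.h>

/* Hypothetical table size for this sketch; the kernel's value is config-dependent. */
#define SKETCH_MAX_LOCAL_APIC 256

/* An ID is usable only if it can index a table of SKETCH_MAX_LOCAL_APIC entries. */
static bool apic_id_out_of_range(unsigned int id)
{
    return id >= SKETCH_MAX_LOCAL_APIC;
}
```

A test written as id > MAX would accept id == MAX, one past the last valid index, while id >= MAX - 1 would wrongly reject the last valid ID.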
2013-07-23	x86 / PCI: prevent re-allocation of already existing bridge and ROM resources	Mika Westerberg
	In hotplug case (especially with Thunderbolt enabled systems) we might need to call pcibios_resource_survey_bus() several times for a bus. The function ends up calling pci_claim_resource() for each bridge resource that then fails claiming that the resource exists already (which it does). Once this happens the resource is invalidated thus preventing devices behind the bridge to allocate their resources. To fix this we do what has been done in pcibios_allocate_dev_resources() and check 'parent' of the given resource. If it is non-NULL it means that the resource has been allocated already and we can skip it. We do the same for ROM resources as well. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-07-22	mm/hotplug, x86: Disable ARCH_MEMORY_PROBE by default	Toshi Kani
	CONFIG_ARCH_MEMORY_PROBE enables the /sys/devices/system/memory/probe interface, which allows a given memory address to be hot-added as follows: # echo start_address_of_new_memory > /sys/devices/system/memory/probe (See Documentation/memory-hotplug.txt for more details.) This probe interface is required on powerpc. On x86, however, ACPI notifies a memory hotplug event to the kernel, which performs its hotplug operation as the result. Therefore, regular users do not need this interface on x86. This probe interface is also error-prone and misleading in that the kernel blindly adds a given memory address without checking if the memory is present on the system; no probing is done despite its name. The kernel crashes when a user requests to online a memory block that is not present on the system. This interface is currently used for testing as it can fake a hotplug event. This patch disables CONFIG_ARCH_MEMORY_PROBE by default on x86, adds its Kconfig menu entry on x86, and clarifies its use in Documentation/ memory-hotplug.txt. Signed-off-by: Toshi Kani <toshi.kani@hp.com> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: linux-mm@kvack.org Cc: dave@sr71.net Cc: isimatu.yasuaki@jp.fujitsu.com Cc: tangchen@cn.fujitsu.com Cc: vasilis.liaskovitis@profitbricks.com Link: http://lkml.kernel.org/r/1374256068-26016-1-git-send-email-toshi.kani@hp.com [ Edited it slightly. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-07-19	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml	Linus Torvalds
	Pull UML fixes from Richard Weinberger: "Special thanks goes to Toralf Föster for continuously testing UML and reporting issues!" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml: um: remove dead code um: siginfo cleanup uml: Fix which_tmpdir failure when /dev/shm is a symlink, and in other edge cases um: Fix wait_stub_done() error handling um: Mark stub pages mapping with VM_PFNMAP um: Fix return value of strnlen_user()
2013-07-19	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm	Linus Torvalds
	Pull KVM fix from Paolo Bonzini: "This single patch fixes a regression caused by one of the optimizations introduced in 3.11, which is generally visible only on AMD processors" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: MMU: avoid fast page fault fixing mmio page fault
2013-07-19	perf, kvm: Support the in_tx/in_tx_cp modifiers in KVM arch perfmon emulation v5	Andi Kleen
	[KVM maintainers: The underlying support for this is in perf/core now. So please merge this patch into the KVM tree.] This is not arch perfmon, but older CPUs will just ignore it. This makes it possible to do at least some TSX measurements from a KVM guest v2: Various fixes to address review feedback v3: Ignore the bits when no CPUID. No #GP. Force raw events with TSX bits. v4: Use reserved bits for #GP v5: Remove obsolete argument Acked-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-07-19	um: remove dead code	Richard Weinberger
	"me" is not used. Signed-off-by: Richard Weinberger <richard@nod.at>
2013-07-19	kprobes/x86: Remove unused text_poke_smp() and text_poke_smp_batch() functions	Masami Hiramatsu
	Since introducing the text_poke_bp() for all text_poke_smp() callers, text_poke_smp() are now unused. This patch basically reverts: 3d55cc8a058e ("x86: Add text_poke_smp for SMP cross modifying code") 7deb18dcf047 ("x86: Introduce text_poke_smp_batch() for batch-code modifying") and related commits. This patch also fixes a Kconfig dependency issue on STOP_MACHINE in the case of CONFIG_SMP && !CONFIG_MODULE_UNLOAD. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Jiri Kosina <jkosina@suse.cz> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Jason Baron <jbaron@akamai.com> Cc: yrl.pp-manager.tt@hitachi.com Cc: Borislav Petkov <bpetkov@suse.de> Link: http://lkml.kernel.org/r/20130718114753.26675.18714.stgit@mhiramat-M0-7522 Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-07-19	kprobes/x86: Use text_poke_bp() instead of text_poke_smp*()	Masami Hiramatsu
	Use text_poke_bp() for optimizing kprobes instead of text_poke_smp*(). Since the number of kprobes is usually not so large (<100) and text_poke_bp() is much lighter than text_poke_smp() [which uses stop_machine()], this just stops using batch processing. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Jiri Kosina <jkosina@suse.cz> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Jason Baron <jbaron@akamai.com> Cc: yrl.pp-manager.tt@hitachi.com Cc: Borislav Petkov <bpetkov@suse.de> Link: http://lkml.kernel.org/r/20130718114750.26675.9174.stgit@mhiramat-M0-7522 Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-07-19	kprobes/x86: Remove an incorrect comment about int3 in NMI/MCE	Masami Hiramatsu
	Remove a comment about an int3 issue in NMI/MCE, since commit: 3f3c8b8c4b2a ("x86: Add workaround to NMI iret woes") already fixed that. Keeping this incorrect comment can mislead developers. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Jiri Kosina <jkosina@suse.cz> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Jason Baron <jbaron@akamai.com> Cc: yrl.pp-manager.tt@hitachi.com Cc: Borislav Petkov <bpetkov@suse.de> Link: http://lkml.kernel.org/r/20130718114747.26675.84110.stgit@mhiramat-M0-7522 Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-07-19	Merge branch 'x86/jumplabel' into perf/core	Ingo Molnar
	Upcoming kprobes patches rely on the int3 code-patching machinery introduced by: fd4363fff3d9 x86: Introduce int3 (breakpoint)-based instruction patching Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-07-18	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip	Linus Torvalds
	Pull x86 fixes from Peter Anvin: "Trying again to get the fixes queue, including the fixed IDT alignment patch. The UEFI patch is by far the biggest issue at hand: it is currently causing quite a few machines to [fail to] boot. Which is sad, because the only reason they would is because their BIOSes touch memory that has already been freed. The other major issue is that we finally have tracked down the root cause of a significant number of machines failing to suspend/resume" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Make sure IDT is page aligned x86, suspend: Handle CPUs which fail to #GP on RDMSR x86/platform/ce4100: Add header file for reboot type Revert "UEFI: Don't pass boot services regions to SetVirtualAddressMap()" efivars: check for EFI_RUNTIME_SERVICES
2013-07-18	KVM: nVMX: Set segment infomation of L1 when L2 exits	Arthur Chunqi Li
	When L2 exits to L1, the segment information of L1 is not set correctly. According to Intel SDM 27.5.2 (Loading Host Segment and Descriptor Table Registers), the segment base/limit/access rights of L1 should be set to specific architecturally defined values when L2 exits to L1. This patch fixes this. Signed-off-by: Arthur Chunqi Li <yzt356@gmail.com> Reviewed-by: Gleb Natapov <gnatapov@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-07-18	remove sched notifier for cross-cpu migrations	Marcelo Tosatti
	Linux as a guest on KVM hypervisor, the only user of the pvclock vsyscall interface, does not require notification on task migration because: 1. cpu ID number maps 1:1 to per-CPU pvclock time info. 2. per-CPU pvclock time info is updated if the underlying CPU changes. 3. that version is increased whenever underlying CPU changes. Which is sufficient to guarantee nanoseconds counter is calculated properly. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-07-18	KVM: nVMX: Fix read/write to MSR_IA32_FEATURE_CONTROL	Nadav Har'El
	Fix read/write to IA32_FEATURE_CONTROL MSR in nested environment. This patch simulates this MSR in nested_vmx and the default value is 0x0. BIOS should set it to 0x5 before VMXON. After setting the lock bit, a write to it will cause #GP(0). Another QEMU patch is also needed to handle emulation of reset and migration. Reset of the vCPU should clear this MSR and migration should preserve its value. This patch is based on Nadav's previous commit. http://permalink.gmane.org/gmane.comp.emulators.kvm.devel/88478 Signed-off-by: Nadav Har'El <nyh@math.technion.ac.il> Signed-off-by: Arthur Chunqi Li <yzt356@gmail.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
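The lock-bit behaviour described above can be modelled in a few lines. This is a toy model, not KVM's code: struct sketch_vcpu and the function names are hypothetical, and a rejected write is reported as a false return where the hypervisor would inject #GP(0). The bit positions (bit 0 lock, bit 2 enable VMXON outside SMX) are the architectural ones, which is why BIOS writes 0x5.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_FC_LOCKED            (1ULL << 0) /* lock bit */
#define SKETCH_FC_VMXON_OUTSIDE_SMX (1ULL << 2) /* enable VMXON outside SMX */

/* Hypothetical per-vCPU state holding the emulated MSR value. */
struct sketch_vcpu {
    uint64_t feature_control;
};

/* Guest write: rejected once the lock bit has been set. */
static bool feature_control_write(struct sketch_vcpu *v, uint64_t data)
{
    if (v->feature_control & SKETCH_FC_LOCKED)
        return false; /* the real emulation would inject #GP(0) here */
    v->feature_control = data;
    return true;
}

/* vCPU reset: clears the MSR, including the lock bit. */
static void vcpu_reset(struct sketch_vcpu *v)
{
    v->feature_control = 0;
}
```

The model also shows why reset handling matters: without clearing the value on reset, a rebooted guest would find the MSR still locked from the previous boot.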
2013-07-18	KVM: x86: Drop useless cast	Mathias Krause
	Void pointers don't need no casting, drop it. Signed-off-by: Mathias Krause <minipli@googlemail.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-07-18	KVM: VMX: Use proper types to access const arrays	Mathias Krause
	Use a const pointer type instead of casting away the const qualifier from const arrays. Keep the pointer array on the stack, nonetheless. Making it static just increases the object size. Signed-off-by: Mathias Krause <minipli@googlemail.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-07-18	KVM: nVMX: Set success rflags when emulate VMXON/VMXOFF in nested virt	Arthur Chunqi Li
	Set rflags after successfully emulating VMXON/VMXOFF in VMX. Signed-off-by: Arthur Chunqi Li <yzt356@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-07-18	KVM: nVMX: Change location of 3 functions in vmx.c	Arthur Chunqi Li
	Move nested_vmx_succeed/nested_vmx_failInvalid/nested_vmx_failValid ahead of handle_vmon to eliminate double declaration in the same file Signed-off-by: Arthur Chunqi Li <yzt356@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-07-18	KVM: x86: Avoid zapping mmio sptes twice for generation wraparound	Takuya Yoshikawa
	Now that kvm_arch_memslots_updated() catches every increment of the memslots->generation, checking if the mmio generation has reached its maximum value is enough. Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-07-18	KVM: Introduce kvm_arch_memslots_updated()	Takuya Yoshikawa
	This is called right after the memslots is updated, i.e. when the result of update_memslots() gets installed in install_new_memslots(). Since the memslots needs to be updated twice when we delete or move a memslot, kvm_arch_commit_memory_region() does not correspond to this exactly. In the following patch, x86 will use this new API to check if the mmio generation has reached its maximum value, in which case mmio sptes need to be flushed out. Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Acked-by: Alexander Graf <agraf@suse.de> Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-07-18	KVM: MMU: avoid fast page fault fixing mmio page fault	Xiao Guangrong
	Currently, fast page fault incorrectly tries to fix mmio page fault when the generation number is invalid (spte.gen != kvm.gen). It then returns to guest to retry the fault since it sees the last spte is nonpresent. This causes an infinite loop. Since fast page fault only works for direct mmu, the issue exists when 1) tdp is enabled. It is triggered only on AMD hosts, since on Intel hosts the mmio page fault is recognized as ept-misconfig, whose handler calls the fault-page path with error_code = 0 2) guest paging is disabled. In this case, the issue is hardly discovered since paging disable is short-lived and the sptes will be invalid after memslot changed for 150 times Fix it by filtering out MMIO page faults in page_fault_can_be_fast. Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de> Tested-by: Markus Trippelsdorf <markus@trippelsdorf.de> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-07-16	x86: Make jump_label use int3-based patching	Jiri Kosina
	Make jump labels use text_poke_bp() for text patching instead of text_poke_smp(), avoiding the need for stop_machine(). Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Link: http://lkml.kernel.org/r/alpine.LNX.2.00.1307121120250.29788@pobox.suse.cz Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>