summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2017-04-17x86/intel_rdt: Get rid of anon unionThomas Gleixner
gcc-4.4.3 fails to statically initialize members of a anon union. See: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=10676 The storage saving is not really worth it and aside of that it will catch usage of the cache member for bandwidth and vice versa easier. Fixes: 05b93417ce5b ("x86/intel_rdt/mba: Add primary support for Memory Bandwidth Allocation (MBA)") Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-15Merge branch 'libnvdimm-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull nvdimm fixes from Dan Williams: "A small crop of lockdep, sleeping while atomic, and other fixes / band-aids in advance of the full-blown reworks targeting the next merge window. The largest change here is "libnvdimm: fix blk free space accounting" which deletes a pile of buggy code that better testing would have caught before merging. The next change that is borderline too big for a late rc is switching the device-dax locking from rcu to srcu, I couldn't think of a smaller way to make that fix. The __copy_user_nocache fix will have a full replacement in 4.12 to move those pmem special case considerations into the pmem driver. The "libnvdimm: band aid btt vs clear poison locking" commit admits that our error clearing support for btt went in broken, so we just disable it in 4.11 and -stable. A replacement / full fix is in the pipeline for 4.12 Some of these would have been caught earlier had DEBUG_ATOMIC_SLEEP been enabled on my development station. I wonder if we should have: config DEBUG_ATOMIC_SLEEP default PROVE_LOCKING ...since I mistakenly thought I got both with PROVE_LOCKING=y. These have received a build success notification from the 0day robot, and some have appeared in a -next release with no reported issues" * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: x86, pmem: fix broken __copy_user_nocache cache-bypass assumptions device-dax: switch to srcu, fix rcu_read_lock() vs pte allocation libnvdimm: band aid btt vs clear poison locking libnvdimm: fix reconfig_mutex, mmap_sem, and jbd2_handle lockdep splat libnvdimm: fix blk free space accounting acpi, nfit, libnvdimm: fix interleave set cookie calculation (64-bit comparison)
2017-04-14Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "A set of small fixes for x86: - fix locking in RDT to prevent memory leaks and freeing in use memory - prevent setting invalid values for vdso32_enabled which cause inconsistencies for user space resulting in application crashes. - plug a race in the vdso32 code between fork and sysctl which causes inconsistencies for user space resulting in application crashes. - make MPX signal delivery work in compat mode - make the dmesg output of traps and faults readable again" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/intel_rdt: Fix locking in rdtgroup_schemata_write() x86/debug: Fix the printk() debug output of signal_fault(), do_trap() and do_general_protection() x86/vdso: Plug race between mapping and ELF header setup x86/vdso: Ensure vdso32_enabled gets set to valid values only x86/signals: Fix lower/upper bound reporting in compat siginfo
2017-04-14Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Thomas Gleixner: "Two small fixes for perf: - the move to support cross arch annotation introduced per arch initialization requirements, fullfill them for s/390 (Christian Borntraeger) - add the missing initialization to the LBR entries to avoid exposing random or stale data" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86: Avoid exposing wrong/stale data in intel_pmu_lbr_read_32() perf annotate s390: Fix perf annotate error -95 (4.10 regression)
2017-04-14Merge branch 'efi-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI fixes from Thomas Gleixner: "Three fixes from EFI land: - prevent accessing a Graphic Output Device (GOP) which the kernel does not know to handle - prevent PCI reconfiguration to modify a BAR which covers the framebuffer because that's already in use through the EFI GOP interface - avoid reserving EFI runtime regions as this results in bogus memory mappings" * 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/efi: Don't try to reserve runtime regions efi/fb: Avoid reconfiguration of BAR that covers the framebuffer efi/libstub: Skip GOP with PIXEL_BLT_ONLY format
2017-04-14x86/irq: Remove a redundant #ifdef directiveDou Liyang
The call to irq_ctx_init() is wrapped in #ifdef CONFIG_X86_32. The declaration of irq_ctx_init in irq.h provides already a stub inline for the X86_32=n case. Remove the redundant #ifdef in the code. [ tglx: Massaged changelog ] Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Link: http://lkml.kernel.org/r/1491811500-30307-1-git-send-email-douly.fnst@cn.fujitsu.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-14x86/smp: Remove the redundant #ifdef CONFIG_SMP directiveDou Liyang
The !CONFIG_X86_LOCAL_APIC section in smp.h wraps the define of hard_smp_processor_id() into #ifndef CONFIG_SMP. But Kconfig has: config X86_LOCAL_APIC def_bool y depends on X86_64 || SMP || X86_32_NON_STANDARD ... Therefore SMP can't be 'y' when X86_LOCAL_APIC == 'n'. Remove the redundant #ifndef CONFIG_SMP. [ tglx: Massaged changelog ] Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Cc: jaswinder@infradead.org Link: http://lkml.kernel.org/r/1491734806-15413-2-git-send-email-douly.fnst@cn.fujitsu.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-14x86/smp: Reduce code duplicationDou Liyang
The CONFIG_X86_32_SMP and CONFIG_X86_64_SMP sections in smp.h contain duplicate defines. Merge them and only put the difference into an #ifdeff'ed section. [ tglx: Massaged changelog ] Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Cc: jaswinder@infradead.org Link: http://lkml.kernel.org/r/1491734806-15413-1-git-send-email-douly.fnst@cn.fujitsu.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-14x86/uv/time: Set ->min_delta_ticks and ->max_delta_ticksNicolai Stange
In preparation for making the clockevents core NTP correction aware, all clockevent device drivers must set ->min_delta_ticks and ->max_delta_ticks rather than ->min_delta_ns and ->max_delta_ns: a clockevent device's rate is going to change dynamically and thus, the ratio of ns to ticks ceases to stay invariant. Currently, the x86's uv rtc clockevent device is initialized as follows: clock_event_device_uv.min_delta_ns = NSEC_PER_SEC / sn_rtc_cycles_per_second; clock_event_device_uv.max_delta_ns = clocksource_uv.mask * (NSEC_PER_SEC / sn_rtc_cycles_per_second); This translates to a ->min_delta_ticks value of 1 and a ->max_delta_ticks value of clocksource_uv.mask. Initialize ->min_delta_ticks and ->max_delta_ticks with these values respectively. This patch alone doesn't introduce any change in functionality as the clockevents core still looks exclusively at the (untouched) ->min_delta_ns and ->max_delta_ns. As soon as this has changed, a followup patch will purge the initialization of ->min_delta_ns and ->max_delta_ns from this driver. Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Stephen Boyd <sboyd@codeaurora.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Cc: Mike Travis <travis@sgi.com> Cc: Dimitri Sivanich <sivanich@sgi.com> Signed-off-by: Nicolai Stange <nicstange@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
2017-04-14x86/apic/timer: Set ->min_delta_ticks and ->max_delta_ticksNicolai Stange
In preparation for making the clockevents core NTP correction aware, all clockevent device drivers must set ->min_delta_ticks and ->max_delta_ticks rather than ->min_delta_ns and ->max_delta_ns: a clockevent device's rate is going to change dynamically and thus, the ratio of ns to ticks ceases to stay invariant. Make the x86 arch's apic clockevent driver initialize these fields properly. This patch alone doesn't introduce any change in functionality as the clockevents core still looks exclusively at the (untouched) ->min_delta_ns and ->max_delta_ns. As soon as this has changed, a followup patch will purge the initialization of ->min_delta_ns and ->max_delta_ns from this driver. Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Stephen Boyd <sboyd@codeaurora.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org CC: Dou Liyang <douly.fnst@cn.fujitsu.com> Cc: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Nicolai Stange <nicstange@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
2017-04-14x86/lguest/timer: Set ->min_delta_ticks and ->max_delta_ticksNicolai Stange
In preparation for making the clockevents core NTP correction aware, all clockevent device drivers must set ->min_delta_ticks and ->max_delta_ticks rather than ->min_delta_ns and ->max_delta_ns: a clockevent device's rate is going to change dynamically and thus, the ratio of ns to ticks ceases to stay invariant. Make the x86 arch's lguest clockevent driver initialize these fields properly. This patch alone doesn't introduce any change in functionality as the clockevents core still looks exclusively at the (untouched) ->min_delta_ns and ->max_delta_ns. As soon as this has changed, a followup patch will purge the initialization of ->min_delta_ns and ->max_delta_ns from this driver. Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Stephen Boyd <sboyd@codeaurora.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Cc: Paul Bolle <pebolle@tiscali.nl> Cc: Andy Lutomirski <luto@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "Luis R. Rodriguez" <mcgrof@kernel.org> Cc: lguest@lists.ozlabs.org Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Nicolai Stange <nicstange@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
2017-04-14x86/xen/time: Set ->min_delta_ticks and ->max_delta_ticksNicolai Stange
In preparation for making the clockevents core NTP correction aware, all clockevent device drivers must set ->min_delta_ticks and ->max_delta_ticks rather than ->min_delta_ns and ->max_delta_ns: a clockevent device's rate is going to change dynamically and thus, the ratio of ns to ticks ceases to stay invariant. Make the x86 arch's xen clockevent driver initialize these fields properly. This patch alone doesn't introduce any change in functionality as the clockevents core still looks exclusively at the (untouched) ->min_delta_ns and ->max_delta_ns. As soon as this has changed, a followup patch will purge the initialization of ->min_delta_ns and ->max_delta_ns from this driver. Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Stephen Boyd <sboyd@codeaurora.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Juergen Gross <jgross@suse.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Cc: xen-devel@lists.xenproject.org Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Nicolai Stange <nicstange@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
2017-04-14x86/cpu: Keep model defines sorted by model numberAndy Shevchenko
For better maintenance keep it sorted by numeric model ID. Add new lines to seperate model groups. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Link: http://lkml.kernel.org/r/20170316155045.50389-1-andriy.shevchenko@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-14x86/intel_rdt/mba: Add schemata file support for MBAVikas Shivappa
Add support to update the MBA bandwidth values for the domains via the schemata file. - Verify that the bandwidth value is valid - Round to the next control step depending on the bandwidth granularity of the hardware - Convert the bandwidth to delay values and write the delay values to the corresponding domain PQOS_MSRs. [ tglx: Massaged changelog ] Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: vikas.shivappa@intel.com Link: http://lkml.kernel.org/r/1491611637-20417-9-git-send-email-vikas.shivappa@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-14x86/intel_rdt: Make schemata file parsers resource specificVikas Shivappa
The schemata files are the user space interface to update resource controls. The parser is hardwired to support only cache resources, which do not fit the requirements of memory resources. Add a function pointer for a parser to the struct rdt_resource and switch the cache parsing over. [ tglx: Massaged changelog ] Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: vikas.shivappa@intel.com Link: http://lkml.kernel.org/r/1491611637-20417-8-git-send-email-vikas.shivappa@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-14x86/intel_rdt/mba: Add info directory files for Memory Bandwidth AllocationVikas Shivappa
The files in the info directory for MBA are as follows: num_closids The maximum number of CLOSids available for MBA min_bandwidth The minimum memory bandwidth percentage value bandwidth_gran The granularity of the bandwidth control in percent for the particular CPU SKU. Intermediate values entered are rounded off to the previous control step available. Available bandwidth control steps are minimum_bandwidth + N * bandwidth_gran. delay_linear When set, the OS writes a linear percentage based value to the control MSRs ranging from minimum_bandwidth to 100 percent. This value is informational and has no influence on the values written to the schemata files. The values written to the schemata are always bandwidth percentage that is requested. Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: vikas.shivappa@intel.com Link: http://lkml.kernel.org/r/1491611637-20417-7-git-send-email-vikas.shivappa@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-14x86/intel_rdt: Make information files resource specificVikas Shivappa
Cache allocation and memory bandwidth allocation require different information files in the resctrl/info directory, but the current implementation does not allow to have files per resource. Add the necessary fields to the resource struct and assign the files dynamically depending on the resource type. Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: vikas.shivappa@intel.com Link: http://lkml.kernel.org/r/1491611637-20417-6-git-send-email-vikas.shivappa@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-14x86/intel_rdt/mba: Add primary support for Memory Bandwidth Allocation (MBA)Vikas Shivappa
The MBA feature details like minimum bandwidth supported, bandwidth granularity etc are obtained via executing CPUID with EAX=10H ,ECX=3. Setup and initialize the MBA specific extensions to data structures like global list of RDT resources, RDT resource structure and RDT domain structure. [ tglx: Split out the seperate structure and the CBM related parts ] Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: vikas.shivappa@intel.com Link: http://lkml.kernel.org/r/1491611637-20417-5-git-send-email-vikas.shivappa@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-14x86/intel_rdt/mba: Memory bandwith allocation feature detectVikas Shivappa
Detect MBA feature if CPUID.(EAX=10H, ECX=0):EBX.L2[bit 3] = 1. Add supporting data structures to detect feature details which is done in later patch using CPUID with EAX=10H, ECX= 3. Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: vikas.shivappa@intel.com Link: http://lkml.kernel.org/r/1491611637-20417-4-git-send-email-vikas.shivappa@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-14x86/intel_rdt: Add resource specific msr update functionThomas Gleixner
Updating of Cache and Memory bandwidth QOS MSRs is different. Add a function pointer to struct rdt_resource and convert the cache part over. Based on Vikas all in one patch^Wmess. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: vikas.shivappa@intel.com
2017-04-14x86/intel_rdt: Move CBM specific data into a structThomas Gleixner
Memory bandwidth allocation requires different information than cache allocation. To avoid a lump of data in struct rdt_resource, move all cache related information into a seperate structure and add that to struct rdt_resource. Sanitize the data types while at it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: vikas.shivappa@intel.com
2017-04-14x86/intel_rdt: Cleanup namespace to support multiple resource typesVikas Shivappa
Lot of data structures and functions are named after cache specific resources(named after cbm, cache etc). In many cases other non cache resources may need to share the same data structures/functions. Generalize such naming to prepare to add more resources like memory bandwidth. Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: vikas.shivappa@intel.com Link: http://lkml.kernel.org/r/1491611637-20417-3-git-send-email-vikas.shivappa@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-14x86/intel_rdt: Organize code properlyThomas Gleixner
Having init functions at random places in the middle of the code is unintuitive. Move them close to the init routine and mark them __init. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: vikas.shivappa@intel.com
2017-04-14x86/intel_rdt: Init padding only if a device existsThomas Gleixner
If no device exists it's pointless to calculate the padding data for the schemata files. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: vikas.shivappa@intel.com
2017-04-14x86: i8259: export legacy_pic symbolHans de Goede
The classic PC rtc-coms driver has a workaround for broken ACPI device nodes for it which lack an irq resource. This workaround used to unconditionally hardcode the irq to 8 in these cases. This was causing irq conflict problems on systems without a legacy-pic so a recent patch added an if (nr_legacy_irqs()) guard to the workaround to avoid this irq conflict. nr_legacy_irqs() uses the legacy_pic symbol under the hood causing an undefined symbol error if the rtc-cmos code is build as a module. This commit exports the legacy_pic symbol to fix this. Cc: rtc-linux@googlegroups.com Cc: alexandre.belloni@free-electrons.com Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2017-04-14ftrace/x86: Fix triple fault with graph tracing and suspend-to-ramJosh Poimboeuf
On x86-32, with CONFIG_FIRMWARE and multiple CPUs, if you enable function graph tracing and then suspend to RAM, it will triple fault and reboot when it resumes. The first fault happens when booting a secondary CPU: startup_32_smp() load_ucode_ap() prepare_ftrace_return() ftrace_graph_is_dead() (accesses 'kill_ftrace_graph') The early head_32.S code calls into load_ucode_ap(), which has an an ftrace hook, so it calls prepare_ftrace_return(), which calls ftrace_graph_is_dead(), which tries to access the global 'kill_ftrace_graph' variable with a virtual address, causing a fault because the CPU is still in real mode. The fix is to add a check in prepare_ftrace_return() to make sure it's running in protected mode before continuing. The check makes sure the stack pointer is a virtual kernel address. It's a bit of a hack, but it's not very intrusive and it works well enough. For reference, here are a few other (more difficult) ways this could have potentially been fixed: - Move startup_32_smp()'s call to load_ucode_ap() down to *after* paging is enabled. (No idea what that would break.) - Track down load_ucode_ap()'s entire callee tree and mark all the functions 'notrace'. (Probably not realistic.) - Pause graph tracing in ftrace_suspend_notifier_call() or bringup_cpu() or __cpu_up(), and ensure that the pause facility can be queried from real mode. Reported-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Tested-by: Paul Menzel <pmenzel@molgen.mpg.de> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net> Cc: linux-acpi@vger.kernel.org Cc: Borislav Petkov <bp@alien8.de> Cc: stable@kernel.org Cc: Len Brown <lenb@kernel.org> Link: http://lkml.kernel.org/r/5c1272269a580660703ed2eccf44308e790c7a98.1492123841.git.jpoimboe@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-14x86/boot/e820: Remove a redundant self assignmentColin King
Remove a redundant self assignment of table->nr_entries, it does nothing and is an artifact of code simplification re-work. Detected by CoverityScan, CID#1428450 ("Self assignment") Fixes: 441ac2f33dd7 ("x86/boot/e820: Simplify e820__update_table()") Signed-off-by: Colin Ian King <colin.king@canonical.com> Cc: kernel-janitors@vger.kernel.org Cc: Denys Vlasenko <dvlasenk@redhat.com> Link: http://lkml.kernel.org/r/20170413155912.12078-1-colin.king@canonical.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-14x86/mce: Enable PPIN for Knights Landing/MillPiotr Luc
Intel Xeon Phi processors (KNL and KNM) support PPIN as well, so add their CPUIDs to the whitelist of supported processors. Signed-off-by: Piotr Luc <piotr.luc@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170408172004.8463-1-piotr.luc@intel.com Link: http://lkml.kernel.org/r/20170413201056.10525-1-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-14perf/x86: Fix spurious NMI with PEBS Load Latency eventKan Liang
Spurious NMIs will be observed with the following command: while :; do perf record -bae "cpu/umask=0x01,event=0xcd,ldlat=0x80/pp" -e "cpu/umask=0x03,event=0x0/" -e "cpu/umask=0x02,event=0x0/" -e cycles,branches,cache-misses -e cache-references -- sleep 10 done The bug was introduced by commit: 8077eca079a2 ("perf/x86/pebs: Add workaround for broken OVFL status on HSW+") That commit clears the status bits for the counters used for PEBS events, by masking the whole 64 bits pebs_enabled. However, only the low 32 bits of both status and pebs_enabled are reserved for PEBS-able counters. For status bits 32-34 are fixed counter overflow bits. For pebs_enabled bits 32-34 are for PEBS Load Latency. In the test case, the PEBS Load Latency event and fixed counter event could overflow at the same time. The fixed counter overflow bit will be cleared by mistake. Once it is cleared, the fixed counter overflow never be processed, which finally trigger spurious NMI. Correct the PEBS enabled mask by ignoring the non-PEBS bits. Signed-off-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Fixes: 8077eca079a2 ("perf/x86/pebs: Add workaround for broken OVFL status on HSW+") Link: http://lkml.kernel.org/r/1491333246-3965-1-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-14Merge branch 'perf/urgent' into perf/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-14Merge branch 'linus' into locking/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-14x86/unwind: Silence entry-related warningsJosh Poimboeuf
A few people have reported unwinder warnings like the following: WARNING: kernel stack frame pointer at ffffc90000fe7ff0 in rsync:1157 has bad value (null) unwind stack type:0 next_sp: (null) mask:2 graph_idx:0 ffffc90000fe7f98: ffffc90000fe7ff0 (0xffffc90000fe7ff0) ffffc90000fe7fa0: ffffffffb7000f56 (trace_hardirqs_off_thunk+0x1a/0x1c) ffffc90000fe7fa8: 0000000000000246 (0x246) ffffc90000fe7fb0: 0000000000000000 ... ffffc90000fe7fc0: 00007ffe3af639bc (0x7ffe3af639bc) ffffc90000fe7fc8: 0000000000000006 (0x6) ffffc90000fe7fd0: 00007f80af433fc5 (0x7f80af433fc5) ffffc90000fe7fd8: 00007ffe3af638e0 (0x7ffe3af638e0) ffffc90000fe7fe0: 00007ffe3af638e0 (0x7ffe3af638e0) ffffc90000fe7fe8: 00007ffe3af63970 (0x7ffe3af63970) ffffc90000fe7ff0: 0000000000000000 ... ffffc90000fe7ff8: ffffffffb7b74b9a (entry_SYSCALL_64_after_swapgs+0x17/0x4f) This warning can happen when unwinding a code path where an interrupt occurred in x86 entry code before it set up the first stack frame. Silently ignore any warnings for this case. Reported-by: Daniel Borkmann <daniel@iogearbox.net> Reported-by: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Fixes: c32c47c68a0a ("x86/unwind: Warn on bad frame pointer") Link: http://lkml.kernel.org/r/dbd6838826466a60dc23a52098185bc973ce2f1e.1492020577.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-14x86/unwind: Read stack return address in update_stack_state()Josh Poimboeuf
Instead of reading the return address when unwind_get_return_address() is called, read it from update_stack_state() and store it in the unwind state. This enables the next patch to check the return address from unwind_next_frame() so it can detect an entry code frame. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/af0c5e4560c49c0343dca486ea26c4fa92bc4e35.1492020577.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-14x86/unwind: Move common code into update_stack_state()Josh Poimboeuf
The __unwind_start() and unwind_next_frame() functions have some duplicated functionality. They both call decode_frame_pointer() and set state->regs and state->bp accordingly. Move that functionality to a common place in update_stack_state(). Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/a2ee4801113f6d2300d58f08f6b69f85edf4eb43.1492020577.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-14perf/x86: Avoid exposing wrong/stale data in intel_pmu_lbr_read_32()Peter Zijlstra
When the perf_branch_entry::{in_tx,abort,cycles} fields were added, intel_pmu_lbr_read_32() wasn't updated to initialize them. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Cc: <stable@vger.kernel.org> Fixes: 135c5612c460 ("perf/x86/intel: Support Haswell/v4 LBR format") Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-13KVM: nVMX: fix AD condition when handling EPT violationRadim Krčmář
I have introduced this bug when applying and simplifying Paolo's patch as we agreed on the list. The original was "x &= ~y; if (z) x |= y;". Here is the story of a bad workflow: A maintainer was already testing with the intended change, but it was applied only to a testing repo on a different machine. When the time to push tested patches to kvm/next came, he realized that this change was missing and quickly added it to the maintenance repo, didn't test again (because the change is trivial, right), and pushed the world to fire. Fixes: ae1e2d1082ae ("kvm: nVMX: support EPT accessed/dirty bits") Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-04-13platform/x86: intel_scu_ipc: Introduce intel_scu_ipc_raw_command()Andy Shevchenko
A new call to SCU intel_scu_ipc_raw_command() writes SPTR and DPTR registers before sending a command. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2017-04-13x86/efi: Don't try to reserve runtime regionsOmar Sandoval
Reserving a runtime region results in splitting the EFI memory descriptors for the runtime region. This results in runtime region descriptors with bogus memory mappings, leading to interesting crashes like the following during a kexec: general protection fault: 0000 [#1] SMP Modules linked in: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.11.0-rc1 #53 Hardware name: Wiwynn Leopard-Orv2/Leopard-DDR BW, BIOS LBM05 09/30/2016 RIP: 0010:virt_efi_set_variable() ... Call Trace: efi_delete_dummy_variable() efi_enter_virtual_mode() start_kernel() ? set_init_arg() x86_64_start_reservations() x86_64_start_kernel() start_cpu() ... Kernel panic - not syncing: Fatal exception Runtime regions will not be freed and do not need to be reserved, so skip the memmap modification in this case. Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Cc: <stable@vger.kernel.org> # v4.9+ Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Dave Young <dyoung@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Jones <pjones@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Fixes: 8e80632fb23f ("efi/esrt: Use efi_mem_reserve() and avoid a kmalloc()") Link: http://lkml.kernel.org/r/20170412152719.9779-2-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-12Merge branch 'for-4.11/libnvdimm' into for-4.12/daxDan Williams
2017-04-13x86/mm: Fix dump pagetables for 4 levels of page tablesJuergen Gross
Commit fdd3d8ce0ea62 ("x86/dump_pagetables: Add support for 5-level paging") introduced an error for dumping with only 4 levels by setting PGD_LEVEL_MULT to a wrong value. This is leading to e.g. addresses printed as "(null)" for ranges: x86/mm: Found insecure W+X mapping at address (null)/(null) Make PGD_LEVEL_MULT a multiple of PTRS_PER_P4D instead of PTRS_PER_PUD Fixes: fdd3d8ce0ea62 ("x86/dump_pagetables: Add support for 5-level paging") Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Link: http://lkml.kernel.org/r/20170412143634.6846-1-jgross@suse.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-12x86, pmem: fix broken __copy_user_nocache cache-bypass assumptionsDan Williams
Before we rework the "pmem api" to stop abusing __copy_user_nocache() for memcpy_to_pmem() we need to fix cases where we may strand dirty data in the cpu cache. The problem occurs when copy_from_iter_pmem() is used for arbitrary data transfers from userspace. There is no guarantee that these transfers, performed by dax_iomap_actor(), will have aligned destinations or aligned transfer lengths. Backstop the usage __copy_user_nocache() with explicit cache management in these unaligned cases. Yes, copy_from_iter_pmem() is now too big for an inline, but addressing that is saved for a later patch that moves the entirety of the "pmem api" into the pmem driver directly. Fixes: 5de490daec8b ("pmem: add copy_from_iter_pmem() and clear_pmem()") Cc: <stable@vger.kernel.org> Cc: <x86@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Matthew Wilcox <mawilcox@microsoft.com> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-04-12KVM: x86: Add MSR_AMD64_DC_CFG to the list of ignored MSRsLadi Prosek
Hyper-V writes 0x800000000000 to MSR_AMD64_DC_CFG when running on AMD CPUs as recommended in erratum 383, analogous to our svm_init_erratum_383. By ignoring the MSR, this patch enables running Hyper-V in L1 on AMD. Signed-off-by: Ladi Prosek <lprosek@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-04-12mm: Tighten x86 /dev/mem with zeroing readsKees Cook
Under CONFIG_STRICT_DEVMEM, reading System RAM through /dev/mem is disallowed. However, on x86, the first 1MB was always allowed for BIOS and similar things, regardless of it actually being System RAM. It was possible for heap to end up getting allocated in low 1MB RAM, and then read by things like x86info or dd, which would trip hardened usercopy: usercopy: kernel memory exposure attempt detected from ffff880000090000 (dma-kmalloc-256) (4096 bytes) This changes the x86 exception for the low 1MB by reading back zeros for System RAM areas instead of blindly allowing them. More work is needed to extend this to mmap, but currently mmap doesn't go through usercopy, so hardened usercopy won't Oops the kernel. Reported-by: Tommi Rantala <tommi.t.rantala@nokia.com> Tested-by: Tommi Rantala <tommi.t.rantala@nokia.com> Signed-off-by: Kees Cook <keescook@chromium.org>
2017-04-12x86/kvm: virt_xxx memory barriers instead of mandatory barriersWanpeng Li
virt_xxx memory barriers are implemented trivially using the low-level __smp_xxx macros, __smp_xxx is equal to a compiler barrier for strong TSO memory model, however, mandatory barriers will unconditional add memory barriers, this patch replaces the rmb() in kvm_steal_clock() by virt_rmb(). Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-04-12KVM: x86: fix maintaining of kvm_clock stability on guest CPU hotplugDenis Plotnikov
VCPU TSC synchronization is perfromed in kvm_write_tsc() when the TSC value being set is within 1 second from the expected, as obtained by extrapolating of the TSC in already synchronized VCPUs. This is naturally achieved on all VCPUs at VM start and resume; however on VCPU hotplug it is not: the newly added VCPU is created with TSC == 0 while others are well ahead. To compensate for that, consider host-initiated kvm_write_tsc() with TSC == 0 a special case requiring synchronization regardless of the current TSC on other VCPUs. Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-04-12KVM: x86: remaster kvm_write_tsc codeDenis Plotnikov
Reuse existing code instead of using inline asm. Make the code more concise and clear in the TSC synchronization part. Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-04-12KVM: x86: use irqchip_kernel() to check for pic+ioapicDavid Hildenbrand
Although the current check is not wrong, this check explicitly includes the pic. Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-04-12KVM: x86: simplify pic_ioport_read()David Hildenbrand
Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-04-12KVM: x86: set data directly in picdev_read()David Hildenbrand
Now it looks almost as picdev_write(). Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-04-12KVM: x86: drop picdev_in_range()David Hildenbrand
We already have the exact same checks a couple of lines below. Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>