summaryrefslogtreecommitdiff
path: root/arch/x86/kernel
AgeCommit message (Collapse)Author
2016-11-16x86/mcheck: Move CPU_ONLINE and CPU_DOWN_PREPARE to hotplug state machineSebastian Andrzej Siewior
The CPU_ONLINE and CPU_DOWN_PREPARE look fully symmetrical and could be move to the hotplug state machine. On a failure during registration we have the tear down callback invoked (mce_cpu_pre_down()) so there should be no timer around and so no need to need keep notifier installed (this was the reason according to the comment why the notifier was registered despite of errors). Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Tony Luck <tony.luck@intel.com> Cc: rt@linutronix.de Cc: linux-edac@vger.kernel.org Link: http://lkml.kernel.org/r/20161110174447.11848-7-bigeasy@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-16x86/mcheck: Reorganize the hotplug callbacksSebastian Andrzej Siewior
Initially I wanted to remove mcheck_cpu_init() from identify_cpu() and let it become an independent early hotplug callback. The main problem here was that the init on the boot CPU may happen too late (device_initcall_sync(mcheck_init_device)) and nobody wanted to risk receiving and MCE event at boot time leading to a shutdown (if the MCE feature is not yet enabled). Here is attempt two: the timming stays as-is but the ordering of the functions is changed: - mcheck_cpu_init() (which is run from identify_cpu()) will setup the timer struct but won't fire the timer. This is moved to CPU_ONLINE since its cleanup part is in CPU_DOWN_PREPARE. So if it is okay to stop the timer early in the shutdown phase, it should be okay to start it late in the bring up phase. - CPU_DOWN_PREPARE disables the MCE feature flags for !INTEL CPUs in mce_disable_cpu(). If a failure occures it would be re-enabled on all vendor CPUs (including Intel where it was not disabled during shutdown). To keep this working I am moving it to CPU_ONLINE. smp_call_function_single() is dropped beause the notifier runs nowdays on the target CPU. - CPU_ONLINE is invoking mce_device_create() + mce_threshold_create_device() but its cleanup part is in CPU_DEAD (mce_threshold_remove_device() and mce_device_remove()). In order to keep this symmetrical I am moving the clean up from CPU_DEAD to CPU_DOWN_PREPARE. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Tony Luck <tony.luck@intel.com> Cc: rt@linutronix.de Cc: linux-edac@vger.kernel.org Link: http://lkml.kernel.org/r/20161110174447.11848-6-bigeasy@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-16x86/mcheck: Split threshold_cpu_callback into two callbacksSebastian Andrzej Siewior
The threshold_cpu_callback callbacks looks like one of the notifier and its arguments are almost the same. Split this out and have one ONLINE and one DEAD callback. This will come handy later once the main code gets changed to use the callback mechanism. Also, handle threshold_cpu_callback_online() return value so we don't continue if the function fails. Boris Petkov removed the callback pointer and replaced it with proper functions. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Tony Luck <tony.luck@intel.com> Cc: rt@linutronix.de Cc: linux-edac@vger.kernel.org Link: http://lkml.kernel.org/r/20161110174447.11848-5-bigeasy@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-16x86/mcheck: Be prepared for a rollback back to the ONLINE stateSebastian Andrzej Siewior
If we try a CPU down and fail in the middle then we roll back to the online state. This means we would perform CPU_ONLINE / mce_device_create() without invoking CPU_DEAD / mce_device_remove() for the cleanup of what was allocated in CPU_ONLINE. Be prepared for this and don't allocate the struct if we have it already. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Tony Luck <tony.luck@intel.com> Cc: rt@linutronix.de Cc: linux-edac@vger.kernel.org Link: http://lkml.kernel.org/r/20161110174447.11848-4-bigeasy@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-16x86/mcheck: Explicit cleanup on failure in mce_amdSebastian Andrzej Siewior
If the ONLINE callback fails, the driver does not any clean up right away instead it waits to get to the DEAD stage to do it. Yes, it waits. Since we don't pass the error code back to the caller, no one knows. Do the clean up right away so it does not look like a leak. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Tony Luck <tony.luck@intel.com> Cc: rt@linutronix.de Cc: linux-edac@vger.kernel.org Link: http://lkml.kernel.org/r/20161110174447.11848-3-bigeasy@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-16x86/mcheck: Move threshold_create_device()Sebastian Andrzej Siewior
Move the threshold_create_device() so it can use threshold_remove_device() without a forward declaration. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Tony Luck <tony.luck@intel.com> Cc: rt@linutronix.de Cc: linux-edac@vger.kernel.org Link: http://lkml.kernel.org/r/20161110174447.11848-2-bigeasy@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-15x86/intel_rdt: Update percpu closid immeditately on CPUs affected by changeeFenghua Yu
If CPUs are moved to or removed from a rdtgroup, the percpu closid storage is updated. If tasks running on an affected CPU use the percpu closid then the PQR_ASSOC MSR is only updated when the task runs through a context switch. Up to the context switch the CPUs operate on the wrong closid. This state is potentially unbound. Make the change immediately effective by invoking a smp function call on the affected CPUs which stores the new closid in the perpu storage and calls the rdt_sched_in() function which updates the MSR, if the current task uses the percpu closid. [ tglx: Made it work and massaged changelog once more ] Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com> Cc: "Tony Luck" <tony.luck@intel.com> Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com> Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com> Cc: "Ingo Molnar" <mingo@elte.hu> Cc: "H. Peter Anvin" <h.peter.anvin@intel.com> Link: http://lkml.kernel.org/r/1478912558-55514-3-git-send-email-fenghua.yu@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-15x86/intel_rdt: Reset per cpu closids on unmountFenghua Yu
All CPUs in a rdtgroup are given back to the default rdtgroup before the rdtgroup is removed during umount. After umount, the default rdtgroup contains all online CPUs, but the per cpu closids are not cleared. As a result the stale closid value will be used immediately after the next mount. Move all cpus to the default group and update the percpu closid storage. [ tglx: Massaged changelong ] Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com> Cc: "Tony Luck" <tony.luck@intel.com> Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com> Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com> Cc: "Ingo Molnar" <mingo@elte.hu> Cc: "H. Peter Anvin" <h.peter.anvin@intel.com> Link: http://lkml.kernel.org/r/1478912558-55514-2-git-send-email-fenghua.yu@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-15x86/intel_rdt: Prevent deadlock against hotplug lockThomas Gleixner
The cpu online/offline callbacks of intel_rdt lock rdtgroup_mutex nested inside of cpu hotplug lock. rdtgroup_cpus_write() does it in reverse order. Remove the get/put_online_cpus() calls from rdtgroup_cpus_write(). This is safe against cpu hotplug as the resource group cpumasks are protected by rdtgroup_mutex. Found by review, but should have been found if authors would have bothered to test cpu hotplug with lockdep enabled. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Shaohua Li <shli@fb.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Tony Luck <tony.luck@intel.com>
2016-11-15x86/intel_rdt: Protect info directory from removalFenghua Yu
The info directory and the per-resource subdirectories of the info directory have no reference to a struct rdtgroup in kn->priv. An attempt to remove one of those directories results in a NULL pointer dereference. Protect the directories from removal and return -EPERM instead of -ENOENT. [ tglx: Massaged changelog ] Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com> Cc: "Tony Luck" <tony.luck@intel.com> Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com> Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com> Cc: "Ingo Molnar" <mingo@elte.hu> Cc: "H. Peter Anvin" <h.peter.anvin@intel.com> Link: http://lkml.kernel.org/r/1478912558-55514-1-git-send-email-fenghua.yu@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-15sched/cputime: Simplify task_cputime()Stanislaw Gruszka
Now since fetch_task_cputime() has no other users than task_cputime(), its code could be used directly in task_cputime(). Moreover since only 2 task_cputime() calls of 17 use a NULL argument, we can add dummy variables to those calls and remove NULL checks from task_cputimes(). Also remove NULL checks from task_cputimes_scaled(). Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Neuling <mikey@neuling.org> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1479175612-14718-5-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-15x86/percpu: Remove unnecessary include of module.h, add asm/desc.hPaul Gortmaker
This was originally a part of commit 186f43608a5c: ("x86/kernel: Audit and remove any unnecessary uses of module.h") ...but without the asm/desc.h addition. As such, Ingo reported a build failure on i386 allnoconfig with SMP=y during his pre-merge testing. For expediency the chunk was just dropped at that time. The failure was as follows: arch/x86/kernel/setup_percpu.c: In function ‘setup_percpu_segment’: arch/x86/kernel/setup_percpu.c:159:2: error: implicit declaration of function ‘pack_descriptor’ [-Werror=implicit-function-declaration] arch/x86/kernel/setup_percpu.c:162:2: error: implicit declaration of function ‘write_gdt_entry’ [-Werror=implicit-function-declaration] arch/x86/kernel/setup_percpu.c:162:18: error: implicit declaration of function ‘get_cpu_gdt_table’ [-Werror=implicit-function-declaration] As pack_descriptor(), write_gdt_entry() and get_cpu_gdt_table() all live in the file arch/x86/include/asm/desc.h -- calling that header out explicitly should fix things. Reported-by: Ingo Molnar <mingo@redhat.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20161114190443.10873-1-paul.gortmaker@windriver.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-14Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Misc fixes: - fix an Intel/MID boot crash/hang bug - fix a cache topology mis-parsing bug on certain AMD CPUs - fix a virtualization firmware bug by adding a check+quirk workaround on the kernel side" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu: Deal with broken firmware (VMWare/XEN) x86/cpu/AMD: Fix cpu_llc_id for AMD Fam17h systems x86/platform/intel-mid: Retrofit pci_platform_pm_ops ->get_state hook
2016-11-11x86: apm: avoid uninitialized dataArnd Bergmann
apm_bios_call() can fail, and return a status in its argument structure. If that status however is zero during a call from apm_get_power_status(), we end up using data that may have never been set, as reported by "gcc -Wmaybe-uninitialized": arch/x86/kernel/apm_32.c: In function ‘apm’: arch/x86/kernel/apm_32.c:1729:17: error: ‘bx’ may be used uninitialized in this function [-Werror=maybe-uninitialized] arch/x86/kernel/apm_32.c:1835:5: error: ‘cx’ may be used uninitialized in this function [-Werror=maybe-uninitialized] arch/x86/kernel/apm_32.c:1730:17: note: ‘cx’ was declared here arch/x86/kernel/apm_32.c:1842:27: error: ‘dx’ may be used uninitialized in this function [-Werror=maybe-uninitialized] arch/x86/kernel/apm_32.c:1731:17: note: ‘dx’ was declared here This changes the function to return "APM_NO_ERROR" here, which makes the code more robust to broken BIOS versions, and avoids the warning. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Jiri Kosina <jkosina@suse.cz> Reviewed-by: Luis R. Rodriguez <mcgrof@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-11-11x86/MCE: Correct TSC timestamping of error recordsBorislav Petkov
We did have logic in the MCE code which would TSC-timestamp an error record only when it is exact - i.e., when it wasn't detected by polling. This isn't the case anymore. So let's fix that: We have a valid TSC timestamp in the error record only when it has been a precise detection, i.e., either in the #MC handler or in one of the interrupt handlers (thresholding, deferred, ...). All other error records still have mce.time which contains the wall time in order to be able to place the error record in time at least approximately. Also, this fixes another bug where machine_check_poll() would clear mce.tsc unconditionally even if we requested precise MCP_TIMESTAMP logging. The proper fix would be to generate timestamp only when it has been requested and not always. But that would require a more thorough code audit of all mce_gather_info/mce_setup() users. Add a FIXME for now. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony <tony.luck@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: kernel test robot <xiaolong.ye@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: lkp@01.org Link: http://lkml.kernel.org/r/20161110131053.kybsijfs5venpjnf@pd.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-10drivers: base: cacheinfo: fix x86 with CONFIG_OF enabledSudeep Holla
With CONFIG_OF enabled on x86, we get the following error on boot: " Failed to find cpu0 device node Unable to detect cache hierarchy from DT for CPU 0 " and the cacheinfo fails to get populated in the corresponding sysfs entries. This is because cache_setup_of_node looks for of_node for setting up the shared cpu_map without checking that it's already populated in the architecture specific callback. In order to indicate that the shared cpu_map is already populated, this patch introduces a boolean `cpu_map_populated` in struct cpu_cacheinfo that can be used by the generic code to skip cache_shared_cpu_map_setup. This patch also sets that boolean for x86. Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-09x86/apic: Prevent tracing on apic_msr_write_eoi()Wanpeng Li
The following RCU lockdep warning led to adding irq_enter()/irq_exit() into smp_reschedule_interrupt(): RCU used illegally from idle CPU! rcu_scheduler_active = 1, debug_locks = 0 RCU used illegally from extended quiescent state! no locks held by swapper/1/0. do_trace_write_msr native_write_msr native_apic_msr_eoi_write smp_reschedule_interrupt reschedule_interrupt As Peterz pointed out: | So now we're making a very frequent interrupt slower because of debug | code. | | The thing is, many many smp_reschedule_interrupt() invocations don't | actually execute anything much at all and are only sent to tickle the | return to user path (which does the actual preemption). | | Having to do the whole irq_enter/irq_exit dance just for this unlikely | debug case totally blows. Use the wrmsr_notrace() variant in native_apic_msr_write_eoi, annotate the kvm variant with notrace and add a native_apic_eoi callback to the apic structure so KVM guests are covered as well. This allows to revert the irq_enter/irq_exit dance in smp_reschedule_interrupt(). Suggested-by: Peter Zijlstra <peterz@infradead.org> Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: kvm@vger.kernel.org Cc: Mike Galbraith <efault@gmx.de> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/1478488420-5982-3-git-send-email-wanpeng.li@hotmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-09x86/cpu: Deal with broken firmware (VMWare/XEN)Thomas Gleixner
Both ACPI and MP specifications require that the APIC id in the respective tables must be the same as the APIC id in CPUID. The kernel retrieves the physical package id from the APIC id during the ACPI/MP table scan and builds the physical to logical package map. The physical package id which is used after a CPU comes up is retrieved from CPUID. So we rely on ACPI/MP tables and CPUID agreeing in that respect. There exist VMware and XEN implementations which violate the spec. As a result the physical to logical package map, which relies on the ACPI/MP tables does not work on those systems, because the CPUID initialized physical package id does not match the firmware id. This causes system crashes and malfunction due to invalid package mappings. The only way to cure this is to sanitize the physical package id after the CPUID enumeration and yell when the APIC ids are different. Fix up the initial APIC id, which is fine as it is only used printout purposes. If the physical package IDs differ yell and use the package information from the ACPI/MP tables so the existing logical package map just works. Chas provided the resulting dmesg output for his affected 4 virtual sockets, 1 core per socket VM: [Firmware Bug]: CPU1: APIC id mismatch. Firmware: 1 CPUID: 2 [Firmware Bug]: CPU1: Using firmware package id 1 instead of 2 .... Reported-and-tested-by: "Charles (Chas) Williams" <ciwillia@brocade.com>, Reported-by: M. Vefa Bicakci <m.v.b@runbox.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Alok Kataria <akataria@vmware.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: #4.6+ <stable@vger,kernel.org> Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1611091613540.3501@nanos Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-09x86/cpu/AMD: Clean up cpu_llc_id assignment per topology featureYazen Ghannam
These changes do not affect current hw - just a cleanup: Currently, we assume that a system has a single Last Level Cache (LLC) per node, and that the cpu_llc_id is thus equal to the node_id. This no longer applies since Fam17h can have multiple last level caches within a node. So group the cpu_llc_id assignment by topology feature and family in order to make the computation of cpu_llc_id on the different families more clear. Here is how the LLC ID is being computed on the different families: The NODEID_MSR feature only applies to Fam10h in which case the LLC is at the node level. The TOPOEXT feature is used on families 15h, 16h and 17h. So far we only see multiple last level caches if L3 caches are available. Otherwise, the cpu_llc_id will default to be the phys_proc_id. We have L3 caches only on families 15h and 17h: - on Fam15h, the LLC is at the node level. - on Fam17h, the LLC is at the core complex level and can be found by right shifting the APIC ID. Also, keep the family checks explicit so that new families will fall back to the default, which will be node_id for TOPOEXT systems. Single node systems in families 10h and 15h will have a Node ID of 0 which will be the same as the phys_proc_id, so we don't need to check for multiple nodes before using the node_id. Tested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> [ Rewrote the commit message. ] Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20161108153054.bs3sajbyevq6a6uu@pd.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-09Merge branch 'x86/urgent' into x86/cpu, to pick up dependent fixIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-09x86/cpu/AMD: Fix cpu_llc_id for AMD Fam17h systemsYazen Ghannam
cpu_llc_id (Last Level Cache ID) derivation on AMD Fam17h has an underflow bug when extracting the socket_id value. It starts from 0 so subtracting 1 from it will result in an invalid value. This breaks scheduling topology later on since the cpu_llc_id will be incorrect. For example, the the cpu_llc_id of the *other* CPU in the loops in set_cpu_sibling_map() underflows and we're generating the funniest thread_siblings masks and then when I run 8 threads of nbench, they get spread around the LLC domains in a very strange pattern which doesn't give you the normal scheduling spread one would expect for performance. Other things like EDAC use cpu_llc_id so they will be b0rked too. So, the APIC ID is preset in APICx020 for bits 3 and above: they contain the core complex, node and socket IDs. The LLC is at the core complex level so we can find a unique cpu_llc_id by right shifting the APICID by 3 because then the least significant bit will be the Core Complex ID. Tested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> [ Cleaned up and extended the commit message. ] Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@vger.kernel.org> # v4.4.. Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Fixes: 3849e91f571d ("x86/AMD: Fix last level cache topology for AMD Fam17h systems") Link: http://lkml.kernel.org/r/20161108083506.rvqb5h4chrcptj7d@pd.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-08x86/RAS: Hide SMCA bank namesBorislav Petkov
Add accessor functions and hide the smca_names array. Also, add a sanity-check to bank HWID assignment in get_smca_bank_info(). Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/20161104152317.5r276t35df53qk76@pd.tnic Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-08x86/RAS: Rename smca_bank_names to smca_namesBorislav Petkov
Make it differ more from struct smca_bank_name for better readability. Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Yazen Ghannam <yazen.ghannam@amd.com> Link: http://lkml.kernel.org/r/20161103125556.15482-3-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-08x86/RAS: Simplify SMCA HWID descriptor structBorislav Petkov
Call it simply smca_hwid and call local variables "hwid". More readable. Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Yazen Ghannam <yazen.ghannam@amd.com> Link: http://lkml.kernel.org/r/20161103125556.15482-2-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-08x86/RAS: Simplify SMCA bank descriptor structBorislav Petkov
Call the struct simply smca_bank, it's instance ID can be simply ->id. Makes the code much more readable. Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Yazen Ghannam <yazen.ghannam@amd.com> Link: http://lkml.kernel.org/r/20161103125556.15482-1-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-08x86/MCE: Dump MCE to dmesg if no consumersBorislav Petkov
When there are no error record consumers registered with the kernel, the only thing that appears in dmesg is something like: [ 300.000326] mce: [Hardware Error]: Machine check events logged and the error records are gone. Which is seriously counterproductive. So let's dump them to dmesg instead, in such a case. Requested-by: Eric Morton <Eric.Morton@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/20161101120911.13163-4-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-08x86/MCE: Do not look at panic_on_oops in the severity gradingYinghai Lu
The MCE tolerance levels control whether we panic on a machine check or do something else like generating a signal and logging error information. This is controlled by the mce=<level> command line parameter. However, if panic_on_oops is set, it will force a panic for such an MCE even though the user didn't want to. So don't check panic_on_oops in the severity grading anymore. One of the use cases for that is recovery from uncorrectable errors with mce=2. [ Boris: rewrite commit message. ] Signed-off-by: Yinghai Lu <yinghai.lu@oracle.com> Acked-by: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/20160916202325.4972-1-yinghai@kernel.org Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-07x86/intel_rdt: Export the minimum number of set mask bits in sysfsShaohua Li
The minimum number of bits set for a cache mask is checked by the kernel when writing a mask, but there is no way for the user to retrieve this information. Add a new file 'min_cbm_bits' to the info directory and export the information to user space. [ tglx: Massaged changelog ] Signed-off-by: Shaohua Li <shli@fb.com> Cc: fenghua.yu@intel.com Cc: tony.luck@intel.com Link: http://lkml.kernel.org/r/e69b1ffa206d0353eea58101e1bf9b677d9732f7.1478207143.git.shli@fb.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-07x86/intel_rdt: Propagate error in rdt_mount() properlyShaohua Li
gcc complains: "warning: ‘dentry’ may be used uninitialized in this function" The error exit path in rdt_mount(), which deals with a failure in rdtgroup_create_info_dir(), does not set the error code in dentry and returns the uninitialized dentry value. Add the missing error propagation. [tglx: Massaged changelog ] Fixes: 4e978d06dedb ("x86/intel_rdt: Add "info" files to resctrl file system") Signed-off-by: Shaohua Li <shli@fb.com> Cc: fenghua.yu@intel.com Cc: tony.luck@intel.com Link: http://lkml.kernel.org/r/a56a556f6768dc12cadbf97f49e000189056f90e.1478207143.git.shli@fb.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-01x86/fpu: Remove clts()Andy Lutomirski
The kernel doesn't use clts() any more. Remove it and all of its paravirt infrastructure. A careful reader may notice that xen_clts() appears to have been buggy -- it didn't update xen_cr0_value. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Rik van Riel <riel@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kvm list <kvm@vger.kernel.org> Link: http://lkml.kernel.org/r/3d3c8ca62f17579b9849a013d71e59a4d5d1b079.1477951965.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-01x86/fpu: Handle #NM without FPU emulation as an errorAndy Lutomirski
Don't use CR0.TS. Make it an error rather than making nonsensical changes to the FPU state. (The cond_local_irq_enable() appears to have been pointless, too.) Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Rik van Riel <riel@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kvm list <kvm@vger.kernel.org> Link: http://lkml.kernel.org/r/f1ee6bf73ed1025fccaab321ba43d0594245f927.1477951965.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-01x86/fpu: Remove irq_ts_save() and irq_ts_restore()Andy Lutomirski
Now that lazy FPU is gone, we don't use CR0.TS (except possibly in KVM guest mode). Remove irq_ts_save(), irq_ts_restore(), and all of their callers. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Rik van Riel <riel@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kvm list <kvm@vger.kernel.org> Link: http://lkml.kernel.org/r/70b9b9e7ba70659bedcb08aba63d0f9214f338f2.1477951965.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-01x86/fpu: Stop saving and restoring CR0.TS in fpu__init_check_bugs()Andy Lutomirski
fpu__init_check_bugs() runs long after the early FPU init, so CR0.TS will be clear by the time it runs. The save-and-restore dance would have been unnecessary anyway, though, as kernel_fpu_begin() would have been good enough. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Rik van Riel <riel@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kvm list <kvm@vger.kernel.org> Link: http://lkml.kernel.org/r/76d1f1eacb5caead98197d1eb50ac6110ab20c6a.1477951965.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-01x86/fpu: Get rid of two redundant clts() callsAndy Lutomirski
CR0.TS is cleared by a direct CR0 write in fpu__init_cpu_generic(). We don't need to call clts() two more times right after that. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Rik van Riel <riel@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kvm list <kvm@vger.kernel.org> Link: http://lkml.kernel.org/r/476d2d5066eda24838853426ea74c94140b50c85.1477951965.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-01Merge branch 'core/urgent' into x86/fpu, to merge fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-01Merge branch 'linus' into x86/asm, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-30x86/intel_rdt: Add scheduler hookFenghua Yu
Hook the x86 scheduler code to update closid based on whether the current task is assigned to a specific closid or running on a CPU assigned to a specific closid. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com> Cc: "Tony Luck" <tony.luck@intel.com> Cc: "Shaohua Li" <shli@fb.com> Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com> Cc: "Peter Zijlstra" <peterz@infradead.org> Cc: "Stephane Eranian" <eranian@google.com> Cc: "Dave Hansen" <dave.hansen@intel.com> Cc: "David Carrillo-Cisneros" <davidcc@google.com> Cc: "Nilay Vaish" <nilayvaish@gmail.com> Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com> Cc: "Ingo Molnar" <mingo@elte.hu> Cc: "Borislav Petkov" <bp@suse.de> Cc: "H. Peter Anvin" <h.peter.anvin@intel.com> Link: http://lkml.kernel.org/r/1477692289-37412-10-git-send-email-fenghua.yu@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-10-30x86/intel_rdt: Add schemata fileTony Luck
Last of the per resource group files. Also mode 0644. This one shows the resources available to the group. Syntax depends on whether the "cdp" mount option was given. With code/data prioritization disabled it is simply a list of masks for each cache domain. Initial value allows access to all of the L3 cache on all domains. E.g. on a 2 socket Broadwell: L3:0=fffff;1=fffff With CDP enabled, separate masks for data and instructions are provided: L3DATA:0=fffff;1=fffff L3CODE:0=fffff;1=fffff Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com> Cc: "Shaohua Li" <shli@fb.com> Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com> Cc: "Peter Zijlstra" <peterz@infradead.org> Cc: "Stephane Eranian" <eranian@google.com> Cc: "Dave Hansen" <dave.hansen@intel.com> Cc: "David Carrillo-Cisneros" <davidcc@google.com> Cc: "Nilay Vaish" <nilayvaish@gmail.com> Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com> Cc: "Ingo Molnar" <mingo@elte.hu> Cc: "Borislav Petkov" <bp@suse.de> Cc: "H. Peter Anvin" <h.peter.anvin@intel.com> Link: http://lkml.kernel.org/r/1477692289-37412-9-git-send-email-fenghua.yu@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-10-30x86/intel_rdt: Add tasks filesFenghua Yu
The root directory all subdirectories are automatically populated with a read/write (mode 0644) file named "tasks". When read it will show all the task IDs assigned to the resource group. Tasks can be added (one at a time) to a group by writing the task ID to the file. E.g. Membership in a resource group is indicated by a new field in the task_struct "int closid" which holds the CLOSID for each task. The default resource group uses CLOSID=0 which means that all existing tasks when the resctrl file system is mounted belong to the default group. If a group is removed, tasks which are members of that group are moved to the default group. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com> Cc: "Tony Luck" <tony.luck@intel.com> Cc: "Shaohua Li" <shli@fb.com> Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com> Cc: "Peter Zijlstra" <peterz@infradead.org> Cc: "Stephane Eranian" <eranian@google.com> Cc: "Dave Hansen" <dave.hansen@intel.com> Cc: "David Carrillo-Cisneros" <davidcc@google.com> Cc: "Nilay Vaish" <nilayvaish@gmail.com> Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com> Cc: "Ingo Molnar" <mingo@elte.hu> Cc: "Borislav Petkov" <bp@suse.de> Cc: "H. Peter Anvin" <h.peter.anvin@intel.com> Link: http://lkml.kernel.org/r/1477692289-37412-8-git-send-email-fenghua.yu@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-10-30x86/intel_rdt: Add cpus fileTony Luck
Now we populate each directory with a read/write (mode 0644) file named "cpus". This is used to over-ride the resources available to processes in the default resource group when running on specific CPUs. Each "cpus" file reads as a cpumask showing which CPUs belong to this resource group. Initially all online CPUs are assigned to the default group. They can be added to other groups by writing a cpumask to the "cpus" file in the directory for the resource group (which will remove them from the previous group to which they were assigned). CPU online/offline operations will delete CPUs that go offline from whatever group they are in and add new CPUs to the default group. If there are CPUs assigned to a group when the directory is removed, they are returned to the default group. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com> Cc: "Shaohua Li" <shli@fb.com> Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com> Cc: "Peter Zijlstra" <peterz@infradead.org> Cc: "Stephane Eranian" <eranian@google.com> Cc: "Dave Hansen" <dave.hansen@intel.com> Cc: "David Carrillo-Cisneros" <davidcc@google.com> Cc: "Nilay Vaish" <nilayvaish@gmail.com> Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com> Cc: "Ingo Molnar" <mingo@elte.hu> Cc: "Borislav Petkov" <bp@suse.de> Cc: "H. Peter Anvin" <h.peter.anvin@intel.com> Link: http://lkml.kernel.org/r/1477692289-37412-7-git-send-email-fenghua.yu@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-10-30x86/intel_rdt: Add mkdir to resctrl file systemFenghua Yu
Resource control groups are represented as directories in the resctrl file system. The root directory describes the default resources available to tasks that have not been assigned specific resources. Other directories can be created at the root level to make new resource groups. It is not permitted to make directories within other directories. Hardware uses a CLOSID (Class of service ID) to determine which resource limits are currently in effect. The exact number available is enumerated by CPUID leaf 0x10, but on current implementations it is a small number. We implement a simple bitmask allocator for CLOSIDs. Each resource control group uses one CLOSID, which limits the total number of directories that can be created. Resource groups can be removed using rmdir. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com> Cc: "Tony Luck" <tony.luck@intel.com> Cc: "Shaohua Li" <shli@fb.com> Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com> Cc: "Peter Zijlstra" <peterz@infradead.org> Cc: "Stephane Eranian" <eranian@google.com> Cc: "Dave Hansen" <dave.hansen@intel.com> Cc: "David Carrillo-Cisneros" <davidcc@google.com> Cc: "Nilay Vaish" <nilayvaish@gmail.com> Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com> Cc: "Ingo Molnar" <mingo@elte.hu> Cc: "Borislav Petkov" <bp@suse.de> Cc: "H. Peter Anvin" <h.peter.anvin@intel.com> Link: http://lkml.kernel.org/r/1477692289-37412-6-git-send-email-fenghua.yu@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-10-30x86/intel_rdt: Add "info" files to resctrl file systemFenghua Yu
For the convenience of applications we make the decoded values of some of the CPUID values available in read-only (0444) files. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com> Cc: "Tony Luck" <tony.luck@intel.com> Cc: "Shaohua Li" <shli@fb.com> Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com> Cc: "Peter Zijlstra" <peterz@infradead.org> Cc: "Stephane Eranian" <eranian@google.com> Cc: "Dave Hansen" <dave.hansen@intel.com> Cc: "David Carrillo-Cisneros" <davidcc@google.com> Cc: "Nilay Vaish" <nilayvaish@gmail.com> Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com> Cc: "Ingo Molnar" <mingo@elte.hu> Cc: "Borislav Petkov" <bp@suse.de> Cc: "H. Peter Anvin" <h.peter.anvin@intel.com> Link: http://lkml.kernel.org/r/1477692289-37412-5-git-send-email-fenghua.yu@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-10-30x86/intel_rdt: Add basic resctrl filesystem supportFenghua Yu
Use kernfs as basis for our user interface filesystem. This patch supports mount/umount, and one mount parameter "cdp" to enable code/data prioritization (though all we do at this point is ensure that the system can support CDP). The file system is not populated yet in this patch. [ tglx: Fixed up a few nits and added cdp handling in case of error ] Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com> Cc: "Tony Luck" <tony.luck@intel.com> Cc: "Shaohua Li" <shli@fb.com> Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com> Cc: "Peter Zijlstra" <peterz@infradead.org> Cc: "Stephane Eranian" <eranian@google.com> Cc: "Dave Hansen" <dave.hansen@intel.com> Cc: "David Carrillo-Cisneros" <davidcc@google.com> Cc: "Nilay Vaish" <nilayvaish@gmail.com> Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com> Cc: "Ingo Molnar" <mingo@elte.hu> Cc: "Borislav Petkov" <bp@suse.de> Cc: "H. Peter Anvin" <h.peter.anvin@intel.com> Link: http://lkml.kernel.org/r/1477692289-37412-4-git-send-email-fenghua.yu@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-10-30x86/intel_rdt: Build structures for each resource based on cache topologyTony Luck
We use the cpu hotplug notifier to catch each cpu in turn and look at its cache topology w.r.t each of the resource groups. As we discover new resources, we initialize the bitmask array for each to the default (full access) value. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com> Cc: "Shaohua Li" <shli@fb.com> Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com> Cc: "Peter Zijlstra" <peterz@infradead.org> Cc: "Stephane Eranian" <eranian@google.com> Cc: "Dave Hansen" <dave.hansen@intel.com> Cc: "David Carrillo-Cisneros" <davidcc@google.com> Cc: "Nilay Vaish" <nilayvaish@gmail.com> Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com> Cc: "Ingo Molnar" <mingo@elte.hu> Cc: "Borislav Petkov" <bp@suse.de> Cc: "H. Peter Anvin" <h.peter.anvin@intel.com> Link: http://lkml.kernel.org/r/1477692289-37412-3-git-send-email-fenghua.yu@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-10-30x86/vmware: Add paravirt sched clockAlexey Makhalov
The default sched_clock() implementation is native_sched_clock(). It contains code to handle non constant frequency TSCs, which creates overhead for systems with constant frequency TSCs. The vmware hypervisor guarantees a constant frequency TSC, so native_sched_clock() is not required and slower than a dedicated function which operates with one time calculated conversion factors. Calculate the conversion factors at boot time from the tsc frequency and install an optimized sched_clock() function via paravirt ops. The paravirtualized clock can be disabled on the kernel command line with the new 'no-vmw-sched-clock' option. Signed-off-by: Alexey Makhalov <amakhalov@vmware.com> Acked-by: Alok N Kataria <akataria@vmware.com> Cc: linux-doc@vger.kernel.org Cc: pv-drivers@vmware.com Cc: corbet@lwn.net Cc: virtualization@lists.linux-foundation.org Link: http://lkml.kernel.org/r/20161028075432.90579-4-amakhalov@vmware.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-10-30x86/vmware: Add basic paravirt ops supportAlexey Makhalov
Add basic paravirt support: 1. Set pv_info.name to "VMware hypervisor" to have proper boot log message Booting paravirtualized kernel on VMware hypervisor instead of "... on bare hardware" 2. Set pv_cpu_ops.io_delay() to empty function - paravirt_noop() to avoid vm-exits on IO delays because io delays they are not required. Signed-off-by: Alexey Makhalov <amakhalov@vmware.com> Acked-by: Alok N Kataria <akataria@vmware.com> Cc: linux-doc@vger.kernel.org Cc: pv-drivers@vmware.com Cc: corbet@lwn.net Cc: virtualization@lists.linux-foundation.org Link: http://lkml.kernel.org/r/20161028075432.90579-3-amakhalov@vmware.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-10-30x86/vmware: Use tsc_khz value for calibrate_cpu()Alexey Makhalov
Commit aa297292d708 ("x86/tsc: Enumerate SKL cpu_khz and tsc_khz via CPUID") separated the calibration mechanisms for cpu_khz and tsc_khz. Since the vmware hypervisor provides a constant frequency TSC to the guest, this change can lead to divergence between the tsc and the cpu frequency after vmotion, which might confuse the user. Solve this by overriding the x86 platform cpu calibration callback with the vmware specific tsc calibration function. Signed-off-by: Alexey Makhalov <amakhalov@vmware.com> Acked-by: Alok N Kataria <akataria@vmware.com> Cc: linux-doc@vger.kernel.org Cc: pv-drivers@vmware.com Cc: corbet@lwn.net Cc: virtualization@lists.linux-foundation.org Link: http://lkml.kernel.org/r/20161028075432.90579-2-amakhalov@vmware.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-10-29x86/smpboot: Init apic mapping before usageThomas Gleixner
The recent changes, which forced the registration of the boot cpu on UP systems, which do not have ACPI tables, have been fixed for systems w/o local APIC, but left a wreckage for systems which have neither ACPI nor mptables, but the CPU has an APIC, e.g. virtualbox. The boot process crashes in prefill_possible_map() as it wants to register the boot cpu, which needs to access the local apic, but the local APIC is not yet mapped. There is no reason why init_apic_mapping() can't be invoked before prefill_possible_map(). So instead of playing another silly early mapping game, as the ACPI/mptables code does, we just move init_apic_mapping() before the call to prefill_possible_map(). In hindsight, I should have noticed that combination earlier. Sorry for the churn (also in stable)! Fixes: ff8560512b8d ("x86/boot/smp: Don't try to poke disabled/non-existent APIC") Reported-and-debugged-by: Michal Necasek <michal.necasek@oracle.com> Reported-and-tested-by: Wolfgang Bauer <wbauer@tmo.at> Cc: prarit@redhat.com Cc: ville.syrjala@linux.intel.com Cc: michael.thayer@oracle.com Cc: knut.osmundsen@oracle.com Cc: frank.mehnert@oracle.com Cc: Borislav Petkov <bp@alien8.de> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1610282114380.5053@nanos Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-10-28Merge tag 'acpi-4.9-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fixes from Rafael Wysocki: "These fix recent ACPICA regressions, an older PCI IRQ management regression, and an incorrect return value of a function in the APEI code. Specifics: - Fix three ACPICA issues related to the interpreter locking and introduced by recent changes in that area (Lv Zheng). - Fix a PCI IRQ management regression introduced during the 4.7 cycle and related to the configuration of shared IRQs on systems with an ISA bus (Sinan Kaya). - Fix up a return value of one function in the APEI code (Punit Agrawal)" * tag 'acpi-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPICA: Dispatcher: Fix interpreter locking around acpi_ev_initialize_region() ACPICA: Dispatcher: Fix an unbalanced lock exit path in acpi_ds_auto_serialize_method() ACPICA: Dispatcher: Fix order issue of method termination ACPI / APEI: Fix incorrect return value of ghes_proc() ACPI/PCI: pci_link: Include PIRQ_PENALTY_PCI_USING for ISA IRQs ACPI/PCI: pci_link: penalize SCI correctly ACPI/PCI/IRQ: assign ISA IRQ directly during early boot stages
2016-10-28x86/microcode/AMD: Fix more fallout from CONFIG_RANDOMIZE_MEMORY=yBorislav Petkov
We needed the physical address of the container in order to compute the offset within the relocated ramdisk. And we did this by doing __pa() on the virtual address. However, __pa() does checks whether the physical address is within PAGE_OFFSET and __START_KERNEL_map - see __phys_addr() - which fail if we have CONFIG_RANDOMIZE_MEMORY enabled: we feed a virtual address which *doesn't* have the randomization offset into a function which uses PAGE_OFFSET which *does* have that offset. This makes this check fire: VIRTUAL_BUG_ON((x > y) || !phys_addr_valid(x)); ^^^^^^ due to the randomization offset. The fix is as simple as using __pa_nodebug() because we do that randomization offset accounting later in that function ourselves. Reported-by: Bob Peterson <rpeterso@redhat.com> Tested-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm <linux-mm@kvack.org> Cc: stable@vger.kernel.org # 4.9 Link: http://lkml.kernel.org/r/20161027123623.j2jri5bandimboff@pd.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org>