summaryrefslogtreecommitdiff
path: root/arch/x86/kernel/cpu/mcheck
AgeCommit message (Collapse)Author
2014-08-08arch/x86: replace strict_strto callsDaniel Walter
Replace obsolete strict_strto calls with appropriate kstrto calls Signed-off-by: Daniel Walter <dwalter@google.com> Acked-by: Borislav Petkov <bp@suse.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-05x86: MCE: Add raw_lock conversion againThomas Gleixner
Commit ea431643d6c3 ("x86/mce: Fix CMCI preemption bugs") breaks RT by the completely unrelated conversion of the cmci_discover_lock to a regular (non raw) spinlock. This lock was annotated in commit 59d958d2c7de ("locking, x86: mce: Annotate cmci_discover_lock as raw") with a proper explanation why. The argument for converting the lock back to a regular spinlock was: - it does percpu ops without disabling preemption. Preemption is not disabled due to the mistaken use of a raw spinlock. Which is complete nonsense. The raw_spinlock is disabling preemption in the same way as a regular spinlock. In mainline spinlock maps to raw_spinlock, in RT spinlock becomes a "sleeping" lock. raw_spinlock has on RT exactly the same semantics as in mainline. And because this lock is taken in non preemptible context it must be raw on RT. Undo the locking brainfart. Reported-by: Clark Williams <williams@redhat.com> Reported-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-24x86, MCE: Robustify mcheck_init_deviceBorislav Petkov
BorisO reports that misc_register() fails often on xen. The current code unregisters the CPU hotplug notifier in that case. If then a CPU is offlined and onlined back again, we end up with a second timer running on that CPU, leading to soft lockups and system hangs. So let's leave the hotcpu notifier always registered - even if mce_device_create failed for some cores and never unreg it so that we can deal with the timer handling accordingly. Reported-and-Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: http://lkml.kernel.org/r/1403274493-1371-1-git-send-email-boris.ostrovsky@oracle.com Signed-off-by: Borislav Petkov <bp@suse.de>
2014-06-22x86, MCE: Kill CPU_POST_DEADBorislav Petkov
In conjunction with cleaning up CPU hotplug, we want to get rid of CPU_POST_DEAD. Kill this instance here and rediscover CMCI banks at the end of CPU_DEAD. Link: http://lkml.kernel.org/r/http://lkml.kernel.org/r/1400750624-19238-1-git-send-email-bp@alien8.de Signed-off-by: Borislav Petkov <bp@suse.de>
2014-06-04hwpoison: remove unused global variable in do_machine_check()Chen Yucong
Remove an unused global variable mce_entry and relative operations in do_machine_check(). Signed-off-by: Chen Yucong <slaoub@gmail.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-05-30mce: Panic when a core has reached a timeoutBorislav Petkov
There is very little and maybe practically nothing we can do to recover from a system where at least one core has reached a timeout during the whole monarch cores gathering. So panic when that happens. Link: http://lkml.kernel.org/r/20140523091041.GA21332@pd.tnic Signed-off-by: Borislav Petkov <bp@suse.de>
2014-05-30x86/mce: Improve mcheck_init_device() error handlingMathieu Souchaud
Check return code of every function called by mcheck_init_device(). Signed-off-by: Mathieu Souchaud <mattieu.souchaud@free.fr> Link: http://lkml.kernel.org/r/1399151031-19905-1-git-send-email-mattieu.souchaud@free.fr Signed-off-by: Borislav Petkov <bp@suse.de>
2014-05-05asmlinkage, x86: Add explicit __visible to arch/x86/*Andi Kleen
As requested by Linus add explicit __visible to the asmlinkage users. This marks all functions visible to assembler. Tree sweep for arch/x86/* Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1398984278-29319-3-git-send-email-andi@firstfloor.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-04-17x86/mce: Fix CMCI preemption bugsIngo Molnar
The following commit: 27f6c573e0f7 ("x86, CMCI: Add proper detection of end of CMCI storms") Added two preemption bugs: - machine_check_poll() does a get_cpu_var() without a matching put_cpu_var(), which causes preemption imbalance and crashes upon bootup. - it does percpu ops without disabling preemption. Preemption is not disabled due to the mistaken use of a raw spinlock. To fix these bugs fix the imbalance and change cmci_discover_lock to a regular spinlock. Reported-by: Owen Kibel <qmewlo@gmail.com> Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Chen, Gong <gong.chen@linux.intel.com> Cc: Josh Boyer <jwboyer@fedoraproject.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Alexander Todorov <atodorov@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/n/tip-jtjptvgigpfkpvtQxpEk1at2@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> -- arch/x86/kernel/cpu/mcheck/mce.c | 4 +--- arch/x86/kernel/cpu/mcheck/mce_intel.c | 18 +++++++++--------- 2 files changed, 10 insertions(+), 12 deletions(-)
2014-04-11Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Peter Anvin: "This is a collection of minor fixes for x86, plus the IRET information leak fix (forbid the use of 16-bit segments in 64-bit mode)" NOTE! We may have to relax the "forbid the use of 16-bit segments in 64-bit mode" part, since there may be people who still run and depend on 16-bit Windows binaries under Wine. But I'm taking this in the current unconditional form for now to see who (if anybody) screams bloody murder. Maybe nobody cares. And maybe we'll have to update it with some kind of runtime enablement (like our vm.mmap_min_addr tunable that people who run dosemu/qemu/wine already need to tweak). * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels efi: Pass correct file handle to efi_file_{read,close} x86/efi: Correct EFI boot stub use of code32_start x86/efi: Fix boot failure with EFI stub x86/platform/hyperv: Handle VMBUS driver being a module x86/apic: Reinstate error IRQ Pentium erratum 3AP workaround x86, CMCI: Add proper detection of end of CMCI storms
2014-03-28x86, CMCI: Add proper detection of end of CMCI stormsChen, Gong
When CMCI storm persists for a long time(at least beyond predefined threshold. It's 30 seconds for now), we can watch CMCI storm is detected immediately after it subsides. ... Dec 10 22:04:29 kernel: CMCI storm detected: switching to poll mode Dec 10 22:04:59 kernel: CMCI storm subsided: switching to interrupt mode Dec 10 22:04:59 kernel: CMCI storm detected: switching to poll mode Dec 10 22:05:29 kernel: CMCI storm subsided: switching to interrupt mode ... The problem is that our logic that determines that the storm has ended is incorrect. We announce the end, re-enable interrupts and realize that the storm is still going on, so we switch back to polling mode. Rinse, repeat. When a storm happens we disable signaling of errors via CMCI and begin polling machine check banks instead. If we find any logged errors, then we need to set a per-cpu flag so that our per-cpu tests that check whether the storm is ongoing will see that errors are still being logged independently of whether mce_notify_irq() says that the error has been fully processed. cmci_clear() is not the right tool to disable a bank. It disables the interrupt for the bank as desired, but it also clears the bit for this bank in "mce_banks_owned" so we will skip the bank when polling (so we fail to see that the storm continues because we stop looking). New cmci_storm_disable_banks() just disables the interrupt while allowing polling to continue. Reported-by: William Dauchy <wdauchy@gmail.com> Signed-off-by: Chen, Gong <gong.chen@linux.intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2014-03-20x86, therm_throt.c: Remove unused therm_cpu_lockSrivatsa S. Bhat
After fixing the CPU hotplug callback registration code, the callbacks invoked for each online CPU, during the initialization phase in thermal_throttle_init_device(), can no longer race with the actual CPU hotplug notifier callbacks (in thermal_throttle_cpu_callback). Hence the therm_cpu_lock is unnecessary now. Remove it. Cc: Tony Luck <tony.luck@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Suggested-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-03-20x86, therm_throt.c: Fix CPU hotplug callback registrationSrivatsa S. Bhat
Subsystems that want to register CPU hotplug callbacks, as well as perform initialization for the CPUs that are already online, often do it as shown below: get_online_cpus(); for_each_online_cpu(cpu) init_cpu(cpu); register_cpu_notifier(&foobar_cpu_notifier); put_online_cpus(); This is wrong, since it is prone to ABBA deadlocks involving the cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently with CPU hotplug operations). Instead, the correct and race-free way of performing the callback registration is: cpu_notifier_register_begin(); for_each_online_cpu(cpu) init_cpu(cpu); /* Note the use of the double underscored version of the API */ __register_cpu_notifier(&foobar_cpu_notifier); cpu_notifier_register_done(); Fix the thermal throttle code in x86 by using this latter form of callback registration. Cc: Tony Luck <tony.luck@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-03-20x86, mce: Fix CPU hotplug callback registrationSrivatsa S. Bhat
Subsystems that want to register CPU hotplug callbacks, as well as perform initialization for the CPUs that are already online, often do it as shown below: get_online_cpus(); for_each_online_cpu(cpu) init_cpu(cpu); register_cpu_notifier(&foobar_cpu_notifier); put_online_cpus(); This is wrong, since it is prone to ABBA deadlocks involving the cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently with CPU hotplug operations). Instead, the correct and race-free way of performing the callback registration is: cpu_notifier_register_begin(); for_each_online_cpu(cpu) init_cpu(cpu); /* Note the use of the double underscored version of the API */ __register_cpu_notifier(&foobar_cpu_notifier); cpu_notifier_register_done(); Fix the mce code in x86 by using this latter form of callback registration. Cc: Tony Luck <tony.luck@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-01-20Merge branch 'x86-ras-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 RAS changes from Ingo Molnar: - SCI reporting for other error types not only correctable ones - GHES cleanups - Add the functionality to override error reporting agents as some machines are sporting a new extended error logging capability which, if done properly in the BIOS, makes a corresponding EDAC module redundant - PCIe AER tracepoint severity levels fix - Error path correction for the mce device init - MCE timer fix - Add more flexibility to the error injection (EINJ) debugfs interface * 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, mce: Fix mce_start_timer semantics ACPI, APEI, GHES: Cleanup ghes memory error handling ACPI, APEI: Cleanup alignment-aware accesses ACPI, APEI, GHES: Do not report only correctable errors with SCI ACPI, APEI, EINJ: Changes to the ACPI/APEI/EINJ debugfs interface ACPI, eMCA: Combine eMCA/EDAC event reporting priority EDAC, sb_edac: Modify H/W event reporting policy EDAC: Add an edac_report parameter to EDAC PCI, AER: Fix severity usage in aer trace event x86, mce: Call put_device on device_register failure
2014-01-12x86, mce: Fix mce_start_timer semanticsBorislav Petkov
So mce_start_timer() has a 'cpu' argument which is supposed to mean to start a timer on that cpu. However, the code currently starts a timer on the *current* cpu the function runs on and causes the sanity-check in mce_timer_fn to fire: WARNING: CPU: 0 PID: 0 at arch/x86/kernel/cpu/mcheck/mce.c:1286 mce_timer_fn because it is running on the wrong cpu. This was triggered by Prarit Bhargava <prarit@redhat.com> by offlining all the cpus in succession. Then, we were fiddling with the CMCI storm settings when starting the timer whereas there's no need for that - if there's storm happening on this newly restarted cpu, we're going to be in normal CMCI mode initially and then when the CMCI interrupt starts firing, we're going to go to the polling mode with the timer real soon. Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Prarit Bhargava <prarit@redhat.com> Cc: Tony Luck <tony.luck@intel.com> Reviewed-by: Chen, Gong <gong.chen@linux.intel.com> Link: http://lkml.kernel.org/r/1387722156-5511-1-git-send-email-prarit@redhat.com
2014-01-06x86: Delete non-required instances of include <linux/init.h>Paul Gortmaker
None of these files are actually using any __init type directives and hence don't need to include <linux/init.h>. Most are just a left over from __devinit and __cpuinit removal, or simply due to code getting copied from one driver to the next. [ hpa: undid incorrect removal from arch/x86/kernel/head_32.S ] Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Link: http://lkml.kernel.org/r/1389054026-12947-1-git-send-email-paul.gortmaker@windriver.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2013-12-21ACPI, APEI, GHES: Do not report only correctable errors with SCIChen, Gong
Currently SCI is employed to handle corrected errors - memory corrected errors, more specifically but in fact SCI still can be used to handle any errors, e.g. uncorrected or even fatal ones if enabled by the BIOS. Enable logging for those kinds of errors too. Signed-off-by: Chen, Gong <gong.chen@linux.intel.com> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/1385363701-12387-1-git-send-email-gong.chen@linux.intel.com [ Boris: massage commit message, rename function arg. ] Signed-off-by: Borislav Petkov <bp@suse.de>
2013-11-30x86, mce: Call put_device on device_register failureLevente Kurusa
This patch adds a call to put_device() when the device_register() call has failed. This is required so that the last reference to the device is given up. Signed-off-by: Levente Kurusa <levex@linux.com> Link: http://lkml.kernel.org/r/5298F900.9000208@linux.com Signed-off-by: Borislav Petkov <bp@suse.de>
2013-10-23ACPI, APEI, CPER: Add UEFI 2.4 support for memory errorChen, Gong
In latest UEFI spec(by now it is 2.4) memory error definition for CPER (UEFI 2.4 Appendix N Common Platform Error Record) adds some new fields. These fields help people to locate memory error to an actual DIMM location. Original-author: Tony Luck <tony.luck@intel.com> Signed-off-by: Chen, Gong <gong.chen@linux.intel.com> Reviewed-by: Borislav Petkov <bp@suse.de> Reviewed-by: Mauro Carvalho Chehab <m.chehab@samsung.com> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-08-12Merge tag 'please-pull-mce-f-bit' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/ras Pull MCE-uncorrected-error fix from Tony Luck: "Bit 12 may or may not be set in MCi_STATUS.MCACOD when an uncorrected error is reported. Ignore it when checking error signatures." Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-08-12Merge branch 'x86/mce' into x86/rasIngo Molnar
Pursue a single RAS/MCE topic branch on x86. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-07-29x86/mce: Fix mce regression from recent cleanupTony Luck
In commit 33d7885b594e169256daef652e8d3527b2298e75 x86/mce: Update MCE severity condition check We simplified the rules to recognise each classification of recoverable machine check combining the instruction and data fetch rules into a single entry based on clarifications in the June 2013 SDM that all recoverable events would be reported on the unaffected processor with MCG_STATUS.EIPV=0 and MCG_STATUS.RIPV=1. Unfortunately the simplified rule has a couple of bugs. Fix them here. Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-07-14x86: delete __cpuinit usage from all x86 filesPaul Gortmaker
The __cpuinit type of throwaway sections might have made sense some time ago when RAM was more constrained, but now the savings do not offset the cost and complications. For example, the fix in commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time") is a good example of the nasty type of bugs that can be created with improper use of the various __init prefixes. After a discussion on LKML[1] it was decided that cpuinit should go the way of devinit and be phased out. Once all the users are gone, we can then finally remove the macros themselves from linux/init.h. Note that some harmless section mismatch warnings may result, since notify_cpu_starting() and cpu_up() are arch independent (kernel/cpu.c) are flagged as __cpuinit -- so if we remove the __cpuinit from arch specific callers, we will also get section mismatch warnings. As an intermediate step, we intend to turn the linux/init.h cpuinit content into no-ops as early as possible, since that will get rid of these warnings. In any case, they are temporary and harmless. This removes all the arch/x86 uses of the __cpuinit macros from all C files. x86 only had the one __CPUINIT used in assembly files, and it wasn't paired off with a .previous or a __FINIT, so we can delete it directly w/o any corresponding additional change there. [1] https://lkml.org/lkml/2013/5/20/589 Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2013-07-11Merge branch 'next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux Pull thermal management updates from Zhang Rui: "There are not too many changes this time, except two new platform thermal drivers, ti-soc-thermal driver and x86_pkg_temp_thermal driver, and a couple of small fixes. Highlights: - move the ti-soc-thermal driver out of the staging tree to the thermal tree. - introduce the x86_pkg_temp_thermal driver. This driver registers CPU digital temperature package level sensor as a thermal zone. - small fixes/cleanups including removing redundant use of platform_set_drvdata() and of_match_ptr for all platform thermal drivers" * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux: (34 commits) thermal: cpu_cooling: fix stub function thermal: ti-soc-thermal: use standard GPIO DT bindings thermal: MAINTAINERS: Add git tree path for SoC specific updates thermal: fix x86_pkg_temp_thermal.c build and Kconfig Thermal: Documentation for x86 package temperature thermal driver Thermal: CPU Package temperature thermal thermal: consider emul_temperature while computing trend thermal: ti-soc-thermal: add DT example for DRA752 chip thermal: ti-soc-thermal: add dra752 chip to device table thermal: ti-soc-thermal: add thermal data for DRA752 chips thermal: ti-soc-thermal: remove usage of IS_ERR_OR_NULL thermal: ti-soc-thermal: freeze FSM while computing trend thermal: ti-soc-thermal: remove external heat while extrapolating hotspot thermal: ti-soc-thermal: update DT reference for OMAP5430 x86, mcheck, therm_throt: Process package thresholds thermal: cpu_cooling: fix 'descend' check in get_property() Thermal: spear: Remove redundant use of of_match_ptr Thermal: kirkwood: Remove redundant use of of_match_ptr Thermal: dove: Remove redundant use of of_match_ptr Thermal: armada: Remove redundant use of of_match_ptr ...
2013-07-08mce: acpi/apei: Honour Firmware First for MCA banks listed in APEI HEST CMCNaveen N. Rao
The Corrected Machine Check structure (CMC) in HEST has a flag which can be set by the firmware to indicate to the OS that it prefers to process the corrected error events first. In this scenario, the OS is expected to not monitor for corrected errors (through CMCI/polling). Instead, the firmware notifies the OS on corrected error events through GHES. Linux already has support for GHES. This patch adds support for parsing CMC structure and to disable CMCI/polling if the firmware first flag is set. Further, the list of machine check bank structures at the end of CMC is used to determine which MCA banks function in FF mode, so that we continue to monitor error events on the other banks. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-07-03Merge tag 'please-pull-mce-therm' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux Pull thermal power-limit update from Tony Luck: "Thermal limit warnings are too scary and cause unnecessary concern" * tag 'please-pull-mce-therm' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux: x86 thermal: Disable power limit notification interrupt by default x86 thermal: Delete power-limit-notification console messages
2013-07-02Merge branch 'x86-tracing-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 tracing updates from Ingo Molnar: "This tree adds IRQ vector tracepoints that are named after the handler and which output the vector #, based on a zero-overhead approach that relies on changing the IDT entries, by Seiji Aguchi. The new tracepoints look like this: # perf list | grep -i irq_vector irq_vectors:local_timer_entry [Tracepoint event] irq_vectors:local_timer_exit [Tracepoint event] irq_vectors:reschedule_entry [Tracepoint event] irq_vectors:reschedule_exit [Tracepoint event] irq_vectors:spurious_apic_entry [Tracepoint event] irq_vectors:spurious_apic_exit [Tracepoint event] irq_vectors:error_apic_entry [Tracepoint event] irq_vectors:error_apic_exit [Tracepoint event] [...]" * 'x86-tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/tracing: Add config option checking to the definitions of mce handlers trace,x86: Do not call local_irq_save() in load_current_idt() trace,x86: Move creation of irq tracepoints from apic.c to irq.c x86, trace: Add irq vector tracepoints x86: Rename variables for debugging x86, trace: Introduce entering/exiting_irq() tracing: Add DEFINE_EVENT_FN() macro
2013-06-28Merge tag 'please-pull-mce' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/ras Pull MCE cleanup from Tony Luck: "Changes to simplify the SDM means that we can also simplify the code for SRAR (software recoverable action required) errors." Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-27x86/mce: Update MCE severity condition checkChen Gong
Update some SRAR severity conditions check to make it clearer, according to latest Intel SDM Vol 3(June 2013), table 15-20. Signed-off-by: Chen Gong <gong.chen@linux.intel.com> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-06-26Merge tag 'please-pull-mce-bitmap-comment' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/ras Pull MCE updates from Tony Luck: "Better comments so we understand our existing machine check bank bitmaps - prelude to adding another bitmap soon." Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-25mce: acpi/apei: Add comments to clarify usage of the various bitfields in ↵Naveen N. Rao
the MCA subsystem There is some confusion about the 'mce_poll_banks' and 'mce_banks_owned' per-cpu bitmaps. Provide comments so that we all know exactly what these are used for, and why. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-06-20x86, trace: Add irq vector tracepointsSeiji Aguchi
[Purpose of this patch] As Vaibhav explained in the thread below, tracepoints for irq vectors are useful. http://www.spinics.net/lists/mm-commits/msg85707.html <snip> The current interrupt traces from irq_handler_entry and irq_handler_exit provide when an interrupt is handled. They provide good data about when the system has switched to kernel space and how it affects the currently running processes. There are some IRQ vectors which trigger the system into kernel space, which are not handled in generic IRQ handlers. Tracing such events gives us the information about IRQ interaction with other system events. The trace also tells where the system is spending its time. We want to know which cores are handling interrupts and how they are affecting other processes in the system. Also, the trace provides information about when the cores are idle and which interrupts are changing that state. <snip> On the other hand, my usecase is tracing just local timer event and getting a value of instruction pointer. I suggested to add an argument local timer event to get instruction pointer before. But there is another way to get it with external module like systemtap. So, I don't need to add any argument to irq vector tracepoints now. [Patch Description] Vaibhav's patch shared a trace point ,irq_vector_entry/irq_vector_exit, in all events. But there is an above use case to trace specific irq_vector rather than tracing all events. In this case, we are concerned about overhead due to unwanted events. So, add following tracepoints instead of introducing irq_vector_entry/exit. so that we can enable them independently. - local_timer_vector - reschedule_vector - call_function_vector - call_function_single_vector - irq_work_entry_vector - error_apic_vector - thermal_apic_vector - threshold_apic_vector - spurious_apic_vector - x86_platform_ipi_vector Also, introduce a logic switching IDT at enabling/disabling time so that a time penalty makes a zero when tracepoints are disabled. Detailed explanations are as follows. - Create trace irq handlers with entering_irq()/exiting_irq(). - Create a new IDT, trace_idt_table, at boot time by adding a logic to _set_gate(). It is just a copy of original idt table. - Register the new handlers for tracpoints to the new IDT by introducing macros to alloc_intr_gate() called at registering time of irq_vector handlers. - Add checking, whether irq vector tracing is on/off, into load_current_idt(). This has to be done below debug checking for these reasons. - Switching to debug IDT may be kicked while tracing is enabled. - On the other hands, switching to trace IDT is kicked only when debugging is disabled. In addition, the new IDT is created only when CONFIG_TRACING is enabled to avoid being used for other purposes. Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com> Link: http://lkml.kernel.org/r/51C323ED.5050708@hds.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Steven Rostedt <rostedt@goodmis.org>
2013-06-20x86, trace: Introduce entering/exiting_irq()Seiji Aguchi
When implementing tracepoints in interrupt handers, if the tracepoints are simply added in the performance sensitive path of interrupt handers, it may cause potential performance problem due to the time penalty. To solve the problem, an idea is to prepare non-trace/trace irq handers and switch their IDTs at the enabling/disabling time. So, let's introduce entering_irq()/exiting_irq() for pre/post- processing of each irq handler. A way to use them is as follows. Non-trace irq handler: smp_irq_handler() { entering_irq(); /* pre-processing of this handler */ __smp_irq_handler(); /* * common logic between non-trace and trace handlers * in a vector. */ exiting_irq(); /* post-processing of this handler */ } Trace irq_handler: smp_trace_irq_handler() { entering_irq(); /* pre-processing of this handler */ trace_irq_entry(); /* tracepoint for irq entry */ __smp_irq_handler(); /* * common logic between non-trace and trace handlers * in a vector. */ trace_irq_exit(); /* tracepoint for irq exit */ exiting_irq(); /* post-processing of this handler */ } If tracepoints can place outside entering_irq()/exiting_irq() as follows, it looks cleaner. smp_trace_irq_handler() { trace_irq_entry(); smp_irq_handler(); trace_irq_exit(); } But it doesn't work. The problem is with irq_enter/exit() being called. They must be called before trace_irq_enter/exit(), because of the rcu_irq_enter() must be called before any tracepoints are used, as tracepoints use rcu to synchronize. As a possible alternative, we may be able to call irq_enter() first as follows if irq_enter() can nest. smp_trace_irq_hander() { irq_entry(); trace_irq_entry(); smp_irq_handler(); trace_irq_exit(); irq_exit(); } But it doesn't work, either. If irq_enter() is nested, it may have a time penalty because it has to check if it was already called or not. The time penalty is not desired in performance sensitive paths even if it is tiny. Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com> Link: http://lkml.kernel.org/r/51C3238D.9040706@hds.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Steven Rostedt <rostedt@goodmis.org>
2013-06-14x86 thermal: Disable power limit notification interrupt by defaultFenghua Yu
The package power limit notification interrupt is primarily for system diagnosis, and should not be blindly enabled on every system by default -- particuarly since Linux does nothing in the handler except count how many times it has been called... Add a new kernel cmdline parameter "int_pln_enable" for situations where users want to oberve these events via existing system counters: $ grep TRM /proc/interrupts $ grep . /sys/devices/system/cpu/cpu*/thermal_throttle/* https://bugzilla.kernel.org/show_bug.cgi?id=36182 Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-06-14x86 thermal: Delete power-limit-notification console messagesFenghua Yu
Package power limits are common on some systems under some conditions -- so printing console messages when limits are reached causes unnecessary customer concern and support calls. Note that even with these console messages gone, the events can still be observed via system counters: $ grep TRM /proc/interrupts Shows total thermal interrupts, which includes both power limit notifications and thermal throttling interrupts. $ grep . /sys/devices/system/cpu/cpu*/thermal_throttle/* Will show what caused those interrupts, core and package throttling and power limit notifications. https://bugzilla.kernel.org/show_bug.cgi?id=36182 Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-06-13x86, mcheck, therm_throt: Process package thresholdsSrinivas Pandruvada
Added callback registration for package threshold reports. Also added a callback to check the rate control implemented in callback or not. If there is no rate control implemented, then there is a default rate control similar to core threshold notification by delaying for CHECK_INTERVAL (5 minutes) between reports. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2013-06-05x86, mce: Fix "braodcast" typoMathias Krause
Fix the typo in MCJ_IRQ_BRAODCAST. Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Borislav Petkov <bp@suse.de>
2013-04-08Merge tag 'please-pull-cmci_rediscover' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/ras Pull clean up of the cmci_rediscover code to fix problems found by Dave Jones, from Tony Luck. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-04-02x86/mce: Rework cmci_rediscover() to play well with CPU hotplugSrivatsa S. Bhat
Dave Jones reports that offlining a CPU leads to this trace: numa_remove_cpu cpu 1 node 0: mask now 0,2-3 smpboot: CPU 1 is now offline BUG: using smp_processor_id() in preemptible [00000000] code: cpu-offline.sh/10591 caller is cmci_rediscover+0x6a/0xe0 Pid: 10591, comm: cpu-offline.sh Not tainted 3.9.0-rc3+ #2 Call Trace: [<ffffffff81333bbd>] debug_smp_processor_id+0xdd/0x100 [<ffffffff8101edba>] cmci_rediscover+0x6a/0xe0 [<ffffffff815f5b9f>] mce_cpu_callback+0x19d/0x1ae [<ffffffff8160ea66>] notifier_call_chain+0x66/0x150 [<ffffffff8107ad7e>] __raw_notifier_call_chain+0xe/0x10 [<ffffffff8104c2e3>] cpu_notify+0x23/0x50 [<ffffffff8104c31e>] cpu_notify_nofail+0xe/0x20 [<ffffffff815ef082>] _cpu_down+0x302/0x350 [<ffffffff815ef106>] cpu_down+0x36/0x50 [<ffffffff815f1c9d>] store_online+0x8d/0xd0 [<ffffffff813edc48>] dev_attr_store+0x18/0x30 [<ffffffff81226eeb>] sysfs_write_file+0xdb/0x150 [<ffffffff811adfb2>] vfs_write+0xa2/0x170 [<ffffffff811ae16c>] sys_write+0x4c/0xa0 [<ffffffff81613019>] system_call_fastpath+0x16/0x1b However, a look at cmci_rediscover shows that it can be simplified quite a bit, apart from solving the above issue. It invokes functions that take spin locks with interrupts disabled, and hence it can run in atomic context. Also, it is run in the CPU_POST_DEAD phase, so the dying CPU is already dead and out of the cpu_online_mask. So take these points into account and simplify the code, and thereby also fix the above issue. Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-03-22x86, MCE, AMD: Use MCG_CAP MSR to find out number of banks on AMDBoris Ostrovsky
Currently number of error reporting register banks is hardcoded to 6 on AMD processors. This may break in virtualized scenarios when a hypervisor prefers to report fewer banks than what the physical HW provides. Since number of supported banks is reported in MSR_IA32_MCG_CAP[7:0] that's what we should use. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: http://lkml.kernel.org/r/1363295441-1859-3-git-send-email-boris.ostrovsky@oracle.com [ reverse NULL ptr test logic ] Signed-off-by: Borislav Petkov <bp@suse.de>
2013-03-22x86, MCE, AMD: Replace shared_bank array with is_shared_bank() helperBoris Ostrovsky
Use helper function instead of an array to report whether register bank is shared. Currently only bank 4 (northbridge) is shared. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: http://lkml.kernel.org/r/1363295441-1859-2-git-send-email-boris.ostrovsky@oracle.com Signed-off-by: Borislav Petkov <bp@suse.de>
2013-02-25Merge tag 'modules-next-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull module update from Rusty Russell: "The sweeping change is to make add_taint() explicitly indicate whether to disable lockdep, but it's a mechanical change." * tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: MODSIGN: Add option to not sign modules during modules_install MODSIGN: Add -s <signature> option to sign-file MODSIGN: Specify the hash algorithm on sign-file command line MODSIGN: Simplify Makefile with a Kconfig helper module: clean up load_module a little more. modpost: Ignore ARC specific non-alloc sections module: constify within_module_* taint: add explicit flag to show whether lock dep is still OK. module: printk message when module signature fail taints kernel.
2013-01-21taint: add explicit flag to show whether lock dep is still OK.Rusty Russell
Fix up all callers as they were before, with make one change: an unsigned module taints the kernel, but doesn't turn off lockdep. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-12-28x86/mce: don't use [delayed_]work_pending()Tejun Heo
There's no need to test whether a (delayed) work item in pending before queueing, flushing or cancelling it. Most uses are unnecessary and quite a few of them are buggy. Remove unnecessary pending tests from x86/mce. Only compile tested. v2: Local var work removed from mce_schedule_work() as suggested by Borislav. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac@vger.kernel.org
2012-12-14Merge branch 'x86-ras-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 RAS update from Ingo Molnar: "Rework all config variables used throughout the MCA code and collect them together into a mca_config struct. This keeps them tightly and neatly packed together instead of spilled all over the place. Then, convert those which are used as booleans into real booleans and save some space. These bits are exposed via /sys/devices/system/machinecheck/machinecheck*/" * 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, MCA: Finish mca_config conversion x86, MCA: Convert the next three variables batch x86, MCA: Convert rip_msr, mce_bootlog, monarch_timeout x86, MCA: Convert dont_log_ce, banks and tolerant drivers/base: Add a DEVICE_BOOL_ATTR macro
2012-11-13Merge tag 'please-pull-tangchen' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/urgent Pull MCE fix from Tony Luck: "Fix problem in CMCI rediscovery code that was illegally migrating worker threads to other cpus." Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-10-30x86/mce: Do not change worker's running cpu in cmci_rediscover().Tang Chen
cmci_rediscover() used set_cpus_allowed_ptr() to change the current process's running cpu, and migrate itself to the dest cpu. But worker processes are not allowed to be migrated. If current is a worker, the worker will be migrated to another cpu, but the corresponding worker_pool is still on the original cpu. In this case, the following BUG_ON in try_to_wake_up_local() will be triggered: BUG_ON(rq != this_rq()); This will cause the kernel panic. The call trace is like the following: [ 6155.451107] ------------[ cut here ]------------ [ 6155.452019] kernel BUG at kernel/sched/core.c:1654! ...... [ 6155.452019] RIP: 0010:[<ffffffff810add15>] [<ffffffff810add15>] try_to_wake_up_local+0x115/0x130 ...... [ 6155.452019] Call Trace: [ 6155.452019] [<ffffffff8166fc14>] __schedule+0x764/0x880 [ 6155.452019] [<ffffffff81670059>] schedule+0x29/0x70 [ 6155.452019] [<ffffffff8166de65>] schedule_timeout+0x235/0x2d0 [ 6155.452019] [<ffffffff810db57d>] ? mark_held_locks+0x8d/0x140 [ 6155.452019] [<ffffffff810dd463>] ? __lock_release+0x133/0x1a0 [ 6155.452019] [<ffffffff81671c50>] ? _raw_spin_unlock_irq+0x30/0x50 [ 6155.452019] [<ffffffff810db8f5>] ? trace_hardirqs_on_caller+0x105/0x190 [ 6155.452019] [<ffffffff8166fefb>] wait_for_common+0x12b/0x180 [ 6155.452019] [<ffffffff810b0b30>] ? try_to_wake_up+0x2f0/0x2f0 [ 6155.452019] [<ffffffff8167002d>] wait_for_completion+0x1d/0x20 [ 6155.452019] [<ffffffff8110008a>] stop_one_cpu+0x8a/0xc0 [ 6155.452019] [<ffffffff810abd40>] ? __migrate_task+0x1a0/0x1a0 [ 6155.452019] [<ffffffff810a6ab8>] ? complete+0x28/0x60 [ 6155.452019] [<ffffffff810b0fd8>] set_cpus_allowed_ptr+0x128/0x130 [ 6155.452019] [<ffffffff81036785>] cmci_rediscover+0xf5/0x140 [ 6155.452019] [<ffffffff816643c0>] mce_cpu_callback+0x18d/0x19d [ 6155.452019] [<ffffffff81676187>] notifier_call_chain+0x67/0x150 [ 6155.452019] [<ffffffff810a03de>] __raw_notifier_call_chain+0xe/0x10 [ 6155.452019] [<ffffffff81070470>] __cpu_notify+0x20/0x40 [ 6155.452019] [<ffffffff810704a5>] cpu_notify_nofail+0x15/0x30 [ 6155.452019] [<ffffffff81655182>] _cpu_down+0x262/0x2e0 [ 6155.452019] [<ffffffff81655236>] cpu_down+0x36/0x50 [ 6155.452019] [<ffffffff813d3eaa>] acpi_processor_remove+0x50/0x11e [ 6155.452019] [<ffffffff813a6978>] acpi_device_remove+0x90/0xb2 [ 6155.452019] [<ffffffff8143cbec>] __device_release_driver+0x7c/0xf0 [ 6155.452019] [<ffffffff8143cd6f>] device_release_driver+0x2f/0x50 [ 6155.452019] [<ffffffff813a7870>] acpi_bus_remove+0x32/0x6d [ 6155.452019] [<ffffffff813a7932>] acpi_bus_trim+0x87/0xee [ 6155.452019] [<ffffffff813a7a21>] acpi_bus_hot_remove_device+0x88/0x16b [ 6155.452019] [<ffffffff813a33ee>] acpi_os_execute_deferred+0x27/0x34 [ 6155.452019] [<ffffffff81090589>] process_one_work+0x219/0x680 [ 6155.452019] [<ffffffff81090528>] ? process_one_work+0x1b8/0x680 [ 6155.452019] [<ffffffff813a33c7>] ? acpi_os_wait_events_complete+0x23/0x23 [ 6155.452019] [<ffffffff810923be>] worker_thread+0x12e/0x320 [ 6155.452019] [<ffffffff81092290>] ? manage_workers+0x110/0x110 [ 6155.452019] [<ffffffff81098396>] kthread+0xc6/0xd0 [ 6155.452019] [<ffffffff8167c4c4>] kernel_thread_helper+0x4/0x10 [ 6155.452019] [<ffffffff81671f30>] ? retint_restore_args+0x13/0x13 [ 6155.452019] [<ffffffff810982d0>] ? __init_kthread_worker+0x70/0x70 [ 6155.452019] [<ffffffff8167c4c0>] ? gs_change+0x13/0x13 This patch removes the set_cpus_allowed_ptr() call, and put the cmci rediscover jobs onto all the other cpus using system_wq. This could bring some delay for the jobs. Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2012-10-30x86, AMD: Change Boris' email addressBorislav Petkov
Move to private email and put in maintained status. Signed-off-by: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/1351532410-4887-1-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-10-26x86, MCA: Finish mca_config conversionBorislav Petkov
mce_ser, mce_bios_cmci_threshold and mce_disabled are the last three bools which need conversion. Move them to the mca_config struct and adjust usage sites accordingly. Signed-off-by: Borislav Petkov <bp@alien8.de> Acked-by: Tony Luck <tony.luck@intel.com>