summaryrefslogtreecommitdiff
path: root/arch/x86/include
AgeCommit message (Collapse)Author
2014-12-10Merge branch 'x86-boot-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 boot and percpu updates from Ingo Molnar: "This tree contains a bootable images documentation update plus three slightly misplaced x86/asm percpu changes/optimizations" * 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86-64: Use RIP-relative addressing for most per-CPU accesses x86-64: Handle PC-relative relocations on per-CPU data x86: Convert a few more per-CPU items to read-mostly ones x86, boot: Document intermediates more clearly
2014-12-10Merge branch 'x86-asm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 asm updates from Ingo Molnar: "Misc changes: - context switch micro-optimization - debug printout micro-optimization - comment enhancements and typo fix" * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Replace seq_printf() with seq_puts() x86/asm: Fix typo in arch/x86/kernel/asm_offset_64.c sched/x86: Add a comment clarifying LDT context switching sched/x86_64: Don't save flags on context switch
2014-12-10Merge branch 'x86-mpx-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 MPX support from Thomas Gleixner: "This enables support for x86 MPX. MPX is a new debug feature for bound checking in user space. It requires kernel support to handle the bound tables and decode the bound violating instruction in the trap handler" * 'x86-mpx-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: asm-generic: Remove asm-generic arch_bprm_mm_init() mm: Make arch_unmap()/bprm_mm_init() available to all architectures x86: Cleanly separate use of asm-generic/mm_hooks.h x86 mpx: Change return type of get_reg_offset() fs: Do not include mpx.h in exec.c x86, mpx: Add documentation on Intel MPX x86, mpx: Cleanup unused bound tables x86, mpx: On-demand kernel allocation of bounds tables x86, mpx: Decode MPX instruction to get bound violation information x86, mpx: Add MPX-specific mmap interface x86, mpx: Introduce VM_MPX to indicate that a VMA is MPX specific x86, mpx: Add MPX to disabled features ia64: Sync struct siginfo with general version mips: Sync struct siginfo with general version mpx: Extend siginfo structure to include bound violation information x86, mpx: Rename cfg_reg_u and status_reg x86: mpx: Give bndX registers actual names x86: Remove arbitrary instruction size limit in instruction decoder
2014-12-10Merge branch 'irq-irqdomain-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq domain updates from Thomas Gleixner: "The real interesting irq updates: - Support for hierarchical irq domains: For complex interrupt routing scenarios where more than one interrupt related chip is involved we had no proper representation in the generic interrupt infrastructure so far. That made people implement rather ugly constructs in their nested irq chip implementations. The main offenders are x86 and arm/gic. To distangle that mess we have now hierarchical irqdomains which seperate the various interrupt chips and connect them via the hierarchical domains. That keeps the domain specific details internal to the particular hierarchy level and removes the criss/cross referencing of chip internals. The resulting hierarchy for a complex x86 system will look like this: vector mapped: 74 msi-0 mapped: 2 dmar-ir-1 mapped: 69 ioapic-1 mapped: 4 ioapic-0 mapped: 20 pci-msi-2 mapped: 45 dmar-ir-0 mapped: 3 ioapic-2 mapped: 1 pci-msi-1 mapped: 2 htirq mapped: 0 Neither ioapic nor pci-msi know about the dmar interrupt remapping between themself and the vector domain. If interrupt remapping is disabled ioapic and pci-msi become direct childs of the vector domain. In hindsight we should have done that years ago, but in hindsight we always know better :) - Support for generic MSI interrupt domain handling We have more and more non PCI related MSI interrupts, so providing a generic infrastructure for this is better than having all affected architectures implementing their own private hacks. - Support for PCI-MSI interrupt domain handling, based on the generic MSI support. This part carries the pci/msi branch from Bjorn Helgaas pci tree to avoid a massive conflict. The PCI/MSI parts are acked by Bjorn. I have two more branches on top of this. The full conversion of x86 to hierarchical domains and a partial conversion of arm/gic" * 'irq-irqdomain-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits) genirq: Move irq_chip_write_msi_msg() helper to core PCI/MSI: Allow an msi_controller to be associated to an irq domain PCI/MSI: Provide mechanism to alloc/free MSI/MSIX interrupt from irqdomain PCI/MSI: Enhance core to support hierarchy irqdomain PCI/MSI: Move cached entry functions to irq core genirq: Provide default callbacks for msi_domain_ops genirq: Introduce msi_domain_alloc/free_irqs() asm-generic: Add msi.h genirq: Add generic msi irq domain support genirq: Introduce callback irq_chip.irq_write_msi_msg genirq: Work around __irq_set_handler vs stacked domains ordering issues irqdomain: Introduce helper function irq_domain_add_hierarchy() irqdomain: Implement a method to automatically call parent domains alloc/free genirq: Introduce helper irq_domain_set_info() to reduce duplicated code genirq: Split out flow handler typedefs into seperate header file genirq: Add IRQ_SET_MASK_OK_DONE to support stacked irqchip genirq: Introduce irq_chip.irq_compose_msi_msg() to support stacked irqchip genirq: Add more helper functions to support stacked irq_chip genirq: Introduce helper functions to support stacked irq_chip irqdomain: Do irq_find_mapping and set_type for hierarchy irqdomain in case OF ...
2014-12-09Merge branch 'sched-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "The main changes in this cycle are: - 'Nested Sleep Debugging', activated when CONFIG_DEBUG_ATOMIC_SLEEP=y. This instruments might_sleep() checks to catch places that nest blocking primitives - such as mutex usage in a wait loop. Such bugs can result in hard to debug races/hangs. Another category of invalid nesting that this facility will detect is the calling of blocking functions from within schedule() -> sched_submit_work() -> blk_schedule_flush_plug(). There's some potential for false positives (if secondary blocking primitives themselves are not ready yet for this facility), but the kernel will warn once about such bugs per bootup, so the warning isn't much of a nuisance. This feature comes with a number of fixes, for problems uncovered with it, so no messages are expected normally. - Another round of sched/numa optimizations and refinements, for CONFIG_NUMA_BALANCING=y. - Another round of sched/dl fixes and refinements. Plus various smaller fixes and cleanups" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits) sched: Add missing rcu protection to wake_up_all_idle_cpus sched/deadline: Introduce start_hrtick_dl() for !CONFIG_SCHED_HRTICK sched/numa: Init numa balancing fields of init_task sched/deadline: Remove unnecessary definitions in cpudeadline.h sched/cpupri: Remove unnecessary definitions in cpupri.h sched/deadline: Fix rq->dl.pushable_tasks bug in push_dl_task() sched/fair: Fix stale overloaded status in the busiest group finding logic sched: Move p->nr_cpus_allowed check to select_task_rq() sched/completion: Document when to use wait_for_completion_io_*() sched: Update comments about CLONE_NEWUTS and CLONE_NEWIPC sched/fair: Kill task_struct::numa_entry and numa_group::task_list sched: Refactor task_struct to use numa_faults instead of numa_* pointers sched/deadline: Don't check CONFIG_SMP in switched_from_dl() sched/deadline: Reschedule from switched_from_dl() after a successful pull sched/deadline: Push task away if the deadline is equal to curr during wakeup sched/deadline: Add deadline rq status print sched/deadline: Fix artificial overrun introduced by yield_task_dl() sched/rt: Clean up check_preempt_equal_prio() sched/core: Use dl_bw_of() under rcu_read_lock_sched() sched: Check if we got a shallowest_idle_cpu before searching for least_loaded_cpu ...
2014-12-09Merge branch 'perf-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf events update from Ingo Molnar: "On the kernel side there's few changes, the one that stands out is PEBS machine state sampling support on x86, by Stephane Eranian. On the tooling side: User visible tooling changes: - Don't open the DWARF info multiple times, keeping instead a dwfl handle in struct dso, greatly speeding up 'perf report' on powerpc. (Sukadev Bhattiprolu) - Introduce PARSE_OPT_DISABLED option flag and use it to avoid showing undersired options in tools that provides frontends to 'perf record', like sched, kvm, etc (Namhyung Kim) - Fallback to kallsyms when using the minimal 'ELF' loader (Arnaldo Carvalho de Melo) - Fix annotation with kcore (Adrian Hunter) - Support source line numbers in annotate using a hotkey (Andi Kleen) - Callchain improvements including: * Enable printing the srcline in the history * Make get_srcline fall back to sym+offset (Andi Kleen) - TUI hist_entry browser fixes, including showing missing overhead value for first level callchain. Detected comparing the output of --stdio/--gui (that matched) with --tui, that had this problem. (Namhyung Kim) - Support handling complete branch stacks as histograms (Andi Kleen) Tooling infrastructure changes: - Prep work for supporting per-pkg and snapshot counters in 'perf stat' (Jiri Olsa) - 'perf stat' refactorings, moving stuff from it to evsel.c to use in per-pkg/snapshot format changes (Jiri Olsa) - Add per-pkg format file parsing (Matt Fleming) - Clean up libelf feature support code (Namhyung Kim) - Add gzip decompression support for kernel modules (Namhyung Kim) - More prep patches for Intel PT, including a a thread stack and more stuff made available via the database export mechanism (Adrian Hunter) - More Intel PT work, including a facility to export sample data (comms, threads, symbol names, etc) in a database friendly way, with an script to use this to create a postgresql database. (Adrian Hunter) - Make sure that thread->mg->machine points to the machine where the thread exists (it was being set only for the kmaps kernel modules case, do it as well for the mmaps) and use it to shorten function signatures (Arnaldo Carvalho de Melo) ... and lots of other fixes and smaller improvements" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (91 commits) perf report: In branch stack mode use address history sorting perf report: Add --branch-history option perf callchain: Support handling complete branch stacks as histograms perf stat: Add support for snapshot counters perf stat: Add support for per-pkg counters perf tools: Remove perf_evsel__read interface perf stat: Use read_counter in read_counter_aggr perf stat: Make read_counter work over the thread dimension perf stat: Use perf_evsel__read_cb in read_counter perf tools: Add snapshot format file parsing perf tools: Add per-pkg format file parsing perf evsel: Introduce perf_evsel__read_cb function perf evsel: Introduce perf_counts_values__scale function perf evsel: Introduce perf_evsel__compute_deltas function perf tools: Allow to force redirect pr_debug to stderr. perf tools: Fix segfault due to invalid kernel dso access perf callchain: Make get_srcline fall back to sym+offset perf symbols: Move bfd_demangle stubbing to its only user perf callchain: Enable printing the srcline in the history perf tools: Collapse first level callchain entry if it has sibling ...
2014-12-09Merge branch 'core-locking-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking tree changes from Ingo Molnar: "Two changes: a documentation update and a ticket locks live lock fix" * 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/ticketlock: Fix spin_unlock_wait() livelock locking/lglocks: Add documentation of current lglocks implementation
2014-12-09Merge tag 'asm-generic-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull asm-generic asm/io.h rewrite from Arnd Bergmann: "While there normally is no reason to have a pull request for asm-generic but have all changes get merged through whichever tree needs them, I do have a series for 3.19. There are two sets of patches that change significant portions of asm/io.h, and this branch contains both in order to resolve the conflicts: - Will Deacon has done a set of patches to ensure that all architectures define {read,write}{b,w,l,q}_relaxed() functions or get them by including asm-generic/io.h. These functions are commonly used on ARM specific drivers to avoid expensive L2 cache synchronization implied by the normal {read,write}{b,w,l,q}, but we need to define them on all architectures in order to share the drivers across architectures and to enable CONFIG_COMPILE_TEST configurations for them - Thierry Reding has done an unrelated set of patches that extends the asm-generic/io.h file to the degree necessary to make it useful on ARM64 and potentially other architectures" * tag 'asm-generic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: (29 commits) ARM64: use GENERIC_PCI_IOMAP sparc: io: remove duplicate relaxed accessors on sparc32 ARM: sa11x0: Use void __iomem * in MMIO accessors arm64: Use include/asm-generic/io.h ARM: Use include/asm-generic/io.h asm-generic/io.h: Implement generic {read,write}s*() asm-generic/io.h: Reconcile I/O accessor overrides /dev/mem: Use more consistent data types Change xlate_dev_{kmem,mem}_ptr() prototypes ARM: ixp4xx: Properly override I/O accessors ARM: ixp4xx: Fix build with IXP4XX_INDIRECT_PCI ARM: ebsa110: Properly override I/O accessors ARC: Remove redundant PCI_IOBASE declaration documentation: memory-barriers: clarify relaxed io accessor semantics x86: io: implement dummy relaxed accessor macros for writes tile: io: implement dummy relaxed accessor macros for writes sparc: io: implement dummy relaxed accessor macros for writes powerpc: io: implement dummy relaxed accessor macros for writes parisc: io: implement dummy relaxed accessor macros for writes mn10300: io: implement dummy relaxed accessor macros for writes ...
2014-12-08x86/ticketlock: Fix spin_unlock_wait() livelockOleg Nesterov
arch_spin_unlock_wait() looks very suboptimal, to the point I think this is just wrong and can lead to livelock: if the lock is heavily contended we can never see head == tail. But we do not need to wait for arch_spin_is_locked() == F. If it is locked we only need to wait until the current owner drops this lock. So we could simply spin until old_head != lock->tickets.head in this case, but .head can overflow and thus we can't check "unlocked" only once before the main loop. Also, the "unlocked" check can ignore TICKET_SLOWPATH_FLAG bit. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Paul E.McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Waiman Long <Waiman.Long@hp.com> Link: http://lkml.kernel.org/r/20141201213417.GA5842@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-23uprobes, x86: Fix _TIF_UPROBE vs _TIF_NOTIFY_RESUMEAndy Lutomirski
x86 call do_notify_resume on paranoid returns if TIF_UPROBE is set but not on non-paranoid returns. I suspect that this is a mistake and that the code only works because int3 is paranoid. Setting _TIF_NOTIFY_RESUME in the uprobe code was probably a workaround for the x86 bug. With that bug fixed, we can remove _TIF_NOTIFY_RESUME from the uprobes code. Reported-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-11-23x86_64, traps: Stop using IST for #SSAndy Lutomirski
On a 32-bit kernel, this has no effect, since there are no IST stacks. On a 64-bit kernel, #SS can only happen in user code, on a failed iret to user space, a canonical violation on access via RSP or RBP, or a genuine stack segment violation in 32-bit kernel code. The first two cases don't need IST, and the latter two cases are unlikely fatal bugs, and promoting them to double faults would be fine. This fixes a bug in which the espfix64 code mishandles a stack segment violation. This saves 4k of memory per CPU and a tiny bit of code. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-11-19x86: Cleanly separate use of asm-generic/mm_hooks.hDave Hansen
asm-generic/mm_hooks.h provides some generic fillers for the 90% of architectures that do not need to hook some mmap-manipulation functions. A comment inside says: > Define generic no-op hooks for arch_dup_mmap and > arch_exit_mmap, to be included in asm-FOO/mmu_context.h > for any arch FOO which doesn't need to hook these. So, does x86 need to hook these? It depends on CONFIG_PARAVIRT. We *conditionally* include this generic header if we have CONFIG_PARAVIRT=n. That's madness. With this patch, x86 stops using asm-generic/mmu_hooks.h entirely. We use our own copies of the functions. The paravirt code provides some stubs if it is disabled, and we always call those stubs in our x86-private versions of arch_exit_mmap() and arch_dup_mmap(). Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Hansen <dave@sr71.net> Cc: x86@kernel.org Link: http://lkml.kernel.org/r/20141118182349.14567FA5@viggo.jf.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-18x86, mpx: Cleanup unused bound tablesDave Hansen
The previous patch allocates bounds tables on-demand. As noted in an earlier description, these can add up to *HUGE* amounts of memory. This has caused OOMs in practice when running tests. This patch adds support for freeing bounds tables when they are no longer in use. There are two types of mappings in play when unmapping tables: 1. The mapping with the actual data, which userspace is munmap()ing or brk()ing away, etc... 2. The mapping for the bounds table *backing* the data (is tagged with VM_MPX, see the patch "add MPX specific mmap interface"). If userspace use the prctl() indroduced earlier in this patchset to enable the management of bounds tables in kernel, when it unmaps the first type of mapping with the actual data, the kernel needs to free the mapping for the bounds table backing the data. This patch hooks in at the very end of do_unmap() to do so. We look at the addresses being unmapped and find the bounds directory entries and tables which cover those addresses. If an entire table is unused, we clear associated directory entry and free the table. Once we unmap the bounds table, we would have a bounds directory entry pointing at empty address space. That address space might now be allocated for some other (random) use, and the MPX hardware might now try to walk it as if it were a bounds table. That would be bad. So any unmapping of an enture bounds table has to be accompanied by a corresponding write to the bounds directory entry to invalidate it. That write to the bounds directory can fault, which causes the following problem: Since we are doing the freeing from munmap() (and other paths like it), we hold mmap_sem for write. If we fault, the page fault handler will attempt to acquire mmap_sem for read and we will deadlock. To avoid the deadlock, we pagefault_disable() when touching the bounds directory entry and use a get_user_pages() to resolve the fault. The unmapping of bounds tables happends under vm_munmap(). We also (indirectly) call vm_munmap() to _do_ the unmapping of the bounds tables. We avoid unbounded recursion by disallowing freeing of bounds tables *for* bounds tables. This would not occur normally, so should not have any practical impact. Being strict about it here helps ensure that we do not have an exploitable stack overflow. Based-on-patch-by: Qiaowei Ren <qiaowei.ren@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: linux-mm@kvack.org Cc: linux-mips@linux-mips.org Cc: Dave Hansen <dave@sr71.net> Link: http://lkml.kernel.org/r/20141114151831.E4531C4A@viggo.jf.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-18x86, mpx: On-demand kernel allocation of bounds tablesDave Hansen
This is really the meat of the MPX patch set. If there is one patch to review in the entire series, this is the one. There is a new ABI here and this kernel code also interacts with userspace memory in a relatively unusual manner. (small FAQ below). Long Description: This patch adds two prctl() commands to provide enable or disable the management of bounds tables in kernel, including on-demand kernel allocation (See the patch "on-demand kernel allocation of bounds tables") and cleanup (See the patch "cleanup unused bound tables"). Applications do not strictly need the kernel to manage bounds tables and we expect some applications to use MPX without taking advantage of this kernel support. This means the kernel can not simply infer whether an application needs bounds table management from the MPX registers. The prctl() is an explicit signal from userspace. PR_MPX_ENABLE_MANAGEMENT is meant to be a signal from userspace to require kernel's help in managing bounds tables. PR_MPX_DISABLE_MANAGEMENT is the opposite, meaning that userspace don't want kernel's help any more. With PR_MPX_DISABLE_MANAGEMENT, the kernel won't allocate and free bounds tables even if the CPU supports MPX. PR_MPX_ENABLE_MANAGEMENT will fetch the base address of the bounds directory out of a userspace register (bndcfgu) and then cache it into a new field (->bd_addr) in the 'mm_struct'. PR_MPX_DISABLE_MANAGEMENT will set "bd_addr" to an invalid address. Using this scheme, we can use "bd_addr" to determine whether the management of bounds tables in kernel is enabled. Also, the only way to access that bndcfgu register is via an xsaves, which can be expensive. Caching "bd_addr" like this also helps reduce the cost of those xsaves when doing table cleanup at munmap() time. Unfortunately, we can not apply this optimization to #BR fault time because we need an xsave to get the value of BNDSTATUS. ==== Why does the hardware even have these Bounds Tables? ==== MPX only has 4 hardware registers for storing bounds information. If MPX-enabled code needs more than these 4 registers, it needs to spill them somewhere. It has two special instructions for this which allow the bounds to be moved between the bounds registers and some new "bounds tables". They are similar conceptually to a page fault and will be raised by the MPX hardware during both bounds violations or when the tables are not present. This patch handles those #BR exceptions for not-present tables by carving the space out of the normal processes address space (essentially calling the new mmap() interface indroduced earlier in this patch set.) and then pointing the bounds-directory over to it. The tables *need* to be accessed and controlled by userspace because the instructions for moving bounds in and out of them are extremely frequent. They potentially happen every time a register pointing to memory is dereferenced. Any direct kernel involvement (like a syscall) to access the tables would obviously destroy performance. ==== Why not do this in userspace? ==== This patch is obviously doing this allocation in the kernel. However, MPX does not strictly *require* anything in the kernel. It can theoretically be done completely from userspace. Here are a few ways this *could* be done. I don't think any of them are practical in the real-world, but here they are. Q: Can virtual space simply be reserved for the bounds tables so that we never have to allocate them? A: As noted earlier, these tables are *HUGE*. An X-GB virtual area needs 4*X GB of virtual space, plus 2GB for the bounds directory. If we were to preallocate them for the 128TB of user virtual address space, we would need to reserve 512TB+2GB, which is larger than the entire virtual address space today. This means they can not be reserved ahead of time. Also, a single process's pre-popualated bounds directory consumes 2GB of virtual *AND* physical memory. IOW, it's completely infeasible to prepopulate bounds directories. Q: Can we preallocate bounds table space at the same time memory is allocated which might contain pointers that might eventually need bounds tables? A: This would work if we could hook the site of each and every memory allocation syscall. This can be done for small, constrained applications. But, it isn't practical at a larger scale since a given app has no way of controlling how all the parts of the app might allocate memory (think libraries). The kernel is really the only place to intercept these calls. Q: Could a bounds fault be handed to userspace and the tables allocated there in a signal handler instead of in the kernel? A: (thanks to tglx) mmap() is not on the list of safe async handler functions and even if mmap() would work it still requires locking or nasty tricks to keep track of the allocation state there. Having ruled out all of the userspace-only approaches for managing bounds tables that we could think of, we create them on demand in the kernel. Based-on-patch-by: Qiaowei Ren <qiaowei.ren@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: linux-mm@kvack.org Cc: linux-mips@linux-mips.org Cc: Dave Hansen <dave@sr71.net> Link: http://lkml.kernel.org/r/20141114151829.AD4310DE@viggo.jf.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-18x86, mpx: Decode MPX instruction to get bound violation informationDave Hansen
This patch sets bound violation fields of siginfo struct in #BR exception handler by decoding the user instruction and constructing the faulting pointer. We have to be very careful when decoding these instructions. They are completely controlled by userspace and may be changed at any time up to and including the point where we try to copy them in to the kernel. They may or may not be MPX instructions and could be completely invalid for all we know. Note: This code is based on Qiaowei Ren's specialized MPX decoder, but uses the generic decoder whenever possible. It was tested for robustness by generating a completely random data stream and trying to decode that stream. I also unmapped random pages inside the stream to test the "partial instruction" short read code. We kzalloc() the siginfo instead of stack allocating it because we need to memset() it anyway, and doing this makes it much more clear when it got initialized by the MPX instruction decoder. Changes from the old decoder: * Use the generic decoder instead of custom functions. Saved ~70 lines of code overall. * Remove insn->addr_bytes code (never used??) * Make sure never to possibly overflow the regoff[] array, plus check the register range correctly in 32 and 64-bit modes. * Allow get_reg() to return an error and have mpx_get_addr_ref() handle when it sees errors. * Only call insn_get_*() near where we actually use the values instead if trying to call them all at once. * Handle short reads from copy_from_user() and check the actual number of read bytes against what we expect from insn_get_length(). If a read stops in the middle of an instruction, we error out. * Actually check the opcodes intead of ignoring them. * Dynamically kzalloc() siginfo_t so we don't leak any stack data. * Detect and handle decoder failures instead of ignoring them. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Based-on-patch-by: Qiaowei Ren <qiaowei.ren@intel.com> Cc: linux-mm@kvack.org Cc: linux-mips@linux-mips.org Cc: Dave Hansen <dave@sr71.net> Link: http://lkml.kernel.org/r/20141114151828.5BDD0915@viggo.jf.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-18x86, mpx: Add MPX-specific mmap interfaceQiaowei Ren
We have chosen to perform the allocation of bounds tables in kernel (See the patch "on-demand kernel allocation of bounds tables") and to mark these VMAs with VM_MPX. However, there is currently no suitable interface to actually do this. Existing interfaces, like do_mmap_pgoff(), have no way to set a modified ->vm_ops or ->vm_flags and don't hold mmap_sem long enough to let a caller do it. This patch wraps mmap_region() and hold mmap_sem long enough to make the modifications to the VMA which we need. Also note the 32/64-bit #ifdef in the header. We actually need to do this at runtime eventually. But, for now, we don't support running 32-bit binaries on 64-bit kernels. Support for this will come in later patches. Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: linux-mm@kvack.org Cc: linux-mips@linux-mips.org Cc: Dave Hansen <dave@sr71.net> Link: http://lkml.kernel.org/r/20141114151827.CE440F67@viggo.jf.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-18x86, mpx: Add MPX to disabled featuresDave Hansen
This allows us to use cpu_feature_enabled(X86_FEATURE_MPX) as both a runtime and compile-time check. When CONFIG_X86_INTEL_MPX is disabled, cpu_feature_enabled(X86_FEATURE_MPX) will evaluate at compile-time to 0. If CONFIG_X86_INTEL_MPX=y, then the cpuid flag will be checked at runtime. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Qiaowei Ren <qiaowei.ren@intel.com> Cc: linux-mm@kvack.org Cc: linux-mips@linux-mips.org Cc: Dave Hansen <dave@sr71.net> Link: http://lkml.kernel.org/r/20141114151823.B358EAD2@viggo.jf.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-18x86, mpx: Rename cfg_reg_u and status_regDave Hansen
According to Intel SDM extension, MPX configuration and status registers should be BNDCFGU and BNDSTATUS. This patch renames cfg_reg_u and status_reg to bndcfgu and bndstatus. [ tglx: Renamed 'struct bndscr_struct' to 'struct bndscr' ] Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: linux-mm@kvack.org Cc: linux-mips@linux-mips.org Cc: Dave Hansen <dave@sr71.net> Cc: Qiaowei Ren <qiaowei.ren@intel.com> Link: http://lkml.kernel.org/r/20141114151817.031762AC@viggo.jf.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-18x86: mpx: Give bndX registers actual namesDave Hansen
Consider the bndX MPX registers. There 4 registers each containing a 64-bit lower and a 64-bit upper bound. That's 8*64 bits and we declare it thusly: struct bndregs_struct { u64 bndregs[8]; } Let's say you want to read the upper bound from the MPX register bnd2 out of the xsave buf. You do: bndregno = 2; upper_bound = xsave_buf->bndregs.bndregs[2*bndregno+1]; That kinda sucks. Every time you access it, you need to know: 1. Each bndX register is two entries wide in "bndregs" 2. The lower comes first followed by upper. We do the +1 to get upper vs. lower. This replaces the old definition. You can now access them indexed by the register number directly, and with a meaningful name for the lower and upper bound: bndregno = 2; xsave_buf->bndreg[bndregno].upper_bound; It's now *VERY* clear that there are 4 registers. The programmer now doesn't have to care what order the lower and upper bounds are in, and it's harder to get it wrong. [ tglx: Changed ub/lb to upper_bound/lower_bound and renamed struct bndreg_struct to struct bndreg ] Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: x86@kernel.org Cc: "H. Peter Anvin" <hpa@linux.intel.com> Cc: Qiaowei Ren <qiaowei.ren@intel.com> Cc: "Yu, Fenghua" <fenghua.yu@intel.com> Cc: Dave Hansen <dave@sr71.net> Link: http://lkml.kernel.org/r/20141031215820.5EA5E0EC@viggo.jf.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-18x86: Remove arbitrary instruction size limit in instruction decoderDave Hansen
The current x86 instruction decoder steps along through the instruction stream but always ensures that it never steps farther than the largest possible instruction size (MAX_INSN_SIZE). The MPX code is now going to be doing some decoding of userspace instructions. We copy those from userspace in to the kernel and they're obviously completely untrusted coming from userspace. In addition to the constraint that instructions can only be so long, we also have to be aware of how long the buffer is that came in from userspace. This _looks_ to be similar to what the perf and kprobes is doing, but it's unclear to me whether they are affected. The whole reason we need this is that it is perfectly valid to be executing an instruction within MAX_INSN_SIZE bytes of an unreadable page. We should be able to gracefully handle short reads in those cases. This adds support to the decoder to record how long the buffer being decoded is and to refuse to "validate" the instruction if we would have gone over the end of the buffer to decode it. The kprobes code probably needs to be looked at here a bit more carefully. This patch still respects the MAX_INSN_SIZE limit there but the kprobes code does look like it might be able to be a bit more strict than it currently is. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Jim Keniston <jkenisto@us.ibm.com> Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: x86@kernel.org Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Link: http://lkml.kernel.org/r/20141114153957.E6B01535@viggo.jf.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-16Merge branch 'perf/urgent' into perf/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-12perf/x86/amd/ibs: Update IBS MSRs and feature definitionsAravind Gopalakrishnan
New Fam15h models carry extra feature bits and extend the MSR register space for IBS ops. Adding them here. While at it, add functionality to read IbsBrTarget and OpData4 depending on their availability if user wants a PERF_SAMPLE_RAW. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Acked-by: Borislav Petkov <bp@suse.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Jan Kiszka <jan.kiszka@siemens.com> Cc: Len Brown <len.brown@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: <paulus@samba.org> Cc: <acme@kernel.org> Link: http://lkml.kernel.org/r/1415651066-13523-1-git-send-email-Aravind.Gopalakrishnan@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-11Revert "PCI: Add x86_msi.msi_mask_irq() and msix_mask_irq()"Yijing Wang
The problem fixed by 0e4ccb1505a9 ("PCI: Add x86_msi.msi_mask_irq() and msix_mask_irq()") has been fixed in a simpler way by a previous commit ("PCI/MSI: Add pci_msi_ignore_mask to prevent writes to MSI/MSI-X Mask Bits"). The msi_mask_irq() and msix_mask_irq() x86_msi_ops added by 0e4ccb1505a9 are no longer needed, so revert the commit. default_msi_mask_irq() and default_msix_mask_irq() were added by 0e4ccb1505a9 and are still used by s390, so keep them for now. [bhelgaas: changelog] Signed-off-by: Yijing Wang <wangyijing@huawei.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: David Vrabel <david.vrabel@citrix.com> CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> CC: xen-devel@lists.xenproject.org
2014-11-11Merge branch 'io' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/will/linux into asm-generic * 'io' of git://git.kernel.org/pub/scm/linux/kernel/git/will/linux: documentation: memory-barriers: clarify relaxed io accessor semantics x86: io: implement dummy relaxed accessor macros for writes tile: io: implement dummy relaxed accessor macros for writes sparc: io: implement dummy relaxed accessor macros for writes powerpc: io: implement dummy relaxed accessor macros for writes parisc: io: implement dummy relaxed accessor macros for writes mn10300: io: implement dummy relaxed accessor macros for writes m68k: io: implement dummy relaxed accessor macros for writes m32r: io: implement dummy relaxed accessor macros for writes ia64: io: implement dummy relaxed accessor macros for writes cris: io: implement dummy relaxed accessor macros for writes frv: io: implement dummy relaxed accessor macros for writes xtensa: io: remove dummy relaxed accessor macros for reads s390: io: remove dummy relaxed accessor macros for reads microblaze: io: remove dummy relaxed accessor macros asm-generic: io: implement relaxed accessor macros as conditional wrappers Conflicts: include/asm-generic/io.h Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2014-11-10/dev/mem: Use more consistent data typesThierry Reding
The xlate_dev_{kmem,mem}_ptr() functions take either a physical address or a kernel virtual address, so data types should be phys_addr_t and void *. They both return a kernel virtual address which is only ever used in calls to copy_{from,to}_user(), so make variables that store it void * rather than char * for consistency. Also only define a weak unxlate_dev_mem_ptr() function if architectures haven't overridden them in the asm/io.h header file. Signed-off-by: Thierry Reding <treding@nvidia.com>
2014-11-10x86/core, x86/xen/smp: Use 'die_complete' completion when taking CPU downBoris Ostrovsky
Commit 2ed53c0d6cc9 ("x86/smpboot: Speed up suspend/resume by avoiding 100ms sleep for CPU offline during S3") introduced completions to CPU offlining process. These completions are not initialized on Xen kernels causing a panic in play_dead_common(). Move handling of die_complete into common routines to make them available to Xen guests. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Cc: tianyu.lan@intel.com Cc: konrad.wilk@oracle.com Cc: xen-devel@lists.xenproject.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1414770572-7950-1-git-send-email-boris.ostrovsky@oracle.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-04x86-64: Use RIP-relative addressing for most per-CPU accessesJan Beulich
Observing that per-CPU data (in the SMP case) is reachable by exploiting 64-bit address wraparound (building on the default kernel load address being at 16Mb), the one byte shorter RIP-relative addressing form can be used for most per-CPU accesses. The one exception are the "stable" reads, where the use of the "P" operand modifier prevents the compiler from using RIP-relative addressing, but is unavoidable due to the use of the "p" constraint (side note: with gcc 4.9.x the intended effect of this isn't being achieved anymore, see gcc bug 63637). With the dependency on the minimum kernel load address, arbitrarily low values for CONFIG_PHYSICAL_START are now no longer possible. A link time assertion is being added, directing to the need to increase that value when it triggers. Signed-off-by: Jan Beulich <jbeulich@suse.com> Link: http://lkml.kernel.org/r/5458A1780200007800044A9D@mail.emea.novell.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-04x86: Convert a few more per-CPU items to read-mostly onesJan Beulich
Both this_cpu_off and cpu_info aren't getting modified post boot, yet are being accessed on enough code paths that grouping them with other frequently read items seems desirable. For cpu_info this at the same time implies removing the cache line alignment (which afaict became pointless when it got converted to per-CPU data years ago). Signed-off-by: Jan Beulich <jbeulich@suse.com> Link: http://lkml.kernel.org/r/54589BD20200007800044A84@mail.emea.novell.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-10-28sched/x86: Add a comment clarifying LDT context switchingAndy Lutomirski
The code is correct, but only for a rather subtle reason. This confused me for quite a while when I read switch_mm, so clarify the code to avoid confusing other people, too. TBH, I wouldn't be surprised if this code was only correct by accident. [ I wouldn't normally send a comment-only patch, but it took me a long time to first figure out wtf was going on here, and then to figure out why this wasn't exploitable by malicious code, and then to figure out why this oddity had no user-visible effect at all. Let's spare future readers the same confusion. ] Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/36275c99801a87d8dcf0502a41cf4e2ad81aae46.1412623954.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-10-28sched/x86_64: Don't save flags on context switchAndy Lutomirski
Now that the kernel always runs with clean flags (in particular, NT is clear), there is no need to save and restore flags on every context switch. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Sebastian Lackner <sebastian@fds-team.de> Cc: Anish Bhatt <anish@chelsio.com> Cc: Chuck Ebbert <cebbert.lkml@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jan Beulich <jbeulich@suse.com> Link: http://lkml.kernel.org/r/bf6fb790787eb95b922157838f52712c25dda157.1412187233.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-10-28sched: Kill task_preempt_count()Oleg Nesterov
task_preempt_count() is pointless if preemption counter is per-cpu, currently this is x86 only. It is only valid if the task is not running, and even in this case the only info it can provide is the state of PREEMPT_ACTIVE bit. Change its single caller to check p->on_rq instead, this should be the same if p->state != TASK_RUNNING, and kill this helper. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Kirill Tkhai <tkhai@yandex.ru> Cc: Alexander Graf <agraf@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christoph Lameter <cl@linux.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-arch@vger.kernel.org Link: http://lkml.kernel.org/r/20141008183348.GC17495@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-10-28sched: stop the unbound recursion in preempt_schedule_context()Oleg Nesterov
preempt_schedule_context() does preempt_enable_notrace() at the end and this can call the same function again; exception_exit() is heavy and it is quite possible that need-resched is true again. 1. Change this code to dec preempt_count() and check need_resched() by hand. 2. As Linus suggested, we can use the PREEMPT_ACTIVE bit and avoid the enable/disable dance around __schedule(). But in this case we need to move into sched/core.c. 3. Cosmetic, but x86 forgets to declare this function. This doesn't really matter because it is only called by asm helpers, still it make sense to add the declaration into asm/preempt.h to match preempt_schedule(). Reported-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Graf <agraf@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Christoph Lameter <cl@linux.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Anvin <hpa@zytor.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Chuck Ebbert <cebbert.lkml@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/r/20141005202322.GB27962@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-10-24Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "This is a pretty large update. I think it is roughly as big as what I usually had for the _whole_ rc period. There are a few bad bugs where the guest can OOPS or crash the host. We have also started looking at attack models for nested virtualization; bugs that usually result in the guest ring 0 crashing itself become more worrisome if you have nested virtualization, because the nested guest might bring down the non-nested guest as well. For current uses of nested virtualization these do not really have a security impact, but you never know and bugs are bugs nevertheless. A lot of these bugs are in 3.17 too, resulting in a large number of stable@ Ccs. I checked that all the patches apply there with no conflicts" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: kvm: vfio: fix unregister kvm_device_ops of vfio KVM: x86: Wrong assertion on paging_tmpl.h kvm: fix excessive pages un-pinning in kvm_iommu_map error path. KVM: x86: PREFETCH and HINT_NOP should have SrcMem flag KVM: x86: Emulator does not decode clflush well KVM: emulate: avoid accessing NULL ctxt->memopp KVM: x86: Decoding guest instructions which cross page boundary may fail kvm: x86: don't kill guest on unknown exit reason kvm: vmx: handle invvpid vm exit gracefully KVM: x86: Handle errors when RIP is set during far jumps KVM: x86: Emulator fixes for eip canonical checks on near branches KVM: x86: Fix wrong masking on relative jump/call KVM: x86: Improve thread safety in pit KVM: x86: Prevent host from panicking on shared MSR writes. KVM: x86: Check non-canonical addresses upon WRMSR
2014-10-24kvm: vmx: handle invvpid vm exit gracefullyPetr Matousek
On systems with invvpid instruction support (corresponding bit in IA32_VMX_EPT_VPID_CAP MSR is set) guest invocation of invvpid causes vm exit, which is currently not handled and results in propagation of unknown exit to userspace. Fix this by installing an invvpid vm exit handler. This is CVE-2014-3646. Cc: stable@vger.kernel.org Signed-off-by: Petr Matousek <pmatouse@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-10-24KVM: x86: Prevent host from panicking on shared MSR writes.Andy Honig
The previous patch blocked invalid writes directly when the MSR is written. As a precaution, prevent future similar mistakes by gracefulling handle GPs caused by writes to shared MSRs. Cc: stable@vger.kernel.org Signed-off-by: Andrew Honig <ahonig@google.com> [Remove parts obsoleted by Nadav's patch. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-10-24KVM: x86: Check non-canonical addresses upon WRMSRNadav Amit
Upon WRMSR, the CPU should inject #GP if a non-canonical value (address) is written to certain MSRs. The behavior is "almost" identical for AMD and Intel (ignoring MSRs that are not implemented in either architecture since they would anyhow #GP). However, IA32_SYSENTER_ESP and IA32_SYSENTER_EIP cause #GP if non-canonical address is written on Intel but not on AMD (which ignores the top 32-bits). Accordingly, this patch injects a #GP on the MSRs which behave identically on Intel and AMD. To eliminate the differences between the architecutres, the value which is written to IA32_SYSENTER_ESP and IA32_SYSENTER_EIP is turned to canonical value before writing instead of injecting a #GP. Some references from Intel and AMD manuals: According to Intel SDM description of WRMSR instruction #GP is expected on WRMSR "If the source register contains a non-canonical address and ECX specifies one of the following MSRs: IA32_DS_AREA, IA32_FS_BASE, IA32_GS_BASE, IA32_KERNEL_GS_BASE, IA32_LSTAR, IA32_SYSENTER_EIP, IA32_SYSENTER_ESP." According to AMD manual instruction manual: LSTAR/CSTAR (SYSCALL): "The WRMSR instruction loads the target RIP into the LSTAR and CSTAR registers. If an RIP written by WRMSR is not in canonical form, a general-protection exception (#GP) occurs." IA32_GS_BASE and IA32_FS_BASE (WRFSBASE/WRGSBASE): "The address written to the base field must be in canonical form or a #GP fault will occur." IA32_KERNEL_GS_BASE (SWAPGS): "The address stored in the KernelGSbase MSR must be in canonical form." This patch fixes CVE-2014-3610. Cc: stable@vger.kernel.org Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-10-23Merge branch 'x86-efi-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 EFI updates from Peter Anvin: "This patchset falls under the "maintainers that grovel" clause in the v3.18-rc1 announcement. We had intended to push it late in the merge window since we got it into the -tip tree relatively late. Many of these are relatively simple things, but there are a couple of key bits, especially Ard's and Matt's patches" * 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) rtc: Disable EFI rtc for x86 efi: rtc-efi: Export platform:rtc-efi as module alias efi: Delete the in_nmi() conditional runtime locking efi: Provide a non-blocking SetVariable() operation x86/efi: Adding efi_printks on memory allocationa and pci.reads x86/efi: Mark initialization code as such x86/efi: Update comment regarding required phys mapped EFI services x86/efi: Unexport add_efi_memmap variable x86/efi: Remove unused efi_call* macros efi: Resolve some shadow warnings arm64: efi: Format EFI memory type & attrs with efi_md_typeattr_format() ia64: efi: Format EFI memory type & attrs with efi_md_typeattr_format() x86: efi: Format EFI memory type & attrs with efi_md_typeattr_format() efi: Introduce efi_md_typeattr_format() efi: Add macro for EFI_MEMORY_UCE memory attribute x86/efi: Clear EFI_RUNTIME_SERVICES if failing to enter virtual mode arm64/efi: Do not enter virtual mode if booting with efi=noruntime or noefi arm64/efi: uefi_init error handling fix efi: Add kernel param efi=noruntime lib: Add a generic cmdline parse function parse_option_str ...
2014-10-20x86: io: implement dummy relaxed accessor macros for writesWill Deacon
write{b,w,l,q}_relaxed are implemented by some architectures in order to permit memory-mapped I/O accesses with weaker barrier semantics than the non-relaxed variants. This patch adds dummy macros for the write accessors to x86, in the same vein as the dummy definitions for the relaxed read accessors. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2014-10-19Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull more perf updates from Ingo Molnar: "A second (and last) round of late coming fixes and changes, almost all of them in perf tooling: User visible tooling changes: - Add period data column and make it default in 'perf script' (Jiri Olsa) - Add a visual cue for toggle zeroing of samples in 'perf top' (Taeung Song) - Improve callchains when using libunwind (Namhyung Kim) Tooling fixes and infrastructure changes: - Fix for double free in 'perf stat' when using some specific invalid command line combo (Yasser Shalabi) - Fix off-by-one bugs in map->end handling (Stephane Eranian) - Fix off-by-one bug in maps__find(), also related to map->end handling (Namhyung Kim) - Make struct symbol->end be the first addr after the symbol range, to make it match the convention used for struct map->end. (Arnaldo Carvalho de Melo) - Fix perf_evlist__add_pollfd() error handling in 'perf kvm stat live' (Jiri Olsa) - Fix python test build by moving callchain_param to an object linked into the python binding (Jiri Olsa) - Document sysfs events/ interfaces (Cody P Schafer) - Fix typos in perf/Documentation (Masanari Iida) - Add missing 'struct option' forward declaration (Arnaldo Carvalho de Melo) - Add option to copy events when queuing for sorting across cpu buffers and enable it for 'perf kvm stat live', to avoid having events left in the queue pointing to the ring buffer be rewritten in high volume sessions. (Alexander Yarygin, improving work done by David Ahern): - Do not include a struct hists per perf_evsel, untangling the histogram code from perf_evsel, to pave the way for exporting a minimalistic tools/lib/api/perf/ library usable by tools/perf and initially by the rasd daemon being developed by Borislav Petkov, Robert Richter and Jean Pihet. (Arnaldo Carvalho de Melo) - Make perf_evlist__open(evlist, NULL, NULL), i.e. without cpu and thread maps mean syswide monitoring, reducing the boilerplate for tools that only want system wide mode. (Arnaldo Carvalho de Melo) - Move exit stuff from perf_evsel__delete to perf_evsel__exit, delete should be just a front end for exit + free (Arnaldo Carvalho de Melo) - Add support to new style format of kernel PMU event. (Kan Liang) and other misc fixes" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (45 commits) perf script: Add period as a default output column perf script: Add period data column perf evsel: No need to drag util/cgroup.h perf evlist: Add missing 'struct option' forward declaration perf evsel: Move exit stuff from __delete to __exit kprobes/x86: Remove stale ARCH_SUPPORTS_KPROBES_ON_FTRACE define perf kvm stat live: Enable events copying perf session: Add option to copy events when queueing perf Documentation: Fix typos in perf/Documentation perf trace: Use thread_{,_set}_priv helpers perf kvm: Use thread_{,_set}_priv helpers perf callchain: Create an address space per thread perf report: Set callchain_param.record_mode for future use perf evlist: Fix for double free in tools/perf stat perf test: Add test case for pmu event new style format perf tools: Add support to new style format of kernel PMU event perf tools: Parse the pmu event prefix and suffix Revert "perf tools: Default to cpu// for events v5" perf Documentation: Remove Ruplicated docs for powerpc cpu specific events perf Documentation: sysfs events/ interfaces ...
2014-10-18Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linusLinus Torvalds
Pull MIPS updates from Ralf Baechle: "This is the MIPS pull request for the next kernel: - Zubair's patch series adds CMA support for MIPS. Doing so it also touches ARM64 and x86. - remove the last instance of IRQF_DISABLED from arch/mips - updates to two of the MIPS defconfig files. - cleanup of how cache coherency bits are handled on MIPS and implement support for write-combining. - platform upgrades for Alchemy - move MIPS DTS files to arch/mips/boot/dts/" * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (24 commits) MIPS: ralink: remove deprecated IRQF_DISABLED MIPS: pgtable.h: Implement the pgprot_writecombine function for MIPS MIPS: cpu-probe: Set the write-combine CCA value on per core basis MIPS: pgtable-bits: Define the CCA bit for WC writes on Ingenic cores MIPS: pgtable-bits: Move the CCA bits out of the core's ifdef blocks MIPS: DMA: Add cma support x86: use generic dma-contiguous.h arm64: use generic dma-contiguous.h asm-generic: Add dma-contiguous.h MIPS: BPF: Add new emit_long_instr macro MIPS: ralink: Move device-trees to arch/mips/boot/dts/ MIPS: Netlogic: Move device-trees to arch/mips/boot/dts/ MIPS: sead3: Move device-trees to arch/mips/boot/dts/ MIPS: Lantiq: Move device-trees to arch/mips/boot/dts/ MIPS: Octeon: Move device-trees to arch/mips/boot/dts/ MIPS: Add support for building device-tree binaries MIPS: Create common infrastructure for building built-in device-trees MIPS: SEAD3: Enable DEVTMPFS MIPS: SEAD3: Regenerate defconfigs MIPS: Alchemy: DB1300: Add touch penirq support ...
2014-10-17kprobes/x86: Remove stale ARCH_SUPPORTS_KPROBES_ON_FTRACE defineAnton Blanchard
Commit e7dbfe349d12 ("kprobes/x86: Move ftrace-based kprobe code into kprobes-ftrace.c") switched from using ARCH_SUPPORTS_KPROBES_ON_FTRACE to CONFIG_KPROBES_ON_FTRACE but missed removing the define. Signed-off-by: Anton Blanchard <anton@samba.org> Cc: masami.hiramatsu.pt@hitachi.com Cc: ananth@in.ibm.com Cc: a.p.zijlstra@chello.nl Cc: fweisbec@gmail.com Cc: rostedt@goodmis.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-10-15Merge branch 'for-3.18-consistent-ops' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu Pull percpu consistent-ops changes from Tejun Heo: "Way back, before the current percpu allocator was implemented, static and dynamic percpu memory areas were allocated and handled separately and had their own accessors. The distinction has been gone for many years now; however, the now duplicate two sets of accessors remained with the pointer based ones - this_cpu_*() - evolving various other operations over time. During the process, we also accumulated other inconsistent operations. This pull request contains Christoph's patches to clean up the duplicate accessor situation. __get_cpu_var() uses are replaced with with this_cpu_ptr() and __this_cpu_ptr() with raw_cpu_ptr(). Unfortunately, the former sometimes is tricky thanks to C being a bit messy with the distinction between lvalues and pointers, which led to a rather ugly solution for cpumask_var_t involving the introduction of this_cpu_cpumask_var_ptr(). This converts most of the uses but not all. Christoph will follow up with the remaining conversions in this merge window and hopefully remove the obsolete accessors" * 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (38 commits) irqchip: Properly fetch the per cpu offset percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t -fix ia64: sn_nodepda cannot be assigned to after this_cpu conversion. Use __this_cpu_write. percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t Revert "powerpc: Replace __get_cpu_var uses" percpu: Remove __this_cpu_ptr clocksource: Replace __this_cpu_ptr with raw_cpu_ptr sparc: Replace __get_cpu_var uses avr32: Replace __get_cpu_var with __this_cpu_write blackfin: Replace __get_cpu_var uses tile: Use this_cpu_ptr() for hardware counters tile: Replace __get_cpu_var uses powerpc: Replace __get_cpu_var uses alpha: Replace __get_cpu_var ia64: Replace __get_cpu_var uses s390: cio driver &__get_cpu_var replacements s390: Replace __get_cpu_var uses mips: Replace __get_cpu_var uses MIPS: Replace __get_cpu_var uses in FPU emulator. arm: Replace __this_cpu_ptr with raw_cpu_ptr ...
2014-10-14Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Misc smaller fixes that missed the v3.17 cycle" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/build: Add arch/x86/purgatory/ make generated files to gitignore x86: Fix section conflict for numachip x86: Reject x32 executables if x32 ABI not supported x86_64, entry: Filter RFLAGS.NT on entry from userspace x86, boot, kaslr: Fix nuisance warning on 32-bit builds
2014-10-14Merge branch 'x86-seccomp-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 seccomp changes from Ingo Molnar: "This tree includes x86 seccomp filter speedups and related preparatory work, which touches core seccomp facilities as well. The main idea is to split seccomp into two phases, to be able to enter a simple fast path for syscalls with ptrace side effects. There's no substantial user-visible (and ABI) effects expected from this, except a change in how we emit a better audit record for SECCOMP_RET_TRACE events" * 'x86-seccomp-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86_64, entry: Use split-phase syscall_trace_enter for 64-bit syscalls x86_64, entry: Treat regs->ax the same in fastpath and slowpath syscalls x86: Split syscall_trace_enter into two phases x86, entry: Only call user_exit if TIF_NOHZ x86, x32, audit: Fix x32's AUDIT_ARCH wrt audit seccomp: Document two-phase seccomp and arch-provided seccomp_data seccomp: Allow arch code to provide seccomp_data seccomp: Refactor the filter callback and the API seccomp,x86,arm,mips,s390: Remove nr parameter from secure_computing
2014-10-14Merge branch 'x86-mm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm updates from Ingo Molnar: "This tree includes the following changes: - fix memory hotplug - fix hibernation bootup memory layout assumptions - fix hyperv numa guest kernel messages - remove dead code - update documentation" * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Update memory map description to list hypervisor-reserved area x86/mm, hibernate: Do not assume the first e820 area to be RAM x86/mm/numa: Drop dead code and rename setup_node_data() to setup_alloc_data() x86/mm/hotplug: Modify PGD entry when removing memory x86/mm/hotplug: Pass sync_global_pgds() a correct argument in remove_pagetable() x86: Remove set_pmd_pfn
2014-10-14Merge branch 'x86-microcode-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 microcode loading updates from Ingo Molnar: "Misc smaller cleanups" * 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, microcode, intel: Fix total_size computation x86, microcode, intel: Rename apply_microcode and declare it static x86, microcode, intel: Fix typos x86, microcode, intel: Add missing static declarations x86, microcode, amd: Fix missing static declaration
2014-10-14Merge branch 'x86-fpu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 FPU updates from Ingo Molnar: "x86 FPU handling fixes, cleanups and enhancements from Oleg. The signal handling race fix and the __restore_xstate_sig() preemption fix for eager-mode is marked for -stable as well" * 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: copy_thread: Don't nullify ->ptrace_bps twice x86, fpu: Shift "fpu_counter = 0" from copy_thread() to arch_dup_task_struct() x86, fpu: copy_process: Sanitize fpu->last_cpu initialization x86, fpu: copy_process: Avoid fpu_alloc/copy if !used_math() x86, fpu: Change __thread_fpu_begin() to use use_eager_fpu() x86, fpu: __restore_xstate_sig()->math_state_restore() needs preempt_disable() x86, fpu: shift drop_init_fpu() from save_xstate_sig() to handle_signal()
2014-10-14Merge branch 'x86-cpufeature-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpufeature updates from Ingo Molnar: "This tree includes the following changes: - Introduce DISABLED_MASK to list disabled CPU features, to simplify CPU feature handling and avoid excessive #ifdefs - Remove the lightly used cpu_has_pae() primitive" * 'x86-cpufeature-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Add more disabled features x86: Introduce disabled-features x86: Axe the lightly-used cpu_has_pae
2014-10-13Merge branch 'x86-cleanups-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Ingo Molnar: "Three small cleanups" * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/tty/serial/8250: Clean up the asm/serial.h include file a bit x86/tty/serial/8250: Resolve missing-field-initializers warnings x86: Remove obsolete comment in uapi/e820.h
2014-10-13Merge branch 'perf-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "Kernel side updates: - Fix and enhance poll support (Jiri Olsa) - Re-enable inheritance optimization (Jiri Olsa) - Enhance Intel memory events support (Stephane Eranian) - Refactor the Intel uncore driver to be more maintainable (Zheng Yan) - Enhance and fix Intel CPU and uncore PMU drivers (Peter Zijlstra, Andi Kleen) - [ plus various smaller fixes/cleanups ] User visible tooling updates: - Add +field argument support for --field option, so that one can add fields to the default list of fields to show, ie now one can just do: perf report --fields +pid And the pid will appear in addition to the default fields (Jiri Olsa) - Add +field argument support for --sort option (Jiri Olsa) - Honour -w in the report tools (report, top), allowing to specify the widths for the histogram entries columns (Namhyung Kim) - Properly show submicrosecond times in 'perf kvm stat' (Christian Borntraeger) - Add beautifier for mremap flags param in 'trace' (Alex Snast) - perf script: Allow callchains if any event samples them - Don't truncate Intel style addresses in 'annotate' (Alex Converse) - Allow profiling when kptr_restrict == 1 for non root users, kernel samples will just remain unresolved (Andi Kleen) - Allow configuring default options for callchains in config file (Namhyung Kim) - Support operations for shared futexes. (Davidlohr Bueso) - "perf kvm stat report" improvements by Alexander Yarygin: - Save pid string in opts.target.pid - Enable the target.system_wide flag - Unify the title bar output - [ plus lots of other fixes and small improvements. ] Tooling infrastructure changes: - Refactor unit and scale function parameters for PMU parsing routines (Matt Fleming) - Improve DSO long names lookup with rbtree, resulting in great speedup for workloads with lots of DSOs (Waiman Long) - We were not handling POLLHUP notifications for event file descriptors Fix it by filtering entries in the events file descriptor array after poll() returns, refcounting mmaps so that when the last fd pointing to a perf mmap goes away we do the unmap (Arnaldo Carvalho de Melo) - Intel PT prep work, from Adrian Hunter, including: - Let a user specify a PMU event without any config terms - Add perf-with-kcore script - Let default config be defined for a PMU - Add perf_pmu__scan_file() - Add a 'perf test' for tracking with sched_switch - Add 'flush' callback to scripting API - Use ring buffer consume method to look like other tools (Arnaldo Carvalho de Melo) - hists browser (used in top and report) refactorings, getting rid of unused variables and reducing source code size by handling similar cases in a fewer functions (Namhyung Kim). - Replace thread unsafe strerror() with strerror_r() accross the whole tools/perf/ tree (Masami Hiramatsu) - Rename ordered_samples to ordered_events and allow setting a queue size for ordering events (Jiri Olsa) - [ plus lots of fixes, cleanups and other improvements ]" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (198 commits) perf/x86: Tone down kernel messages when the PMU check fails in a virtual environment perf/x86/intel/uncore: Fix minor race in box set up perf record: Fix error message for --filter option not coming after tracepoint perf tools: Fix build breakage on arm64 targets perf symbols: Improve DSO long names lookup speed with rbtree perf symbols: Encapsulate dsos list head into struct dsos perf bench futex: Sanitize -q option in requeue perf bench futex: Support operations for shared futexes perf trace: Fix mmap return address truncation to 32-bit perf tools: Refactor unit and scale function parameters perf tools: Fix line number in the config file error message perf tools: Convert {record,top}.call-graph option to call-graph.record-mode perf tools: Introduce perf_callchain_config() perf callchain: Move some parser functions to callchain.c perf tools: Move callchain config from record_opts to callchain_param perf hists browser: Fix callchain print bug on TUI perf tools: Use ACCESS_ONCE() instead of volatile cast perf tools: Modify error code for when perf_session__new() fails perf tools: Fix perf record as non root with kptr_restrict == 1 perf stat: Fix --per-core on multi socket systems ...