Age | Commit message (Collapse) | Author |
|
Page tables can bite a relatively big chunk off system memory and their
allocations are easy to trigger from userspace, so they should be
accounted to kmemcg.
This patch marks page table allocations as __GFP_ACCOUNT for x86. Note
we must not charge allocations of kernel page tables, because they can
be shared among processes from different cgroups so accounting them to a
particular one can pin other cgroups for indefinitely long. So we clear
__GFP_ACCOUNT flag if a page table is allocated for the kernel.
Link: http://lkml.kernel.org/r/7d5c54f6a2bcbe76f03171689440003d87e6c742.1464079538.git.vdavydov@virtuozzo.com
Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Enables CONFIG_HARDENED_USERCOPY checks on x86. This is done both in
copy_*_user() and __copy_*_user() because copy_*_user() actually calls
down to _copy_*_user() and not __copy_*_user().
Based on code from PaX and grsecurity.
Signed-off-by: Kees Cook <keescook@chromium.org>
Tested-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
|
|
This creates per-architecture function arch_within_stack_frames() that
should validate if a given object is contained by a kernel stack frame.
Initial implementation is on x86.
This is based on code from PaX.
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"This tree contains tooling fixes plus some additions:
- fixes to the vdso2c build environment that Stephen Rothwell is
using for the linux-next build (Arnaldo Carvalho de Melo)
- AVX-512 instruction mappings (Adrian Hunter)
- misc fixes"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Revert "perf tools: event.h needs asm/perf_regs.h"
x86: Make the vdso2c compiler use the host architecture headers
tools build: Fix objtool build with ARCH=x86_64
objtool: Always use host headers
objtool: Use tools/scripts/Makefile.arch to get ARCH and HOSTARCH
tools build: Add HOSTARCH Makefile variable
perf tests kmod-path: Fix build on ubuntu:16.04-x-armhf
perf tools: Add AVX-512 instructions to the new instructions test
perf tools: Add AVX-512 support to the instruction decoder used by Intel PT
x86/insn: Add AVX-512 support to the instruction decoder
x86/insn: perf tools: Fix vcvtph2ps instruction decoding
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 timer updates from Ingo Molnar:
"The main change in this tree is the reworking, fixing and extension of
the TSC frequency enumeration code (by Len Brown)"
* 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/tsc: Remove the unused check_tsc_disabled()
x86/tsc: Enumerate BXT tsc_khz via CPUID
x86/tsc: Enumerate SKL cpu_khz and tsc_khz via CPUID
x86/tsc_msr: Remove irqoff around MSR-based TSC enumeration
x86/tsc_msr: Add Airmont reference clock values
x86/tsc_msr: Correct Silvermont reference clock values
x86/tsc_msr: Update comments, expand definitions
x86/tsc_msr: Remove debugging messages
x86/tsc_msr: Identify Intel-specific code
Revert "x86/tsc: Add missing Cherrytrail frequency to the table"
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 platform updates from Ingo Molnar:
"The main changes in this cycle were:
- Intel-SoC enhancements (Andy Shevchenko)
- Intel CPU symbolic model definition rework (Dave Hansen)
- ... other misc changes"
* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
x86/sfi: Enable enumeration of SD devices
x86/pci: Use MRFLD abbreviation for Merrifield
x86/platform/intel-mid: Make vertical indentation consistent
x86/platform/intel-mid: Mark regulators explicitly defined
x86/platform/intel-mid: Rename mrfl.c to mrfld.c
x86/platform/intel-mid: Enable spidev on Intel Edison boards
x86/platform/intel-mid: Extend PWRMU to support Penwell
x86/pci, x86/platform/intel_mid_pci: Remove duplicate power off code
x86/platform/intel-mid: Add pinctrl for Intel Merrifield
x86/platform/intel-mid: Enable GPIO expanders on Edison
x86/platform/intel-mid: Add Power Management Unit driver
x86/platform/atom/punit: Enable support for Merrifield
x86/platform/intel_mid_pci: Rework IRQ0 workaround
x86, thermal: Clean up and fix CPU model detection for intel_soc_dts_thermal
x86, mmc: Use Intel family name macros for mmc driver
x86/intel_telemetry: Use Intel family name macros for telemetry driver
x86/acpi/lss: Use Intel family name macros for the acpi_lpss driver
x86/cpufreq: Use Intel family name macros for the intel_pstate cpufreq driver
x86/platform: Use new Intel model number macros
x86/intel_idle: Use Intel family macros for intel_idle
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fpu updates from Ingo Molnar:
"The main x86 FPU changes in this cycle were:
- a large series of cleanups, fixes and enhancements to re-enable the
XSAVES instruction on Intel CPUs - which is the most advanced
instruction to do FPU context switches (Yu-cheng Yu, Fenghua Yu)
- Add FPU tracepoints for the FPU state machine (Dave Hansen)"
* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/fpu: Do not BUG_ON() in early FPU code
x86/fpu/xstate: Re-enable XSAVES
x86/fpu/xstate: Fix fpstate_init() for XRSTORS
x86/fpu/xstate: Return NULL for disabled xstate component address
x86/fpu/xstate: Fix __fpu_restore_sig() for XSAVES
x86/fpu/xstate: Fix xstate_offsets, xstate_sizes for non-extended xstates
x86/fpu/xstate: Fix XSTATE component offset print out
x86/fpu/xstate: Fix PTRACE frames for XSAVES
x86/fpu/xstate: Fix supervisor xstate component offset
x86/fpu/xstate: Align xstate components according to CPUID
x86/fpu/xstate: Copy xstate registers directly to the signal frame when compacted format is in use
x86/fpu/xstate: Keep init_fpstate.xsave.header.xfeatures as zero for init optimization
x86/fpu/xstate: Rename 'xstate_size' to 'fpu_kernel_xstate_size', to distinguish it from 'fpu_user_xstate_size'
x86/fpu/xstate: Define and use 'fpu_user_xstate_size'
x86/fpu: Add tracepoints to dump FPU state at key points
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 stackdump update from Ingo Molnar:
"A number of stackdump enhancements"
* 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/dumpstack: Add show_stack_regs() and use it
printk: Make the printk*once() variants return a value
x86/dumpstack: Honor supplied @regs arg
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 build updates from Ingo Molnar:
"A build system fix and a cleanup"
* 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
kbuild: Remove stale asm-generic wrappers
kbuild, x86: Track generated headers with generated-y
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 boot updates from Ingo Molnar:
"The main changes:
- add initial commits to randomize kernel memory section virtual
addresses, enabled via a new kernel option: RANDOMIZE_MEMORY
(Thomas Garnier, Kees Cook, Baoquan He, Yinghai Lu)
- enhance KASLR (RANDOMIZE_BASE) physical memory randomization (Kees
Cook)
- EBDA/BIOS region boot quirk cleanups (Andy Lutomirski, Ingo Molnar)
- misc cleanups/fixes"
* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot: Simplify EBDA-vs-BIOS reservation logic
x86/boot: Clarify what x86_legacy_features.reserve_bios_regions does
x86/boot: Reorganize and clean up the BIOS area reservation code
x86/mm: Do not reference phys addr beyond kernel
x86/mm: Add memory hotplug support for KASLR memory randomization
x86/mm: Enable KASLR for vmalloc memory regions
x86/mm: Enable KASLR for physical mapping memory regions
x86/mm: Implement ASLR for kernel memory regions
x86/mm: Separate variable for trampoline PGD
x86/mm: Add PUD VA support for physical mapping
x86/mm: Update physical mapping variable names
x86/mm: Refactor KASLR entropy functions
x86/KASLR: Fix boot crash with certain memory configurations
x86/boot/64: Add forgotten end of function marker
x86/KASLR: Allow randomization below the load address
x86/KASLR: Extend kernel image physical address randomization to addresses larger than 4G
x86/KASLR: Randomize virtual address separately
x86/KASLR: Clarify identity map interface
x86/boot: Refuse to build with data relocations
x86/KASLR, x86/power: Remove x86 hibernation restrictions
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm updates from Ingo Molnar:
"Various x86 low level modifications:
- preparatory work to support virtually mapped kernel stacks (Andy
Lutomirski)
- support for 64-bit __get_user() on 32-bit kernels (Benjamin
LaHaise)
- (involved) workaround for Knights Landing CPU erratum (Dave Hansen)
- MPX enhancements (Dave Hansen)
- mremap() extension to allow remapping of the special VDSO vma, for
purposes of user level context save/restore (Dmitry Safonov)
- hweight and entry code cleanups (Borislav Petkov)
- bitops code generation optimizations and cleanups with modern GCC
(H. Peter Anvin)
- syscall entry code optimizations (Paolo Bonzini)"
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (43 commits)
x86/mm/cpa: Add missing comment in populate_pdg()
x86/mm/cpa: Fix populate_pgd(): Stop trying to deallocate failed PUDs
x86/syscalls: Add compat_sys_preadv64v2/compat_sys_pwritev64v2
x86/smp: Remove unnecessary initialization of thread_info::cpu
x86/smp: Remove stack_smp_processor_id()
x86/uaccess: Move thread_info::addr_limit to thread_struct
x86/dumpstack: Rename thread_struct::sig_on_uaccess_error to sig_on_uaccess_err
x86/uaccess: Move thread_info::uaccess_err and thread_info::sig_on_uaccess_err to thread_struct
x86/dumpstack: When OOPSing, rewind the stack before do_exit()
x86/mm/64: In vmalloc_fault(), use CR3 instead of current->active_mm
x86/dumpstack/64: Handle faults when printing the "Stack: " part of an OOPS
x86/dumpstack: Try harder to get a call trace on stack overflow
x86/mm: Remove kernel_unmap_pages_in_pgd() and efi_cleanup_page_tables()
x86/mm/cpa: In populate_pgd(), don't set the PGD entry until it's populated
x86/mm/hotplug: Don't remove PGD entries in remove_pagetable()
x86/mm: Use pte_none() to test for empty PTE
x86/mm: Disallow running with 32-bit PTEs to work around erratum
x86/mm: Ignore A/D bits in pte/pmd/pud_none()
x86/mm: Move swap offset/type up in PTE to work around erratum
x86/entry: Inline enter_from_user_mode()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86/apic updates from Ingo Molnar:
"Misc cleanups and a small fix"
* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/apic: Remove the unused struct apic::apic_id_mask field
x86/apic: Fix misspelled APIC
x86/ioapic: Simplify ioapic_setup_resources()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
"With over 300 commits it's been a busy cycle - with most of the work
concentrated on the tooling side (as it should).
The main kernel side enhancements were:
- Add per event callchain limit: Recently we introduced a sysctl to
tune the max-stack for all events for which callchains were
requested:
$ sysctl kernel.perf_event_max_stack
kernel.perf_event_max_stack = 127
Now this patch introduces a way to configure this per event, i.e.
this becomes possible:
$ perf record -e sched:*/max-stack=2/ -e block:*/max-stack=10/ -a
allowing finer tuning of how much buffer space callchains use.
This uses an u16 from the reserved space at the end, leaving
another u16 for future use.
There has been interest in even finer tuning, namely to control the
max stack for kernel and userspace callchains separately. Further
discussion is needed, we may for instance use the remaining u16 for
that and when it is present, assume that the sample_max_stack
introduced in this patch applies for the kernel, and the u16 left
is used for limiting the userspace callchain (Arnaldo Carvalho de
Melo)
- Optimize AUX event (hardware assisted side-band event) delivery
(Kan Liang)
- Rework Intel family name macro usage (this is partially x86 arch
work) (Dave Hansen)
- Refine and fix Intel LBR support (David Carrillo-Cisneros)
- Add support for Intel 'TopDown' events (Andi Kleen)
- Intel uncore PMU driver fixes and enhancements (Kan Liang)
- ... other misc changes.
Here's an incomplete list of the tooling enhancements (but there's
much more, see the shortlog and the git log for details):
- Support cross unwinding, i.e. collecting '--call-graph dwarf'
perf.data files in one machine and then doing analysis in another
machine of a different hardware architecture. This enables, for
instance, to do:
$ perf record -a --call-graph dwarf
on a x86-32 or aarch64 system and then do 'perf report' on it on a
x86_64 workstation (He Kuang)
- Allow reading from a backward ring buffer (one setup via
sys_perf_event_open() with perf_event_attr.write_backward = 1)
(Wang Nan)
- Finish merging initial SDT (Statically Defined Traces) support, see
cset comments for details about how it all works (Masami Hiramatsu)
- Support attaching eBPF programs to tracepoints (Wang Nan)
- Add demangling of symbols in programs written in the Rust language
(David Tolnay)
- Add support for tracepoints in the python binding, including an
example, that sets up and parses sched:sched_switch events,
tools/perf/python/tracepoint.py (Jiri Olsa)
- Introduce --stdio-color to set up the color output mode selection
in 'annotate' and 'report', allowing emit color escape sequences
when redirecting the output of these tools (Arnaldo Carvalho de
Melo)
- Add 'callindent' option to 'perf script -F', to indent the Intel PT
call stack, making this output more ftrace-like (Adrian Hunter,
Andi Kleen)
- Allow dumping the object files generated by llvm when processing
eBPF scriptlet events (Wang Nan)
- Add stackcollapse.py script to help generating flame graphs (Paolo
Bonzini)
- Add --ldlat option to 'perf mem' to specify load latency for loads
event (e.g. cpu/mem-loads/ ) (Jiri Olsa)
- Tooling support for Intel TopDown counters, recently added to the
kernel (Andi Kleen)"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (303 commits)
perf tests: Add is_printable_array test
perf tools: Make is_printable_array global
perf script python: Fix string vs byte array resolving
perf probe: Warn unmatched function filter correctly
perf cpu_map: Add more helpers
perf stat: Balance opening and reading events
tools: Copy linux/{hash,poison}.h and check for drift
perf tools: Remove include/linux/list.h from perf's MANIFEST
tools: Copy the bitops files accessed from the kernel and check for drift
Remove: kernel unistd*h files from perf's MANIFEST, not used
perf tools: Remove tools/perf/util/include/linux/const.h
perf tools: Remove tools/perf/util/include/asm/byteorder.h
perf tools: Add missing linux/compiler.h include to perf-sys.h
perf jit: Remove some no-op error handling
perf jit: Add missing curly braces
objtool: Initialize variable to silence old compiler
objtool: Add -I$(srctree)/tools/arch/$(ARCH)/include/uapi
perf record: Add --tail-synthesize option
perf session: Don't warn about out of order event if write_backward is used
perf tools: Enable overwrite settings
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
"The locking tree was busier in this cycle than the usual pattern - a
couple of major projects happened to coincide.
The main changes are:
- implement the atomic_fetch_{add,sub,and,or,xor}() API natively
across all SMP architectures (Peter Zijlstra)
- add atomic_fetch_{inc/dec}() as well, using the generic primitives
(Davidlohr Bueso)
- optimize various aspects of rwsems (Jason Low, Davidlohr Bueso,
Waiman Long)
- optimize smp_cond_load_acquire() on arm64 and implement LSE based
atomic{,64}_fetch_{add,sub,and,andnot,or,xor}{,_relaxed,_acquire,_release}()
on arm64 (Will Deacon)
- introduce smp_acquire__after_ctrl_dep() and fix various barrier
mis-uses and bugs (Peter Zijlstra)
- after discovering ancient spin_unlock_wait() barrier bugs in its
implementation and usage, strengthen its semantics and update/fix
usage sites (Peter Zijlstra)
- optimize mutex_trylock() fastpath (Peter Zijlstra)
- ... misc fixes and cleanups"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (67 commits)
locking/atomic: Introduce inc/dec variants for the atomic_fetch_$op() API
locking/barriers, arch/arm64: Implement LDXR+WFE based smp_cond_load_acquire()
locking/static_keys: Fix non static symbol Sparse warning
locking/qspinlock: Use __this_cpu_dec() instead of full-blown this_cpu_dec()
locking/atomic, arch/tile: Fix tilepro build
locking/atomic, arch/m68k: Remove comment
locking/atomic, arch/arc: Fix build
locking/Documentation: Clarify limited control-dependency scope
locking/atomic, arch/rwsem: Employ atomic_long_fetch_add()
locking/atomic, arch/qrwlock: Employ atomic_fetch_add_acquire()
locking/atomic, arch/mips: Convert to _relaxed atomics
locking/atomic, arch/alpha: Convert to _relaxed atomics
locking/atomic: Remove the deprecated atomic_{set,clear}_mask() functions
locking/atomic: Remove linux/atomic.h:atomic_fetch_or()
locking/atomic: Implement atomic{,64,_long}_fetch_{add,sub,and,andnot,or,xor}{,_relaxed,_acquire,_release}()
locking/atomic: Fix atomic64_relaxed() bits
locking/atomic, arch/xtensa: Implement atomic_fetch_{add,sub,and,or,xor}()
locking/atomic, arch/x86: Implement atomic{,64}_fetch_{add,sub,and,or,xor}()
locking/atomic, arch/tile: Implement atomic{,64}_fetch_{add,sub,and,or,xor}()
locking/atomic, arch/sparc: Implement atomic{,64}_fetch_{add,sub,and,or,xor}()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI updates from Ingo Molnar:
"The biggest change in this cycle were SGI/UV related changes that
clean up and fix UV boot quirks and problems.
There's also various smaller cleanups and refinements"
* 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi: Reorganize the GUID table to make it easier to read
x86/efi: Remove the unused efi_get_time() function
x86/efi: Update efi_thunk() to use the the arch_efi_call_virt*() macros
x86/uv: Update uv_bios_call() to use efi_call_virt_pointer()
efi: Convert efi_call_virt() to efi_call_virt_pointer()
x86/efi: Remove unused variable 'efi'
efi: Document #define FOO_PROTOCOL_GUID layout
efibc: Report more information in the error messages
|
|
Currently we don't save ACPI ids (unlike LAPIC ids which go to
x86_cpu_to_apicid) from MADT and we may need this information later.
Particularly, ACPI ids is the only existent way for a PVHVM Xen guest
to figure out Xen's idea of its vCPUs ids before these CPUs boot and
in some cases these ids diverge from Linux's cpu ids.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
|
|
Update cpuid.h header from xen hypervisor tree to get
XEN_HVM_CPUID_VCPU_ID_PRESENT definition.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
|
|
* pm-cpu:
x86: remove duplicate turbo ratio limit MSRs
tools/power turbostat: Replace MSR_NHM_TURBO_RATIO_LIMIT
cpufreq: intel_pstate: Replace MSR_NHM_TURBO_RATIO_LIMIT
|
|
|
|
* pm-sleep:
PM / hibernate: Introduce test_resume mode for hibernation
x86 / hibernate: Use hlt_play_dead() when resuming from hibernation
PM / hibernate: Image data protection during restoration
PM / hibernate: Add missing braces in __register_nosave_region()
PM / hibernate: Clean up comments in snapshot.c
PM / hibernate: Clean up function headers in snapshot.c
PM / hibernate: Add missing braces in hibernate_setup()
PM / hibernate: Recycle safe pages after image restoration
PM / hibernate: Simplify mark_unsafe_pages()
PM / hibernate: Do not free preallocated safe pages during image restore
PM / suspend: show workqueue state in suspend flow
PM / sleep: make PM notifiers called symmetrically
PM / sleep: Make pm_prepare_console() return void
PM / Hibernate: Don't let kasan instrument snapshot.c
* pm-tools:
PM / tools: scripts: AnalyzeSuspend v4.2
tools/turbostat: allow user to alter DESTDIR and PREFIX
|
|
* acpi-tables:
ACPI: Rename configfs.c to acpi_configfs.c to prevent link error
ACPI: add support for loading SSDTs via configfs
ACPI: add support for configfs
efi / ACPI: load SSTDs from EFI variables
spi / ACPI: add support for ACPI reconfigure notifications
i2c / ACPI: add support for ACPI reconfigure notifications
ACPI: add support for ACPI reconfiguration notifiers
ACPI / scan: fix enumeration (visited) flags for bus rescans
ACPI / documentation: add SSDT overlays documentation
ACPI: ARM64: support for ACPI_TABLE_UPGRADE
ACPI / tables: introduce ARCH_HAS_ACPI_TABLE_UPGRADE
ACPI / tables: move arch-specific symbol to asm/acpi.h
ACPI / tables: table upgrade: refactor function definitions
ACPI / tables: table upgrade: use cacheable map for tables
Conflicts:
arch/arm64/include/asm/acpi.h
|
|
* acpi-numa:
ACPI / NUMA: Enable ACPI based NUMA on ARM64
arm64, ACPI, NUMA: NUMA support based on SRAT and SLIT
ACPI / processor: Add acpi_map_madt_entry()
ACPI / NUMA: Improve SRAT error detection and add messages
ACPI / NUMA: Move acpi_numa_memory_affinity_init() to drivers/acpi/numa.c
ACPI / NUMA: remove unneeded acpi_numa=1
ACPI / NUMA: move bad_srat() and srat_disabled() to drivers/acpi/numa.c
x86 / ACPI / NUMA: cleanup acpi_numa_processor_affinity_init()
arm64, NUMA: Cleanup NUMA disabled messages
arm64, NUMA: rework numa_add_memblk()
ACPI / NUMA: move acpi_numa_slit_init() to drivers/acpi/numa.c
ACPI / NUMA: Move acpi_numa_arch_fixup() to ia64 only
ACPI / NUMA: remove duplicate NULL check
ACPI / NUMA: Replace ACPI_DEBUG_PRINT() with pr_debug()
ACPI / NUMA: Use pr_fmt() instead of printk
|
|
|
|
The pcommit instruction is being deprecated in favor of either ADR
(asynchronous DRAM refresh: flush-on-power-fail) at the platform level, or
posted-write-queue flush addresses as defined by the ACPI 6.x NFIT (NVDIMM
Firmware Interface Table).
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Acked-by: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
This reverts commit 8b3e34e46aca9b6d349b331cd9cf71ccbdc91b2e.
Given the deprecation of the pcommit instruction, the relevant VMX
features and CPUID bits are not going to be rolled into the SDM. Remove
their usage from KVM.
Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
It doesn't just control probing for the EBDA -- it controls whether we
detect and reserve the <1MB BIOS regions in general.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Mario Limonciello <mario_limonciello@dell.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Link: http://lkml.kernel.org/r/55bd591115498440d461857a7b64f349a5d911f3.1469135598.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Add support for Intel's AVX-512 instructions to the instruction decoder.
AVX-512 instructions are documented in Intel Architecture Instruction
Set Extensions Programming Reference (February 2016).
AVX-512 instructions are identified by a EVEX prefix which, for the
purpose of instruction decoding, can be treated as though it were a
4-byte VEX prefix.
Existing instructions which can now accept an EVEX prefix need not be
further annotated in the op code map (x86-opcode-map.txt). In the case
of new instructions, the op code map is updated accordingly.
Also add associated Mask Instructions that are used to manipulate mask
registers used in AVX-512 instructions.
The 'perf tools' instruction decoder is updated in a subsequent patch.
And a representative set of instructions is added to the perf tools new
instructions test in a subsequent patch.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: X86 ML <x86@kernel.org>
Link: http://lkml.kernel.org/r/1469003437-32706-3-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
So the reserve_ebda_region() code has accumulated a number of
problems over the years that make it really difficult to read
and understand:
- The calculation of 'lowmem' and 'ebda_addr' is an unnecessarily
interleaved mess of first lowmem, then ebda_addr, then lowmem tweaks...
- 'lowmem' here means 'super low mem' - i.e. 16-bit addressable memory. In other
parts of the x86 code 'lowmem' means 32-bit addressable memory... This makes it
super confusing to read.
- It does not help at all that we have various memory range markers, half of which
are 'start of range', half of which are 'end of range' - but this crucial
property is not obvious in the naming at all ... gave me a headache trying to
understand all this.
- Also, the 'ebda_addr' name sucks: it highlights that it's an address (which is
obvious, all values here are addresses!), while it does not highlight that it's
the _start_ of the EBDA region ...
- 'BIOS_LOWMEM_KILOBYTES' says a lot of things, except that this is the only value
that is a pointer to a value, not a memory range address!
- The function name itself is a misnomer: it says 'reserve_ebda_region()' while
its main purpose is to reserve all the firmware ROM typically between 640K and
1MB, while the 'EBDA' part is only a small part of that ...
- Likewise, the paravirt quirk flag name 'ebda_search' is misleading as well: this
too should be about whether to reserve firmware areas in the paravirt case.
- In fact thinking about this as 'end of RAM' is confusing: what this function
*really* wants to reserve is firmware data and code areas! Once the thinking is
inverted from a mixed 'ram' and 'reserved firmware area' notion to a pure
'reserved area' notion everything becomes a lot clearer.
To improve all this rewrite the whole code (without changing the logic):
- Firstly invert the naming from 'lowmem end' to 'BIOS reserved area start'
and propagate this concept through all the variable names and constants.
BIOS_RAM_SIZE_KB_PTR // was: BIOS_LOWMEM_KILOBYTES
BIOS_START_MIN // was: INSANE_CUTOFF
ebda_start // was: ebda_addr
bios_start // was: lowmem
BIOS_START_MAX // was: LOWMEM_CAP
- Then clean up the name of the function itself by renaming it
to reserve_bios_regions() and renaming the ::ebda_search paravirt
flag to ::reserve_bios_regions.
- Fix up all the comments (fix typos), harmonize and simplify their
formulation and remove comments that become unnecessary due to
the much better naming all around.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Monitored cached line may not wake up from mwait on certain
Goldmont based CPUs. This patch will avoid calling
current_set_polling_and_test() and thereby not set the TIF_ flag.
The result is that we'll always send IPIs for wakeups.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1468867270-18493-1-git-send-email-jacob.jun.pan@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
On Intel hardware, native_play_dead() uses mwait_play_dead() by
default and only falls back to the other methods if that fails.
That also happens during resume from hibernation, when the restore
(boot) kernel runs disable_nonboot_cpus() to take all of the CPUs
except for the boot one offline.
However, that is problematic, because the address passed to
__monitor() in mwait_play_dead() is likely to be written to in the
last phase of hibernate image restoration and that causes the "dead"
CPU to start executing instructions again. Unfortunately, the page
containing the address in that CPU's instruction pointer may not be
valid any more at that point.
First, that page may have been overwritten with image kernel memory
contents already, so the instructions the CPU attempts to execute may
simply be invalid. Second, the page tables previously used by that
CPU may have been overwritten by image kernel memory contents, so the
address in its instruction pointer is impossible to resolve then.
A report from Varun Koyyalagunta and investigation carried out by
Chen Yu show that the latter sometimes happens in practice.
To prevent it from happening, temporarily change the smp_ops.play_dead
pointer during resume from hibernation so that it points to a special
"play dead" routine which uses hlt_play_dead() and avoids the
inadvertent "revivals" of "dead" CPUs this way.
A slightly unpleasant consequence of this change is that if the
system is hibernated with one or more CPUs offline, it will generally
draw more power after resume than it did before hibernation, because
the physical state entered by CPUs via hlt_play_dead() is higher-power
than the mwait_play_dead() one in the majority of cases. It is
possible to work around this, but it is unclear how much of a problem
that's going to be in practice, so the workaround will be implemented
later if it turns out to be necessary.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=106371
Reported-by: Varun Koyyalagunta <cpudebug@centtech.com>
Original-by: Chen Yu <yu.c.chen@intel.com>
Tested-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
|
|
The only user verify_local_APIC() had been removed by commit:
4399c03c6780 ("x86/apic: Remove verify_local_APIC()")
... so there is no need to keep it.
Signed-off-by: Wei Jiangang <weijg.fnst@cn.fujitsu.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: boris.ostrovsky@oracle.com
Cc: bsd@redhat.com
Cc: david.vrabel@citrix.com
Cc: jgross@suse.com
Cc: konrad.wilk@oracle.com
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1468463046-20849-1-git-send-email-weijg.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
check_tsc_disabled() was introduced by commit:
c73deb6aecda ("perf/x86: Add ability to calculate TSC from perf sample timestamps")
The only caller was arch_perf_update_userpage(), which had been refactored
by commit:
d8b11a0cbd1c ("perf/x86: Clean up cap_user_time* setting")
... so no need keep and export it any more.
Signed-off-by: Wei Jiangang <weijg.fnst@cn.fujitsu.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: a.p.zijlstra@chello.nl
Cc: adrian.hunter@intel.com
Cc: bp@suse.de
Link: http://lkml.kernel.org/r/1468570330-25810-1-git-send-email-weijg.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Don't use the same syscall numbers for 2 different syscalls:
534 x32 preadv compat_sys_preadv64
535 x32 pwritev compat_sys_pwritev64
534 x32 preadv2 compat_sys_preadv2
535 x32 pwritev2 compat_sys_pwritev2
Add compat_sys_preadv64v2() and compat_sys_pwritev64v2() so that 64-bit offset
is passed in one 64-bit register on x32, similar to compat_sys_preadv64()
and compat_sys_pwritev64().
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/CAMe9rOovCMf-RQfx_n1U_Tu_DX1BYkjtFr%3DQ4-_PFVSj9BCzUA@mail.gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
It serves no purpose -- raw_smp_processor_id() works fine. This
change will be needed to move thread_info off the stack.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/a2bf4f07fbc30fb32f9f7f3f8f94ad3580823847.1468527351.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
struct thread_info is a legacy mess. To prepare for its partial removal,
move thread_info::addr_limit out.
As an added benefit, this way is simpler.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/15bee834d09402b47ac86f2feccdf6529f9bc5b0.1468527351.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Rename it to match the thread_struct::uaccess_err pattern and also
because it was too long.
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
thread_info::sig_on_uaccess_err to thread_struct
struct thread_info is a legacy mess. To prepare for its partial removal,
move the uaccess control fields out -- they're straightforward.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/d0ac4d01c8e4d4d756264604e47445d5acc7900e.1468527351.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
kernel_unmap_pages_in_pgd() is dangerous: if a PGD entry in
init_mm.pgd were to be cleared, callers would need to ensure that
the pgd entry hadn't been propagated to any other pgd.
Its only caller was efi_cleanup_page_tables(), and that, in turn,
was unused, so just delete both functions. This leaves a couple of
other helpers unused, so delete them, too.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/77ff20fdde3b75cd393be5559ad8218870520248.1468527351.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Conflicts:
tools/testing/selftests/x86/Makefile
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Historically a lot of these existed because we did not have
a distinction between what was modular code and what was providing
support to modules via EXPORT_SYMBOL and friends. That changed
when we forked out support for the latter into the export.h file.
This means we should be able to reduce the usage of module.h
in code that is obj-y Makefile or bool Kconfig. In the case of
some of these which are modular, we can extend that to also include
files that are building basic support functionality but not related
to loading or registering the final module; such files also have
no need whatsoever for module.h
The advantage in removing such instances is that module.h itself
sources about 15 other headers; adding significantly to what we feed
cpp, and it can obscure what headers we are effectively using.
Since module.h was the source for init.h (for __init) and for
export.h (for EXPORT_SYMBOL) we consider each instance for the
presence of either and replace as needed.
In the case of crypto/glue_helper.c we delete a redundant instance
of MODULE_LICENSE in order to delete module.h -- the license info
is already present at the top of the file.
The uncore change warrants a mention too; it is uncore.c that uses
module.h and not uncore.h; hence the relocation done there.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160714001901.31603-9-paul.gortmaker@windriver.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Historically a lot of these existed because we did not have
a distinction between what was modular code and what was providing
support to modules via EXPORT_SYMBOL and friends. That changed
when we forked out support for the latter into the export.h file.
This means we should be able to reduce the usage of module.h
in code that is obj-y Makefile or bool Kconfig. The advantage
in doing so is that module.h itself sources about 15 other headers;
adding significantly to what we feed cpp, and it can obscure what
headers we are effectively using.
Since module.h was the source for init.h (for __init) and for
export.h (for EXPORT_SYMBOL) we consider each obj-y/bool instance
for the presence of either and replace as needed. Build testing
revealed some implicit header usage that was fixed up accordingly.
Note that some bool/obj-y instances remain since module.h is
the header for some exception table entry stuff, and for things
like __init_or_module (code that is tossed when MODULES=n).
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160714001901.31603-4-paul.gortmaker@windriver.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
kzalloc was replaced with kvm_kvzalloc to allow non-contiguous areas and
rcu had to be modified to cope with it.
The practical limit for KVM_MAX_VCPU_ID right now is INT_MAX, but lower
value was chosen in case there were bugs. 1023 is sufficient maximum
APIC ID for 288 VCPUs.
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
288 is in high demand because of Knights Landing CPU.
We cannot set the limit to 640k, because that would be wasting space.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Add KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK as a feature flag to
KVM_CAP_X2APIC_API.
The quirk made KVM interpret 0xff as a broadcast even in x2APIC mode.
The enableable capability is needed in order to support standard x2APIC and
remain backward compatible.
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
[Expand kvm_apic_mda comment. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
KVM_CAP_X2APIC_API is a capability for features related to x2APIC
enablement. KVM_X2APIC_API_32BIT_FORMAT feature can be enabled to
extend APIC ID in get/set ioctl and MSI addresses to 32 bits.
Both are needed to support x2APIC.
The feature has to be enableable and disabled by default, because
get/set ioctl shifted and truncated APIC ID to 8 bits by using a
non-standard protocol inspired by xAPIC and the change is not
backward-compatible.
Changes to MSI addresses follow the format used by interrupt remapping
unit. The upper address word, that used to be 0, contains upper 24 bits
of the LAPIC address in its upper 24 bits. Lower 8 bits are reserved as
0. Using the upper address word is not backward-compatible either as we
didn't check that userspace zeroed the word. Reserved bits are still
not explicitly checked, but non-zero data will affect LAPIC addresses,
which will cause a bug.
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
x2APIC supports up to 2^32-1 LAPICs, but most guest in coming years will
probably has fewer VCPUs. Dynamic size saves memory at the cost of
turning one constant into a variable.
apic_map mutex had to be moved before allocation to avoid races with cpu
hotplug.
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Logical x2APIC IDs map injectively to physical x2APIC IDs, so we can
reuse the physical array for them. This allows us to save space by
sizing the logical maps according to the needs of xAPIC.
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
240 has been well tested by Red Hat.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|