path: root/arch/x86
2017-08-24  x86/lguest: Remove lguest support  (Juergen Gross)
Lguest seems to be rather unused these days. In the last two years it has seen only patches ensuring it still builds, and its official state is "Odd Fixes". Remove it in order to be able to clean up the paravirt code. Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: boris.ostrovsky@oracle.com Cc: lguest@lists.ozlabs.org Cc: rusty@rustcorp.com.au Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/20170816173157.8633-3-jgross@suse.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-24  x86/paravirt/xen: Remove xen_patch()  (Juergen Gross)
Xen's paravirt patch function xen_patch() does some special casing for irq_ops functions to apply relocations when those functions can be patched inline instead of calls. Unfortunately none of the special case function replacements is small enough to be patched inline, so the special case never applies. As xen_patch() will call paravirt_patch_default() in all cases it can be just dropped. xen-asm.h doesn't seem necessary without xen_patch() as the only thing left in it would be the definition of XEN_EFLAGS_NMI used only once. So move that definition and remove xen-asm.h. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: boris.ostrovsky@oracle.com Cc: lguest@lists.ozlabs.org Cc: rusty@rustcorp.com.au Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/20170816173157.8633-2-jgross@suse.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-23  KVM: SVM: Enable Virtual GIF feature  (Janakarajan Natarajan)
Enable the Virtual GIF feature. This is done by setting bit 25 at position 60h in the vmcb. With this feature enabled, the processor uses bit 9 at position 60h as the virtual GIF when executing STGI/CLGI instructions. Since the execution of STGI by the L1 hypervisor does not cause a return to the outermost (L0) hypervisor, the enable_irq_window and enable_nmi_window are modified. The IRQ window will be opened even if GIF is not set, under the assumption that on resuming the L1 hypervisor the IRQ will be held pending until the processor executes the STGI instruction. For the NMI window, the STGI intercept is set. This will assist in opening the window only when GIF=1. Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
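A minimal sketch of the bits involved, using the V_GIF names this feature adds to svm.h (offsets as described above; treat the exact usage as illustrative):

    /* int_ctl is the field at offset 60h in the VMCB control area */
    #define V_GIF_SHIFT         9
    #define V_GIF_MASK          (1 << V_GIF_SHIFT)        /* bit 9: the virtual GIF value */
    #define V_GIF_ENABLE_SHIFT  25
    #define V_GIF_ENABLE_MASK   (1 << V_GIF_ENABLE_SHIFT) /* bit 25: enable vGIF handling */

    /* with vGIF enabled, STGI/CLGI executed by L1 no longer exit to L0 */
    svm->vmcb->control.int_ctl |= V_GIF_ENABLE_MASK;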
2017-08-23  KVM: SVM: Add Virtual GIF feature definition  (Janakarajan Natarajan)
Add a new cpufeature definition for Virtual GIF. Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com> Reviewed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
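For reference, the flag sits in the SVM feature leaf (CPUID 0x8000000A, EDX), word 15 of the kernel's cpufeature table; a sketch of the definition (bit position as in cpufeatures.h at the time; treat as illustrative):

    /* arch/x86/include/asm/cpufeatures.h */
    #define X86_FEATURE_VGIF    (15*32 + 16) /* Virtual GIF */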
2017-08-23  x86/ioapic: Print the IRTE's index field correctly when enabling INTR  (Raymond Pang)
When enabling interrupt remap, IOAPIC's RTE contains the interrupt_index field of IRTE. This field is composed of the ->index and the ->index2 members of 'struct IR_IO_APIC_route_entry' - but what we print out currently only uses ->index. Fix it. Signed-off-by: Raymond Pang <raymondpangxd@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: joro@8bytes.org Cc: linux-arch@vger.kernel.org Link: http://lkml.kernel.org/r/CAHG4imNDzpDyOVi7MByVrLQ%3DQFuOVqpzJ5F-Xs5z6OZphubj-Q@mail.gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
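Sketched from the description above, with the unrelated bitfields elided (the real struct carries several more fields between the two):

    struct IR_IO_APIC_route_entry {
            /* ... */
            u64 index2 : 1;         /* bit 15 of the IRTE index */
            /* ... */
            u64 index  : 15;        /* bits 0-14 of the IRTE index */
    };

    /* print the full 16-bit index, not just the low 15 bits */
    unsigned int idx = (entry.index2 << 15) | entry.index;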
2017-08-22  Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6  (Herbert Xu)
Merge the crypto tree to resolve the conflict between the temporary and long-term fixes in algif_skcipher.
2017-08-21  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net  (David S. Miller)
2017-08-21  x86/CPU: Align CR3 defines  (Borislav Petkov)
Align them vertically for better readability and use BIT_ULL() macro. No functionality change. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Link: http://lkml.kernel.org/r/20170821080651.4527-1-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
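The shape of the change, illustratively (the constants live in processor-flags.h; __sme_clr() strips the SME encryption bit):

    /* before: mixed styles, ragged layout */
    #define CR3_ADDR_MASK 0x7FFFFFFFFFFFF000ull
    #define CR3_PCID_MASK 0xFFFull
    #define CR3_NOFLUSH (1ull << 63)

    /* after: aligned vertically, BIT_ULL() for the single-bit flag */
    #define CR3_ADDR_MASK   __sme_clr(0x7FFFFFFFFFFFF000ull)
    #define CR3_PCID_MASK   0xFFFull
    #define CR3_NOFLUSH     BIT_ULL(63)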
2017-08-21  x86/build: Use cc-option to validate stack alignment parameter  (Matthias Kaehlcke)
With the following commit: 8f91869766c0 ("x86/build: Fix stack alignment for CLang") cc-option is only used to determine the name of the stack alignment option supported by the compiler, but not to verify that the actual parameter <option>=N is valid in combination with the other CFLAGS. This causes problems (as reported by the kbuild robot) with older GCC versions which only support stack alignment on a boundary of 16 bytes or higher. Also use (__)cc_option to add the stack alignment option to CFLAGS to make sure only valid options are added. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Matthias Kaehlcke <mka@chromium.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Bernhard.Rosenkranzer@linaro.org Cc: Greg Hackmann <ghackmann@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Michael Davidson <md@google.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Hines <srhines@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dianders@chromium.org Fixes: 8f91869766c0 ("x86/build: Fix stack alignment for CLang") Link: http://lkml.kernel.org/r/20170817182047.176752-1-mka@chromium.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-20  Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds)
Pull x86 fixes from Thomas Gleixner:
 "Another pile of small fixes and updates for x86:

  - Plug a hole in the SMAP implementation which misses to clear AC on NMI entry

  - Fix the norandmaps/ADDR_NO_RANDOMIZE logic so the command line parameter works correctly again

  - Use the proper accessor in the startup64 code for next_early_pgt to prevent accessing of invalid addresses and faulting in the early boot code.

  - Prevent CPU hotplug lock recursion in the MTRR code

  - Unbreak CPU0 hotplugging

  - Rename overly long CPUID bits which got introduced in this cycle

  - Two commits which mark data 'const' and restrict the scope of data and functions to file scope by making them 'static'"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Constify attribute_group structures
  x86/boot/64/clang: Use fixup_pointer() to access 'next_early_pgt'
  x86/elf: Remove the unnecessary ADDR_NO_RANDOMIZE checks
  x86: Fix norandmaps/ADDR_NO_RANDOMIZE
  x86/mtrr: Prevent CPU hotplug lock recursion
  x86: Mark various structures and functions as 'static'
  x86/cpufeature, kvm/svm: Rename (shorten) the new "virtualized VMSAVE/VMLOAD" CPUID flag
  x86/smpboot: Unbreak CPU0 hotplug
  x86/asm/64: Clear AC on NMI entries
2017-08-20  Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds)
Pull perf fixes from Thomas Gleixner:
 "Two fixes for the perf subsystem:

  - Fix an inconsistency of RDPMC mm struct tagging across exec() which causes RDPMC to fault.

  - Correct the timestamp mechanics across IOC_DISABLE/ENABLE which causes incorrect timestamps and total time calculations"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/core: Fix time on IOC_ENABLE
  perf/x86: Fix RDPMC vs. mm_struct tracking
2017-08-20  Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds)
Pull watchdog fix from Thomas Gleixner:
 "A fix for the hardlockup watchdog to prevent false positives with extreme Turbo-Modes which make the perf/NMI watchdog fire faster than the hrtimer which is used to verify.

  Slightly larger than the minimal fix, which just would increase the hrtimer frequency, but comes with extra overhead of more watchdog timer interrupts and thread wakeups for all users. With this change we restrict the overhead to the extreme Turbo-Mode systems"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  kernel/watchdog: Prevent false positives with turbo modes
2017-08-18  mm: revert x86_64 and arm64 ELF_ET_DYN_BASE base changes  (Kees Cook)
Moving the x86_64 and arm64 PIE base from 0x555555554000 to 0x000100000000 broke AddressSanitizer. This is a partial revert of: eab09532d400 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE") 02445990a96e ("arm64: move ELF_ET_DYN_BASE to 4GB / 4MB") The AddressSanitizer tool has hard-coded expectations about where executable mappings are loaded. The motivation for changing the PIE base in the above commits was to avoid the Stack-Clash CVEs that allowed executable mappings to get too close to heap and stack. This was mainly a problem on 32-bit, but the 64-bit bases were moved too, in an effort to proactively protect those systems (proofs of concept do exist that show 64-bit collisions, but other recent changes to fix stack accounting and setuid behaviors will minimize the impact). The new 32-bit PIE base is fine for ASan (since it matches the ET_EXEC base), so only the 64-bit PIE base needs to be reverted to let x86 and arm64 ASan binaries run again. Future changes to the 64-bit PIE base on these architectures can be made optional once a more dynamic method for dealing with AddressSanitizer is found. (e.g. always loading PIE into the mmap region for marked binaries.) Link: http://lkml.kernel.org/r/20170807201542.GA21271@beast Fixes: eab09532d400 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE") Fixes: 02445990a96e ("arm64: move ELF_ET_DYN_BASE to 4GB / 4MB") Signed-off-by: Kees Cook <keescook@chromium.org> Reported-by: Kostya Serebryany <kcc@google.com> Acked-by: Will Deacon <will.deacon@arm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-18  kernel/watchdog: fix Kconfig constraints for perf hardlockup watchdog  (Nicholas Piggin)
Commit 05a4a9527931 ("kernel/watchdog: split up config options") lost the perf-based hardlockup detector's dependency on PERF_EVENTS, which can result in broken builds with some powerpc configurations. Restore the dependency, and add it for x86 too: although x86 always selects PERF_EVENTS, it seems reasonable to make the dependency explicit. Link: http://lkml.kernel.org/r/20170810114452.6673-1-npiggin@gmail.com Fixes: 05a4a9527931 ("kernel/watchdog: split up config options") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-18  KVM: VMX: always require WB memory type for EPT  (David Hildenbrand)
We already always set that type but don't check if it is supported. Also for nVMX, we only support WB for now. Let's just require it. Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-08-18  KVM: VMX: cleanup EPTP definitions  (David Hildenbrand)
Don't use shifts, tag them correctly as EPTP and use better matching names (PWL vs. GAW). Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
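A sketch of the resulting definitions: masks named for the EPTP fields they describe rather than built from shifts (values as architecturally defined for the EPTP; treat the exact list as illustrative):

    #define VMX_EPTP_PWL_MASK       0x38ull      /* page-walk length field      */
    #define VMX_EPTP_PWL_4          0x18ull      /* 4-level walk (encoded as 3) */
    #define VMX_EPTP_AD_ENABLE_BIT  (1ull << 6)  /* accessed/dirty flags enable */
    #define VMX_EPTP_MT_MASK        0x7ull       /* memory type field           */
    #define VMX_EPTP_MT_WB          0x6ull       /* write-back                  */
    #define VMX_EPTP_MT_UC          0x0ull       /* uncacheable                 */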
2017-08-18  KVM: SVM: delete avic_vm_id_bitmap (2 megabyte static array)  (Denys Vlasenko)
With lightly tweaked defconfig:

      text     data     bss      dec     hex filename
  11259661  5109408 2981888 19350957 12745ad vmlinux.before
  11259661  5109408  884736 17253805 10745ad vmlinux.after

Only compile-tested. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: pbonzini@redhat.com Cc: rkrcmar@redhat.com Cc: tglx@linutronix.de Cc: mingo@redhat.com Cc: hpa@zytor.com Cc: x86@kernel.org Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-08-18  KVM: x86: fix use of L1 MMIO areas in nested guests  (Paolo Bonzini)
There is currently some confusion between nested and L1 GPAs. The assignment to "direct" in kvm_mmu_page_fault tries to fix that, but it is not enough. What this patch does is fence off the MMIO cache completely when using shadow nested page tables, since we have neither a GVA nor an L1 GPA to put in the cache. This also allows some simplifications in kvm_mmu_page_fault and FNAME(page_fault). The EPT misconfig likewise does not have an L1 GPA to pass to kvm_io_bus_write, so that must be skipped for guest mode. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> [Changed comment to say "GPAs" instead of "L1's physical addresses", as per David's review. - Radim] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-08-18  KVM: x86: Avoid guest page table walk when gpa_available is set  (Brijesh Singh)
When a guest causes a page fault which requires emulation, the vcpu->arch.gpa_available flag is set to indicate that cr2 contains a valid GPA. Currently, emulator_read_write_onepage() makes use of the gpa_available flag to avoid a guest page walk for known MMIO regions. Let's not limit the gpa_available optimization to just MMIO regions. The patch extends the check to avoid the page walk whenever the gpa_available flag is set. Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> [Fix EPT=0 according to Wanpeng Li's fix, plus ensure VMX also uses the new code. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> [Moved "ret < 0" to the else branch, as per David's review. - Radim] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-08-18  KVM: x86: simplify ept_misconfig  (Paolo Bonzini)
Calling handle_mmio_page_fault() has been unnecessary since commit e9ee956e311d ("KVM: x86: MMU: Move handle_mmio_page_fault() call to kvm_mmu_page_fault()", 2016-02-22). handle_mmio_page_fault() can now be made static. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-08-18  kernel/watchdog: Prevent false positives with turbo modes  (Thomas Gleixner)
The hardlockup detector on x86 uses a performance counter based on unhalted CPU cycles and a periodic hrtimer. The hrtimer period is about 2/5 of the performance counter period, so the hrtimer should fire 2-3 times before the performance counter NMI fires. The NMI code checks whether the hrtimer fired since the last invocation. If not, it assumes a hard lockup. The calculation of those periods is based on the nominal CPU frequency. Turbo modes increase the CPU clock frequency and therefore shorten the period of the perf/NMI watchdog. With extreme Turbo-modes (3x nominal frequency) the perf/NMI period is shorter than the hrtimer period which leads to false positives. A simple fix would be to shorten the hrtimer period, but that comes with the side effect of more frequent hrtimer and softlockup thread wakeups, which is not desired. Implement a low pass filter, which checks the perf/NMI period against kernel time. If the perf/NMI fires before 4/5 of the watchdog period has elapsed then the event is ignored and postponed to the next perf/NMI. That solves the problem and avoids the overhead of shorter hrtimer periods and more frequent softlockup thread wakeups. Fixes: 58687acba592 ("lockup_detector: Combine nmi_watchdog and softlockup detector") Reported-and-tested-by: Kan Liang <Kan.liang@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: dzickus@redhat.com Cc: prarit@redhat.com Cc: ak@linux.intel.com Cc: babu.moger@oracle.com Cc: peterz@infradead.org Cc: eranian@google.com Cc: acme@redhat.com Cc: stable@vger.kernel.org Cc: atomlin@redhat.com Cc: akpm@linux-foundation.org Cc: torvalds@linux-foundation.org Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1708150931310.1886@nanos
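A condensed sketch of such a filter, assuming a per-CPU timestamp and a precomputed threshold (the real code also tolerates a bounded number of early events before reporting):

    static DEFINE_PER_CPU(ktime_t, last_timestamp);
    static ktime_t watchdog_hrtimer_sample_threshold;  /* ~4/5 of the watchdog period */

    /* called from the perf/NMI path; false means "too early, ignore this event" */
    static bool watchdog_check_timestamp(void)
    {
            ktime_t delta, now = ktime_get_mono_fast_ns();

            delta = now - __this_cpu_read(last_timestamp);
            if (delta < watchdog_hrtimer_sample_threshold)
                    return false;  /* turbo made the NMI fire early: postpone */

            __this_cpu_write(last_timestamp, now);
            return true;
    }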
2017-08-18  x86: Constify attribute_group structures  (Arvind Yadav)
attribute_groups are not supposed to change at runtime and none of the groups is modified. Mark the non-const structs as const. [ tglx: Folded into one big patch ] Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: tony.luck@intel.com Cc: bp@alien8.de Link: http://lkml.kernel.org/r/1500550238-15655-2-git-send-email-arvind.yadav.cs@gmail.com
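The pattern applied throughout, for illustration (attribute and group names here are hypothetical):

    static struct attribute *version_attrs[] = {
            &version_attr.attr,     /* hypothetical attribute */
            NULL,
    };

    /* was: static struct attribute_group ... */
    static const struct attribute_group version_group = {
            .attrs = version_attrs,
    };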
2017-08-18  Merge branch 'x86/asm' into locking/core  (Ingo Molnar)
We need the ASM_UNREACHABLE() macro for a dependent patch. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-17  Merge tag 'pm-4.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm  (Linus Torvalds)
Pull power management fixes from Rafael Wysocki:
 "These fix two issues related to exposing the current CPU frequency to user space on x86.

  Specifics:

  - Disable interrupts around reading IA32_APERF and IA32_MPERF in aperfmperf_snapshot_khz() (introduced recently) to avoid excessive delays between the reads that may result from interrupt handling (Doug Smythies).

  - Fix the computation of the CPU frequency to be reported through the pstate_sample tracepoint in intel_pstate (Doug Smythies)"

* tag 'pm-4.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq: x86: Disable interrupts during MSRs reading
  cpufreq: intel_pstate: report correct CPU frequencies during trace
2017-08-17  Merge branches 'intel_pstate-fix' and 'cpufreq-x86-fix'  (Rafael J. Wysocki)
* intel_pstate-fix:
  cpufreq: intel_pstate: report correct CPU frequencies during trace

* cpufreq-x86-fix:
  cpufreq: x86: Disable interrupts during MSRs reading
2017-08-17  x86/boot/KASLR: Prefer mirrored memory regions for the kernel physical address  (Baoquan He)
Currently KASLR will parse all e820 entries of RAM type and add all candidate positions into the slots array. After that we choose one slot randomly as the new position which the kernel will be decompressed into and run at. On systems with EFI enabled, e820 memory regions come from EFI memory regions by combining adjacent regions. These EFI memory regions have various attributes, and the "mirrored" attribute is one of them. Physical memory regions whose descriptors in the EFI memory map have the EFI_MEMORY_MORE_RELIABLE attribute (bit 16) are mirrored. The address range mirroring feature of the kernel arranges such mirrored regions into normal zones and other regions into movable zones. With the mirroring feature enabled, the code and data of the kernel can only be located in the more reliable mirrored regions. However, the current KASLR code doesn't check EFI memory entries, and could choose a new kernel position in non-mirrored regions. This will break the intended functionality of the address range mirroring feature. To fix this, if EFI is detected, iterate over the EFI memory map and add only mirrored regions as candidates for randomization slots, as sketched below. If EFI is disabled or no mirrored region is found, fall back to processing the e820 memory map. Signed-off-by: Baoquan He <bhe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: ard.biesheuvel@linaro.org Cc: fanc.fnst@cn.fujitsu.com Cc: izumi.taku@jp.fujitsu.com Cc: keescook@chromium.org Cc: linux-efi@vger.kernel.org Cc: matt@codeblueprint.co.uk Cc: n-horiguchi@ah.jp.nec.com Cc: thgarnie@google.com Link: http://lkml.kernel.org/r/1502722464-20614-3-git-send-email-bhe@redhat.com [ Rewrote most of the text. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
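A condensed sketch of the EFI pass, using the efi_early_memdesc_ptr() helper introduced in the next entry; signature checks and 32/64-bit map-pointer handling are omitted, so treat the details as illustrative:

    /* add only mirrored EFI regions as KASLR slot candidates */
    static bool process_efi_entries(unsigned long minimum, unsigned long image_size)
    {
            struct efi_info *e = &boot_params->efi_info;
            u32 nr_desc = e->efi_memmap_size / e->efi_memdesc_size;
            bool mirror_found = false;
            struct mem_vector region;
            int i;

            for (i = 0; i < nr_desc; i++) {
                    efi_memory_desc_t *md =
                            efi_early_memdesc_ptr(e->efi_memmap, e->efi_memdesc_size, i);

                    if (!(md->attribute & EFI_MEMORY_MORE_RELIABLE))
                            continue;

                    region.start = md->phys_addr;
                    region.size  = md->num_pages << EFI_PAGE_SHIFT;
                    process_mem_region(&region, minimum, image_size);
                    mirror_found = true;
            }
            return mirror_found;    /* false: caller falls back to e820 */
    }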
2017-08-17  efi: Introduce efi_early_memdesc_ptr to get pointer to memmap descriptor  (Baoquan He)
The existing map iteration helper for_each_efi_memory_desc_in_map can only be used after the kernel initializes the EFI subsystem to set up struct efi_memory_map. Before that we also need iterate map descriptors which are stored in several intermediate structures, like struct efi_boot_memmap for arch independent usage and struct efi_info for x86 arch only. Introduce efi_early_memdesc_ptr() to get pointer to a map descriptor, and replace several places where that primitive is open coded. Signed-off-by: Baoquan He <bhe@redhat.com> [ Various improvements to the text. ] Acked-by: Matt Fleming <matt@codeblueprint.co.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: ard.biesheuvel@linaro.org Cc: fanc.fnst@cn.fujitsu.com Cc: izumi.taku@jp.fujitsu.com Cc: keescook@chromium.org Cc: linux-efi@vger.kernel.org Cc: n-horiguchi@ah.jp.nec.com Cc: thgarnie@google.com Link: http://lkml.kernel.org/r/20170816134651.GF21273@x1 Signed-off-by: Ingo Molnar <mingo@kernel.org>
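The primitive itself is plain pointer arithmetic over the raw map buffer; descriptors are desc_size apart, which may be larger than sizeof(efi_memory_desc_t), hence no array indexing. A sketch:

    /* pointer to the n-th descriptor in a raw EFI memory map buffer */
    #define efi_early_memdesc_ptr(map, desc_size, n) \
            (efi_memory_desc_t *)((void *)(map) + ((n) * (desc_size)))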
2017-08-17  Merge branch 'linus' into x86/boot, to pick up fixes  (Ingo Molnar)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-17  locking/refcounts, x86/asm: Implement fast refcount overflow protection  (Kees Cook)
This implements refcount_t overflow protection on x86 without a noticeable performance impact, though without the fuller checking of REFCOUNT_FULL. This is done by duplicating the existing atomic_t refcount implementation but with normally a single instruction added to detect if the refcount has gone negative (e.g. wrapped past INT_MAX or below zero). When detected, the handler saturates the refcount_t to INT_MIN / 2. With this overflow protection, the erroneous reference release that would follow a wrap back to zero is blocked from happening, avoiding the class of refcount-overflow use-after-free vulnerabilities entirely. Only the overflow case of refcounting can be perfectly protected, since it can be detected and stopped before the reference is freed and left to be abused by an attacker. There isn't a way to block early decrements, and while REFCOUNT_FULL stops increment-from-zero cases (which would be the state _after_ an early decrement and stops potential double-free conditions), this fast implementation does not, since it would require the more expensive cmpxchg loops. Since the overflow case is much more common (e.g. missing a "put" during an error path), this protection provides real-world protection. For example, the two public refcount overflow use-after-free exploits published in 2016 would have been rendered unexploitable: http://perception-point.io/2016/01/14/analysis-and-exploitation-of-a-linux-kernel-vulnerability-cve-2016-0728/ http://cyseclabs.com/page?n=02012016 This implementation does, however, notice an unchecked decrement to zero (i.e. caller used refcount_dec() instead of refcount_dec_and_test() and it resulted in a zero). Decrements under zero are noticed (since they will have resulted in a negative value), though this only indicates that a use-after-free may have already happened. Such notifications are likely avoidable by an attacker that has already exploited a use-after-free vulnerability, but it's better to have them reported than allow such conditions to remain universally silent. On first overflow detection, the refcount value is reset to INT_MIN / 2 (which serves as a saturation value) and a report and stack trace are produced. When operations detect only negative value results (such as changing an already saturated value), saturation still happens but no notification is performed (since the value was already saturated). On the matter of races, since the entire range beyond INT_MAX but before 0 is negative, every operation at INT_MIN / 2 will trap, leaving no overflow-only race condition. As for performance, this implementation adds a single "js" instruction to the regular execution flow of a copy of the standard atomic_t refcount operations. (The non-"and_test" refcount_dec() function, which is uncommon in regular refcount design patterns, has an additional "jz" instruction to detect reaching exactly zero.) Since this is a forward jump, it is by default the non-predicted path, which will be reinforced by dynamic branch prediction. The result is this protection having virtually no measurable change in performance over standard atomic_t operations. The error path, located in .text.unlikely, saves the refcount location and then uses UD0 to fire a refcount exception handler, which resets the refcount, handles reporting, and returns to regular execution. This keeps the changes to .text size minimal, avoiding return jumps and open-coded calls to the error reporting routine. 
Example assembly comparison:

  refcount_inc() before:

    .text:
    ffffffff81546149:  f0 ff 45 f4          lock incl -0xc(%rbp)

  refcount_inc() after:

    .text:
    ffffffff81546149:  f0 ff 45 f4          lock incl -0xc(%rbp)
    ffffffff8154614d:  0f 88 80 d5 17 00    js     ffffffff816c36d3
    ...
    .text.unlikely:
    ffffffff816c36d3:  48 8d 4d f4          lea    -0xc(%rbp),%rcx
    ffffffff816c36d7:  0f ff                (bad)

These are the cycle counts comparing a loop of refcount_inc() from 1 to INT_MAX and back down to 0 (via refcount_dec_and_test()), between unprotected refcount_t (atomic_t), fully protected REFCOUNT_FULL (refcount_t-full), and this overflow-protected refcount (refcount_t-fast):

  2147483646 refcount_inc()s and 2147483647 refcount_dec_and_test()s:

                          cycles  protections
  atomic_t           82249267387  none
  refcount_t-fast    82211446892  overflow, untested dec-to-zero
  refcount_t-full   144814735193  overflow, untested dec-to-zero, inc-from-zero

This code is a modified version of the x86 PAX_REFCOUNT atomic_t overflow defense from the last public patch of PaX/grsecurity, based on my understanding of the code. Changes or omissions from the original code are mine and don't reflect the original grsecurity/PaX code. Thanks to PaX Team for various suggestions for improvement for repurposing this code to be a refcount-only protection. Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christoph Hellwig <hch@infradead.org> Cc: David S. Miller <davem@davemloft.net> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Elena Reshetova <elena.reshetova@intel.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Hans Liljestrand <ishkamiel@gmail.com> Cc: James Bottomley <James.Bottomley@hansenpartnership.com> Cc: Jann Horn <jannh@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Serge E. Hallyn <serge@hallyn.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: arozansk@redhat.com Cc: axboe@kernel.dk Cc: kernel-hardening@lists.openwall.com Cc: linux-arch <linux-arch@vger.kernel.org> Link: http://lkml.kernel.org/r/20170815161924.GA133115@beast Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-17  x86/mm, mm/hwpoison: Clear PRESENT bit for kernel 1:1 mappings of poison pages  (Tony Luck)
Speculative processor accesses may reference any memory that has a valid page table entry. While a speculative access won't generate a machine check, it will log the error in a machine check bank. That could cause escalation of a subsequent error since the overflow bit will be then set in the machine check bank status register. Code has to be double-plus-tricky to avoid mentioning the 1:1 virtual address of the page we want to map out otherwise we may trigger the very problem we are trying to avoid. We use a non-canonical address that passes through the usual Linux table walking code to get to the same "pte". Thanks to Dave Hansen for reviewing several iterations of this. Also see: http://marc.info/?l=linux-mm&m=149860136413338&w=2 Signed-off-by: Tony Luck <tony.luck@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Elliott, Robert (Persistent Memory) <elliott@hpe.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20170816171803.28342-1-tony.luck@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-17  x86/build: Fix stack alignment for CLang  (Matthias Kaehlcke)
Commit: d77698df39a5 ("x86/build: Specify stack alignment for clang") intended to use the same stack alignment for clang as with gcc. The two compilers use different options to configure the stack alignment (gcc: -mpreferred-stack-boundary=n, clang: -mstack-alignment=n). The above commit assumes that the clang option uses the same parameter type as gcc, i.e. that the alignment is specified as 2^n. However clang interprets the value of this option literally to use an alignment of n; in consequence the stack remains misaligned. Change the values used with -mstack-alignment to be the actual alignment instead of a power of two. cc-option isn't used here with the typical pattern of KBUILD_CFLAGS += $(call cc-option ...). The reason is that older gcc versions don't support the -mpreferred-stack-boundary option; since cc-option doesn't verify whether the alternative option is valid, it would incorrectly select the clang option -mstack-alignment. Signed-off-by: Matthias Kaehlcke <mka@chromium.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Bernhard.Rosenkranzer@linaro.org Cc: Greg Hackmann <ghackmann@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Michael Davidson <md@google.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Hines <srhines@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dianders@chromium.org Link: http://lkml.kernel.org/r/20170817004740.170588-1-mka@chromium.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-17  x86/boot/64/clang: Use fixup_pointer() to access 'next_early_pgt'  (Alexander Potapenko)
__startup_64() is normally using fixup_pointer() to access globals in a position-independent fashion. However 'next_early_pgt' was accessed directly, which wasn't guaranteed to work. Luckily GCC was generating a R_X86_64_PC32 PC-relative relocation for 'next_early_pgt', but Clang emitted a R_X86_64_32S, which led to accessing invalid memory and rebooting the kernel. Signed-off-by: Alexander Potapenko <glider@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Davidson <md@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: c88d71508e36 ("x86/boot/64: Rewrite startup_64() in C") Link: http://lkml.kernel.org/r/20170816190808.131748-1-glider@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
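For context, a sketch of the pattern: fixup_pointer() rebases a link-time address onto the physical address the kernel was actually loaded at, making the access independent of whichever relocation type the compiler emits:

    static void *fixup_pointer(void *ptr, unsigned long physaddr)
    {
            return ptr - (void *)_text + (void *)physaddr;
    }

    /* was a direct use of the global, which Clang reached via R_X86_64_32S */
    unsigned int *next_pgt_ptr = fixup_pointer(&next_early_pgt, physaddr);
    pud = fixup_pointer(early_dynamic_pgts[(*next_pgt_ptr)++], physaddr);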
2017-08-17  Merge branch 'linus' into perf/core, to pick up fixes  (Ingo Molnar)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-16  x86/nmi: Use raw lock  (Scott Wood)
register_nmi_handler() can be called from PREEMPT_RT atomic context (e.g. wakeup_cpu_via_init_nmi() or native_stop_other_cpus()), and thus ordinary spinlocks cannot be used. Signed-off-by: Scott Wood <swood@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Don Zickus <dzickus@redhat.com> Link: http://lkml.kernel.org/r/20170724213242.27598-1-swood@redhat.com
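The change in miniature, assuming the NMI handler list layout in nmi.c (a raw spinlock stays a true spinlock on PREEMPT_RT):

    struct nmi_desc {
            raw_spinlock_t lock;    /* was spinlock_t, which can sleep on RT */
            struct list_head head;
    };

    /* callers move to the raw variants accordingly */
    raw_spin_lock_irqsave(&desc->lock, flags);
    list_add_rcu(&action->list, &desc->head);
    raw_spin_unlock_irqrestore(&desc->lock, flags);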
2017-08-16  x86/elf: Remove the unnecessary ADDR_NO_RANDOMIZE checks  (Oleg Nesterov)
The ADDR_NO_RANDOMIZE checks in stack_maxrandom_size() and randomize_stack_top() are not required. PF_RANDOMIZE is set by load_elf_binary() only if ADDR_NO_RANDOMIZE is not set, no need to re-check after that. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dmitry Safonov <dsafonov@virtuozzo.com> Cc: stable@vger.kernel.org Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Link: http://lkml.kernel.org/r/20170815154011.GB1076@redhat.com
2017-08-16  x86: Fix norandmaps/ADDR_NO_RANDOMIZE  (Oleg Nesterov)
Documentation/admin-guide/kernel-parameters.txt says: norandmaps Don't use address space randomization. Equivalent to echo 0 > /proc/sys/kernel/randomize_va_space but it doesn't work because arch_rnd() which is used to randomize mm->mmap_base returns a random value unconditionally. And as Kirill pointed out, ADDR_NO_RANDOMIZE is broken by the same reason. Just shift the PF_RANDOMIZE check from arch_mmap_rnd() to arch_rnd(). Fixes: 1b028f784e8c ("x86/mm: Introduce mmap_compat_base() for 32-bit mmap()") Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Reviewed-by: Dmitry Safonov <dsafonov@virtuozzo.com> Cc: stable@vger.kernel.org Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20170815153952.GA1076@redhat.com
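The fix, sketched: do the PF_RANDOMIZE check inside arch_rnd() itself, so every caller (including the compat mmap base) honors norandmaps and ADDR_NO_RANDOMIZE:

    static unsigned long arch_rnd(unsigned int rndbits)
    {
            if (!(current->flags & PF_RANDOMIZE))
                    return 0;
            return (get_random_long() & ((1UL << rndbits) - 1)) << PAGE_SHIFT;
    }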
2017-08-16  x86/intel_rdt: Remove redundant ternary operator on return  (Colin Ian King)
The use of the ternary operator is redundant as ret can never be non-zero at that point. Instead, just return nbytes. Detected by CoverityScan, CID#1452658 ("Logically dead code") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: kernel-janitors@vger.kernel.org Link: http://lkml.kernel.org/r/20170808092859.13021-1-colin.king@canonical.com
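The change in miniature, sketched from the description (ret is provably zero at this point in the write handler):

    /* before: logically dead ternary */
    return ret ?: nbytes;

    /* after */
    return nbytes;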
2017-08-16  x86/intel_rdt/cqm: Improve limbo list processing  (Vikas Shivappa)
During a mkdir, the entire limbo list is synchronously checked on each package for free RMIDs by sending IPIs. With a large number of RMIDs (SKL has 192) this creates an intolerable amount of work in IPIs. Replace the IPI based checking of the limbo list with asynchronous worker threads on each package which periodically scan the limbo list and move the RMIDs that have: llc_occupancy < threshold_occupancy on all packages to the free list. mkdir now returns -ENOSPC if the free list and the limbo list are empty or returns -EBUSY if there are RMIDs on the limbo list and the free list is empty. Getting rid of the IPIs also simplifies the data structures and the serialization required for handling the lists. [ tglx: Rewrote changelog ... ] Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: peterz@infradead.org Cc: eranian@google.com Cc: vikas.shivappa@intel.com Cc: ak@linux.intel.com Cc: davidcc@google.com Link: http://lkml.kernel.org/r/1502845243-20454-3-git-send-email-vikas.shivappa@linux.intel.com
2017-08-16  x86/intel_rdt/mbm: Fix MBM overflow handler during CPU hotplug  (Vikas Shivappa)
When a CPU is dying, the overflow worker is canceled and rescheduled on a different CPU in the same domain. But if the timer is already about to expire this essentially doubles the interval, which might result in an undetected overflow. Cancel the overflow worker and reschedule it immediately on a different CPU in the same domain. The work could be flushed as well, but that would reschedule it on the same CPU. [ tglx: Rewrote changelog once again ] Reported-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: peterz@infradead.org Cc: eranian@google.com Cc: vikas.shivappa@intel.com Cc: ak@linux.intel.com Cc: davidcc@google.com Link: http://lkml.kernel.org/r/1502845243-20454-2-git-send-email-vikas.shivappa@linux.intel.com
2017-08-15  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net  (David S. Miller)
2017-08-15  x86/mtrr: Prevent CPU hotplug lock recursion  (Thomas Gleixner)
Larry reported a CPU hotplug lock recursion in the MTRR code.

  ============================================
  WARNING: possible recursive locking detected
  systemd-udevd/153 is trying to acquire lock:
   (cpu_hotplug_lock.rw_sem){.+.+.+}, at: [<c030fc26>] stop_machine+0x16/0x30
  but task is already holding lock:
   (cpu_hotplug_lock.rw_sem){.+.+.+}, at: [<c0234353>] mtrr_add_page+0x83/0x470
  ....
   cpus_read_lock+0x48/0x90
   stop_machine+0x16/0x30
   mtrr_add_page+0x18b/0x470
   mtrr_add+0x3e/0x70

mtrr_add_page() holds the hotplug rwsem already and calls stop_machine() which acquires it again. Call stop_machine_cpuslocked() instead. Reported-and-tested-by: Larry Finger <Larry.Finger@lwfinger.net> Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1708140920250.1865@nanos Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Borislav Petkov <bp@suse.de>
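The fix in miniature: within a region that already holds the hotplug lock, use the _cpuslocked variant (handler and arguments sketched from mtrr/main.c):

    /* before: stop_machine() takes cpu_hotplug_lock again -> recursion */
    stop_machine(mtrr_rendezvous_handler, &data, cpu_online_mask);

    /* after: the caller already holds the hotplug lock */
    stop_machine_cpuslocked(mtrr_rendezvous_handler, &data, cpu_online_mask);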
2017-08-15  x86/xen/64: Fix the reported SS and CS in SYSCALL  (Andy Lutomirski)
When I cleaned up the Xen SYSCALL entries, I inadvertently changed the reported segment registers. Before my patch, regs->ss was __USER(32)_DS and regs->cs was __USER(32)_CS. After the patch, they are FLAT_USER_CS/DS(32). This had a couple unfortunate effects. It confused the opportunistic fast return logic. It also significantly increased the risk of triggering a nasty glibc bug: https://sourceware.org/bugzilla/show_bug.cgi?id=21269 Update the Xen entry code to change it back. Reported-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: xen-devel@lists.xenproject.org Fixes: 8a9949bc71a7 ("x86/xen/64: Rearrange the SYSCALL entries") Link: http://lkml.kernel.org/r/daba8351ea2764bb30272296ab9ce08a81bd8264.1502775273.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-15  Backmerge tag 'v4.13-rc5' into drm-next  (Dave Airlie)
Linux 4.13-rc5 There's a really nasty nouveau collision, hopefully someone can take a look once I pushed this out.
2017-08-14  Merge 4.13-rc5 into char-misc-next  (Greg Kroah-Hartman)
We want the firmware, and other changes, in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-14  Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6  (Linus Torvalds)
Pull crypto fixes from Herbert Xu:
 "Fix an error path bug in ixp4xx as well as a read overrun in sha1-avx2"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: x86/sha1 - Fix reads beyond the number of blocks passed
  crypto: ixp4xx - Fix error handling path in 'aead_perform()'
2017-08-14  x86/intel_rdt: Modify the intel_pqr_state for better performance  (Vikas Shivappa)
Currently we have pqr_state and rdt_default_state which store the cached CLOSID/RMIDs and the user configured cpu default values respectively. We touch both of these during context switch. Put all of them in one structure so that we can spare a cache line. Reported-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: peterz@infradead.org Cc: eranian@google.com Cc: sai.praneeth.prakhya@intel.com Cc: ak@linux.intel.com Cc: davidcc@google.com Link: http://lkml.kernel.org/r/1502304395-7166-3-git-send-email-vikas.shivappa@linux.intel.com
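The consolidated per-CPU state, roughly as the patch describes it (one structure, one cache line):

    struct intel_pqr_state {
            u32 cur_rmid;           /* cached RMID written to MSR_IA32_PQR_ASSOC */
            u32 cur_closid;         /* cached CLOSID written to the same MSR     */
            u32 default_rmid;       /* user-configured per-CPU default           */
            u32 default_closid;     /* user-configured per-CPU default           */
    };

    static DEFINE_PER_CPU(struct intel_pqr_state, pqr_state);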
2017-08-14  x86/intel_rdt/cqm: Clear the default RMID during hotcpu  (Vikas Shivappa)
The user configured per cpu default RMID is not cleared during cpu hotplug. This may lead to incorrect RMID values after a cpu goes offline and again comes back online. Clear the per cpu default RMID during cpu offline and online handling. Reported-by: Prakyha Sai Praneeth <sai.praneeth.prakhya@intel.com> Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: peterz@infradead.org Cc: eranian@google.com Cc: ak@linux.intel.com Cc: davidcc@google.com Link: http://lkml.kernel.org/r/1502304395-7166-2-git-send-email-vikas.shivappa@linux.intel.com
2017-08-13  gpu: drm: tc35876x: move header file out of I2C realm  (Wolfram Sang)
include/linux/i2c is not for client devices. Move the header file to a more appropriate location. Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Acked-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>
2017-08-12  Merge tag 'for-linus-4.13b-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip  (Linus Torvalds)
Pull xen fixes from Juergen Gross:
 "Some fixes for Xen:

  - a fix for a regression introduced in 4.13 for a Xen HVM-guest configured with KASLR

  - a fix for a possible deadlock in the xenbus driver when booting the system

  - a fix for lost interrupts in Xen guests"

* tag 'for-linus-4.13b-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/events: Fix interrupt lost during irq_disable and irq_enable
  xen: avoid deadlock in xenbus
  xen: fix hvm guest with kaslr enabled
  xen: split up xen_hvm_init_shared_info()
  x86: provide an init_mem_mapping hypervisor hook
2017-08-11  kvm: x86: Disallow illegal IA32_APIC_BASE MSR values  (Jim Mattson)
Host-initiated writes to the IA32_APIC_BASE MSR do not have to follow local APIC state transition constraints, but the value written must be valid. Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>