summaryrefslogtreecommitdiff
path: root/arch/x86/include/asm/elf.h
AgeCommit message (Collapse)Author
2017-07-10binfmt_elf: use ELF_ET_DYN_BASE only for PIEKees Cook
The ELF_ET_DYN_BASE position was originally intended to keep loaders away from ET_EXEC binaries. (For example, running "/lib/ld-linux.so.2 /bin/cat" might cause the subsequent load of /bin/cat into where the loader had been loaded.) With the advent of PIE (ET_DYN binaries with an INTERP Program Header), ELF_ET_DYN_BASE continued to be used since the kernel was only looking at ET_DYN. However, since ELF_ET_DYN_BASE is traditionally set at the top 1/3rd of the TASK_SIZE, a substantial portion of the address space is unused. For 32-bit tasks when RLIMIT_STACK is set to RLIM_INFINITY, programs are loaded above the mmap region. This means they can be made to collide (CVE-2017-1000370) or nearly collide (CVE-2017-1000371) with pathological stack regions. Lowering ELF_ET_DYN_BASE solves both by moving programs below the mmap region in all cases, and will now additionally avoid programs falling back to the mmap region by enforcing MAP_FIXED for program loads (i.e. if it would have collided with the stack, now it will fail to load instead of falling back to the mmap region). To allow for a lower ELF_ET_DYN_BASE, loaders (ET_DYN without INTERP) are loaded into the mmap region, leaving space available for either an ET_EXEC binary with a fixed location or PIE being loaded into mmap by the loader. Only PIE programs are loaded offset from ELF_ET_DYN_BASE, which means architectures can now safely lower their values without risk of loaders colliding with their subsequently loaded programs. For 64-bit, ELF_ET_DYN_BASE is best set to 4GB to allow runtimes to use the entire 32-bit address space for 32-bit pointers. Thanks to PaX Team, Daniel Micay, and Rik van Riel for inspiration and suggestions on how to implement this solution. Fixes: d1fd836dcf00 ("mm: split ET_DYN ASLR from mmap ASLR") Link: http://lkml.kernel.org/r/20170621173201.GA114489@beast Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Rik van Riel <riel@redhat.com> Cc: Daniel Micay <danielmicay@gmail.com> Cc: Qualys Security Advisory <qsa@qualys.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Dmitry Safonov <dsafonov@virtuozzo.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Pratyush Anand <panand@redhat.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Will Deacon <will.deacon@arm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-01Merge branch 'x86-mm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm updates from Ingo Molnar: "The main x86 MM changes in this cycle were: - continued native kernel PCID support preparation patches to the TLB flushing code (Andy Lutomirski) - various fixes related to 32-bit compat syscall returning address over 4Gb in applications, launched from 64-bit binaries - motivated by C/R frameworks such as Virtuozzo. (Dmitry Safonov) - continued Intel 5-level paging enablement: in particular the conversion of x86 GUP to the generic GUP code. (Kirill A. Shutemov) - x86/mpx ABI corner case fixes/enhancements (Joerg Roedel) - ... plus misc updates, fixes and cleanups" * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (62 commits) mm, zone_device: Replace {get, put}_zone_device_page() with a single reference to fix pmem crash x86/mm: Fix flush_tlb_page() on Xen x86/mm: Make flush_tlb_mm_range() more predictable x86/mm: Remove flush_tlb() and flush_tlb_current_task() x86/vm86/32: Switch to flush_tlb_mm_range() in mark_screen_rdonly() x86/mm/64: Fix crash in remove_pagetable() Revert "x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementation" x86/boot/e820: Remove a redundant self assignment x86/mm: Fix dump pagetables for 4 levels of page tables x86/mpx, selftests: Only check bounds-vs-shadow when we keep shadow x86/mpx: Correctly report do_mpx_bt_fault() failures to user-space Revert "x86/mm/numa: Remove numa_nodemask_from_meminfo()" x86/espfix: Add support for 5-level paging x86/kasan: Extend KASAN to support 5-level paging x86/mm: Add basic defines/helpers for CONFIG_X86_5LEVEL=y x86/paravirt: Add 5-level support to the paravirt code x86/mm: Define virtual memory map for 5-level paging x86/asm: Remove __VIRTUAL_MASK_SHIFT==47 assert x86/boot: Detect 5-level paging support x86/mm/numa: Remove numa_nodemask_from_meminfo() ...
2017-04-10x86/vdso: Plug race between mapping and ELF header setupThomas Gleixner
The vsyscall32 sysctl can racy against a concurrent fork when it switches from disabled to enabled: arch_setup_additional_pages() if (vdso32_enabled) --> No mapping sysctl.vsysscall32() --> vdso32_enabled = true create_elf_tables() ARCH_DLINFO_IA32 if (vdso32_enabled) { --> Add VDSO entry with NULL pointer Make ARCH_DLINFO_IA32 check whether the VDSO mapping has been set up for the newly forked process or not. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Andy Lutomirski <luto@amacapital.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mathias Krause <minipli@googlemail.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20170410151723.602367196@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-14x86/hugetlb: Adjust to the new native/compat mmap basesDmitry Safonov
Commit 1b028f784e8c introduced two mmap() bases for 32-bit syscalls and for 64-bit syscalls. The mmap() code in x86 was modified to handle the separation, but the patch series missed to update the hugetlb code. As a consequence a 32bit application mapping a file on hugetlbfs uses the 64-bit mmap base for address space allocation, which fails. Adjust the hugetlb mapping code to use the proper bases depending on the syscall invocation mode (64-bit or compat). [ tglx: Massaged changelog and switched from asm/compat.h to linux/compat.h ] Fixes: commit 1b028f784e8c ("x86/mm: Introduce mmap_compat_base() for 32-bit mmap()") Reported-by: kernel test robot <xiaolong.ye@intel.com> Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com> Cc: 0x7f454c46@gmail.com Cc: linux-mm@kvack.org Cc: Andy Lutomirski <luto@kernel.org> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Borislav Petkov <bp@suse.de> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Link: http://lkml.kernel.org/r/20170314114126.9280-1-dsafonov@virtuozzo.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-13x86/mm: Introduce mmap_compat_base() for 32-bit mmap()Dmitry Safonov
mmap() uses a base address, from which it starts to look for a free space for allocation. The base address is stored in mm->mmap_base, which is calculated during exec(). The address depends on task's size, set rlimit for stack, ASLR randomization. The base depends on the task size and the number of random bits which are different for 64-bit and 32bit applications. Due to the fact, that the base address is fixed, its mmap() from a compat (32bit) syscall issued by a 64bit task will return a address which is based on the 64bit base address and does not fit into the 32bit address space (4GB). The returned pointer is truncated to 32bit, which results in an invalid address. To solve store a seperate compat address base plus a compat legacy address base in mm_struct. These bases are calculated at exec() time and can be used later to address the 32bit compat mmap() issued by 64 bit applications. As a consequence of this change 32-bit applications issuing a 64-bit syscall (after doing a long jump) will get a 64-bit mapping now. Before this change 32-bit applications always got a 32bit mapping. [ tglx: Massaged changelog and added a comment ] Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com> Cc: 0x7f454c46@gmail.com Cc: linux-mm@kvack.org Cc: Andy Lutomirski <luto@kernel.org> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Borislav Petkov <bp@suse.de> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Link: http://lkml.kernel.org/r/20170306141721.9188-4-dsafonov@virtuozzo.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-13x86/mm: Add task_size parameter to mmap_base()Dmitry Safonov
To correctly handle 32-bit and 64-bit mmap() syscalls in 64bit applications its required to have separate address bases to place a mapping. The tasksize can be used as an indicator to select the proper parameters for mmap_base(). This requires the following changes: - Add task_size argument to mmap_base() and make the calculation based on it. - Provide mmap_legacy_base() as a seperate function - Use the new functions in arch_pick_mmap_layout() [ tglx: Massaged changelog ] Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com> Cc: 0x7f454c46@gmail.com Cc: linux-mm@kvack.org Cc: Andy Lutomirski <luto@kernel.org> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Borislav Petkov <bp@suse.de> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Link: http://lkml.kernel.org/r/20170306141721.9188-3-dsafonov@virtuozzo.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-02-04x86/elf: Add HWCAP2 to expose ring 3 MONITOR/MWAITGrzegorz Andrejczuk
Introduce ELF_HWCAP2 variable for x86 and reserve its bit 0 to expose the ring 3 MONITOR/MWAIT. HWCAP variables contain bitmasks which can be used by userspace applications to detect which instruction sets are supported by CPU. On x86 architecture information about CPU capabilities can be checked via CPUID instructions, unfortunately presence of ring 3 MONITOR/MWAIT feature cannot be checked this way. ELF_HWCAP cannot be used as well, because on x86 it is set to CPUID[1].EDX which means that all bits are reserved there. HWCAP2 approach was chosen because it reuses existing solution present in other architectures, so only minor modifications are required to the kernel and userspace applications. When ELF_HWCAP2 is defined kernel maps it to AT_HWCAP2 during the start of the application. This way the ring 3 MONITOR/MWAIT feature can be detected using getauxval() API in a simple and fast manner. ELF_HWCAP2 type is u32 to be consistent with x86 ELF_HWCAP type. Signed-off-by: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com> Cc: Piotr.Luc@intel.com Cc: dave.hansen@linux.intel.com Link: http://lkml.kernel.org/r/1484918557-15481-3-git-send-email-grzegorz.andrejczuk@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-08-04tree-wide: replace config_enabled() with IS_ENABLED()Masahiro Yamada
The use of config_enabled() against config options is ambiguous. In practical terms, config_enabled() is equivalent to IS_BUILTIN(), but the author might have used it for the meaning of IS_ENABLED(). Using IS_ENABLED(), IS_BUILTIN(), IS_MODULE() etc. makes the intention clearer. This commit replaces config_enabled() with IS_ENABLED() where possible. This commit is only touching bool config options. I noticed two cases where config_enabled() is used against a tristate option: - config_enabled(CONFIG_HWMON) [ drivers/net/wireless/ath/ath10k/thermal.c ] - config_enabled(CONFIG_BACKLIGHT_CLASS_DEVICE) [ drivers/gpu/drm/gma500/opregion.c ] I did not touch them because they should be converted to IS_BUILTIN() in order to keep the logic, but I was not sure it was the authors' intention. Link: http://lkml.kernel.org/r/1465215656-20569-1-git-send-email-yamada.masahiro@socionext.com Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Stas Sergeev <stsp@list.ru> Cc: Matt Redfearn <matt.redfearn@imgtec.com> Cc: Joshua Kinard <kumba@gentoo.org> Cc: Jiri Slaby <jslaby@suse.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Petkov <bp@suse.de> Cc: Markos Chandras <markos.chandras@imgtec.com> Cc: "Dmitry V. Levin" <ldv@altlinux.org> Cc: yu-cheng yu <yu-cheng.yu@intel.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: Brian Gerst <brgerst@gmail.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Will Drewry <wad@chromium.org> Cc: Nikolay Martynov <mar.kolya@gmail.com> Cc: Huacai Chen <chenhc@lemote.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Leonid Yegoshin <Leonid.Yegoshin@imgtec.com> Cc: Rafal Milecki <zajec5@gmail.com> Cc: James Cowgill <James.Cowgill@imgtec.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Alex Smith <alex.smith@imgtec.com> Cc: Adam Buchbinder <adam.buchbinder@gmail.com> Cc: Qais Yousef <qais.yousef@imgtec.com> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Mikko Rapeli <mikko.rapeli@iki.fi> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Brian Norris <computersforpeace@gmail.com> Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Cc: "Luis R. Rodriguez" <mcgrof@do-not-panic.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Ingo Molnar <mingo@redhat.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Roland McGrath <roland@hack.frob.com> Cc: Paul Burton <paul.burton@imgtec.com> Cc: Kalle Valo <kvalo@qca.qualcomm.com> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Tony Wu <tung7970@gmail.com> Cc: Huaitong Han <huaitong.han@intel.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: Jason Cooper <jason@lakedaemon.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andrea Gelmini <andrea.gelmini@gelma.net> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Rabin Vincent <rabin@rab.in> Cc: "Maciej W. Rozycki" <macro@imgtec.com> Cc: David Daney <david.daney@cavium.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-29x86/asm/64: Rename thread_struct's fs and gs to fsbase and gsbaseAndy Lutomirski
Unlike ds and es, these are base addresses, not selectors. Rename them so their meaning is more obvious. On x86_32, the field is still called fs. Fixing that could make sense as a future cleanup. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/69a18a51c4cba0ce29a241e570fc618ad721d908.1461698311.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-01x86/cpufeature: Use enum cpuid_leafs instead of magic numbersHuaitong Han
Most of the magic numbers in x86_capability[] have been converted to 'enum cpuid_leafs', and this patch updates the remaining part. Signed-off-by: Huaitong Han <huaitong.han@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Alexander Kuleshov <kuleshovmail@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hector Marco-Gisbert <hecmargi@upv.es> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: lguest@lists.ozlabs.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1453750913-4781-3-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-09x86/elf/64: Clear more registers in elf_common_init()Andy Lutomirski
Before we start calling execve in contexts that honor the full pt_regs, we need to teach it to initialize all registers. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/65a38a9edee61a1158cfd230800c61dbd963dac5.1444091584.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-07x86/vdso: Remove runtime 32-bit vDSO selectionAndy Lutomirski
32-bit userspace will now always see the same vDSO, which is exactly what used to be the int80 vDSO. Subsequent patches will clean it up and make it support SYSENTER and SYSCALL using alternatives. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/e7e6b3526fa442502e6125fe69486aab50813c32.1444091584.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-06x86/compat: Don't build the 32-bit VDSO if not neededBrian Gerst
Build the 32-bit vdso only for native 32-bit or 32-bit compat is enabled. x32 should not force it to build. Signed-off-by: Brian Gerst <brgerst@gmail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1434974121-32575-7-git-send-email-brgerst@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-06x86/compat: Rename 'start_thread_ia32' to 'compat_start_thread'Brian Gerst
This function is shared between the 32-bit compat and x32 ABIs. Signed-off-by: Brian Gerst <brgerst@gmail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1434974121-32575-5-git-send-email-brgerst@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-06x86/compat: Make mmap_is_ia32() common compatBrian Gerst
TIF_ADDR32 is set for both ia32 and x32 tasks, so change from CONFIG_IA32_EMULATION to CONFIG_COMPAT. Use config_enabled() to make the function more readable. Signed-off-by: Brian Gerst <brgerst@gmail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1434974121-32575-3-git-send-email-brgerst@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-14mm: fold arch_randomize_brk into ARCH_HAS_ELF_RANDOMIZEKees Cook
The arch_randomize_brk() function is used on several architectures, even those that don't support ET_DYN ASLR. To avoid bulky extern/#define tricks, consolidate the support under CONFIG_ARCH_HAS_ELF_RANDOMIZE for the architectures that support it, while still handling CONFIG_COMPAT_BRK. Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Hector Marco-Gisbert <hecmargi@upv.es> Cc: Russell King <linux@arm.linux.org.uk> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: "David A. Long" <dave.long@linaro.org> Cc: Andrey Ryabinin <a.ryabinin@samsung.com> Cc: Arun Chandran <achandran@mvista.com> Cc: Yann Droneaud <ydroneaud@opteya.com> Cc: Min-Hua Chen <orca.chen@gmail.com> Cc: Paul Burton <paul.burton@imgtec.com> Cc: Alex Smith <alex@alex-smith.me.uk> Cc: Markos Chandras <markos.chandras@imgtec.com> Cc: Vineeth Vijayan <vvijayan@mvista.com> Cc: Jeff Bailey <jeffbailey@google.com> Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Behan Webster <behanw@converseincode.com> Cc: Ismael Ripoll <iripoll@upv.es> Cc: Jan-Simon Mller <dl9pf@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-13Merge branch 'x86-mm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm changes from Ingo Molnar: "The main changes in this cycle were: - reduce the x86/32 PAE per task PGD allocation overhead from 4K to 0.032k (Fenghua Yu) - early_ioremap/memunmap() usage cleanups (Juergen Gross) - gbpages support cleanups (Luis R Rodriguez) - improve AMD Bulldozer (family 0x15) ASLR I$ aliasing workaround to increase randomization by 3 bits (per bootup) (Hector Marco-Gisbert) - misc fixlets" * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Improve AMD Bulldozer ASLR workaround x86/mm/pat: Initialize __cachemode2pte_tbl[] and __pte2cachemode_tbl[] in a bit more readable fashion init.h: Clean up the __setup()/early_param() macros x86/mm: Simplify probe_page_size_mask() x86/mm: Further simplify 1 GB kernel linear mappings handling x86/mm: Use early_param_on_off() for direct_gbpages init.h: Add early_param_on_off() x86/mm: Simplify enabling direct_gbpages x86/mm: Use IS_ENABLED() for direct_gbpages x86/mm: Unexport set_memory_ro() and set_memory_rw() x86/mm, efi: Use early_ioremap() in arch/x86/platform/efi/efi-bgrt.c x86/mm: Use early_memunmap() instead of early_iounmap() x86/mm/pat: Ensure different messages in STRICT_DEVMEM and PAT cases x86/mm: Reduce PAE-mode per task pgd allocation overhead from 4K to 32 bytes
2015-04-06x86/asm/entry: Clear EXTRA_REGS for all executable formatsDenys Vlasenko
On failure, sys_execve() does not clobber EXTRA_REGS, so we can just return to userpsace without saving/restoring them. On success, ELF_PLAT_INIT() in sys_execve() clears all these registers. On other executable formats: - binfmt_flat.c has similar FLAT_PLAT_INIT, but x86 (and everyone else except sh) doesn't define it. - binfmt_elf_fdpic.c has ELF_FDPIC_PLAT_INIT, but x86 (and most others) doesn't define it. - There are no such hooks in binfmt_aout.c et al. We inherit EXTRA_REGS from the prior executable. This inconsistency was not intended. This change removes SAVE/RESTORE_EXTRA_REGS in stub_execve, removes register clearing in ELF_PLAT_INIT(), and instead simply clears them on success in stub_execve. Run-tested. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1428173719-7637-1-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-31x86/mm: Improve AMD Bulldozer ASLR workaroundHector Marco-Gisbert
The ASLR implementation needs to special-case AMD F15h processors by clearing out bits [14:12] of the virtual address in order to avoid I$ cross invalidations and thus performance penalty for certain workloads. For details, see: dfb09f9b7ab0 ("x86, amd: Avoid cache aliasing penalties on AMD family 15h") This special case reduces the mmapped file's entropy by 3 bits. The following output is the run on an AMD Opteron 62xx class CPU processor under x86_64 Linux 4.0.0: $ for i in `seq 1 10`; do cat /proc/self/maps | grep "r-xp.*libc" ; done b7588000-b7736000 r-xp 00000000 00:01 4924 /lib/i386-linux-gnu/libc.so.6 b7570000-b771e000 r-xp 00000000 00:01 4924 /lib/i386-linux-gnu/libc.so.6 b75d0000-b777e000 r-xp 00000000 00:01 4924 /lib/i386-linux-gnu/libc.so.6 b75b0000-b775e000 r-xp 00000000 00:01 4924 /lib/i386-linux-gnu/libc.so.6 b7578000-b7726000 r-xp 00000000 00:01 4924 /lib/i386-linux-gnu/libc.so.6 ... Bits [12:14] are always 0, i.e. the address always ends in 0x8000 or 0x0000. 32-bit systems, as in the example above, are especially sensitive to this issue because 32-bit randomness for VA space is 8 bits (see mmap_rnd()). With the Bulldozer special case, this diminishes to only 32 different slots of mmap virtual addresses. This patch randomizes per boot the three affected bits rather than setting them to zero. Since all the shared pages have the same value at bits [12..14], there is no cache aliasing problems. This value gets generated during system boot and it is thus not known to a potential remote attacker. Therefore, the impact from the Bulldozer workaround gets diminished and ASLR randomness increased. More details at: http://hmarco.org/bugs/AMD-Bulldozer-linux-ASLR-weakness-reducing-mmaped-files-by-eight.html Original white paper by AMD dealing with the issue: http://developer.amd.com/wordpress/media/2012/10/SharedL1InstructionCacheonAMD15hCPU.pdf Mentored-by: Ismael Ripoll <iripoll@disca.upv.es> Signed-off-by: Hector Marco-Gisbert <hecmargi@upv.es> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Kees Cook <keescook@chromium.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jan-Simon <dl9pf@gmx.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-fsdevel@vger.kernel.org Link: http://lkml.kernel.org/r/1427456301-3764-1-git-send-email-hecmargi@upv.es Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-10-08x86: Reject x32 executables if x32 ABI not supportedBen Hutchings
It is currently possible to execve() an x32 executable on an x86_64 kernel that has only ia32 compat enabled. However all its syscalls will fail, even _exit(). This usually causes it to segfault. Change the ELF compat architecture check so that x32 executables are rejected if we don't support the x32 ABI. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Link: http://lkml.kernel.org/r/1410120305.6822.9.camel@decadent.org.uk Cc: stable@vger.kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-05-05x86, vdso: Move the 32-bit vdso special pages after the textAndy Lutomirski
This unifies the vdso mapping code and teaches it how to map special pages at addresses corresponding to symbols in the vdso image. The new code is used for all vdso variants, but so far only the 32-bit variants use the new vvar page position. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/b6d7858ad7b5ac3fd3c29cab6d6d769bc45d195e.1399317206.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-05x86, vdso: Reimplement vdso.so preparation in build-time CAndy Lutomirski
Currently, vdso.so files are prepared and analyzed by a combination of objcopy, nm, some linker script tricks, and some simple ELF parsers in the kernel. Replace all of that with plain C code that runs at build time. All five vdso images now generate .c files that are compiled and linked in to the kernel image. This should cause only one userspace-visible change: the loaded vDSO images are stripped more heavily than they used to be. Everything outside the loadable segment is dropped. In particular, this causes the section table and section name strings to be missing. This should be fine: real dynamic loaders don't load or inspect these tables anyway. The result is roughly equivalent to eu-strip's --strip-sections option. The purpose of this change is to enable the vvar and hpet mappings to be moved to the page following the vDSO load segment. Currently, it is possible for the section table to extend into the page after the load segment, so, if we map it, it risks overlapping the vvar or hpet page. This happens whenever the load segment is just under a multiple of PAGE_SIZE. The only real subtlety here is that the old code had a C file with inline assembler that did 'call VDSO32_vsyscall' and a linker script that defined 'VDSO32_vsyscall = __kernel_vsyscall'. This most likely worked by accident: the linker script entry defines a symbol associated with an address as opposed to an alias for the real dynamic symbol __kernel_vsyscall. That caused ld to relocate the reference at link time instead of leaving an interposable dynamic relocation. Since the VDSO32_vsyscall hack is no longer needed, I now use 'call __kernel_vsyscall', and I added -Bsymbolic to make it work. vdso2c will generate an error and abort the build if the resulting image contains any dynamic relocations, so we won't silently generate bad vdso images. (Dynamic relocations are a problem because nothing will even attempt to relocate the vdso.) Signed-off-by: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/2c4fcf45524162a34d87fdda1eb046b2a5cecee7.1399317206.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-05x86, vdso: Clean up 32-bit vs 64-bit vdso paramsAndy Lutomirski
Rather than using 'vdso_enabled' and an awful #define, just call the parameters vdso32_enabled and vdso64_enabled. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/87913de56bdcbae3d93917938302fc369b05caee.1399317206.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-03-13x86, vdso: Remove compat vdso supportAndy Lutomirski
The compat vDSO is a complicated hack that's needed to maintain compatibility with a small range of glibc versions. This removes it and replaces it with a much simpler hack: a config option to disable the 32-bit vDSO by default. This also changes the default value of CONFIG_COMPAT_VDSO to n -- users configuring kernels from scratch almost certainly want that choice. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/4bb4690899106eb11430b1186d5cc66ca9d1660c.1394751608.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-12-11mm: use vm_unmapped_area() on x86_64 architectureMichel Lespinasse
Update the x86_64 arch_get_unmapped_area[_topdown] functions to make use of vm_unmapped_area() instead of implementing a brute force search. Signed-off-by: Michel Lespinasse <walken@google.com> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-29Merge branch 'x86-x32-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x32 support for x86-64 from Ingo Molnar: "This tree introduces the X32 binary format and execution mode for x86: 32-bit data space binaries using 64-bit instructions and 64-bit kernel syscalls. This allows applications whose working set fits into a 32 bits address space to make use of 64-bit instructions while using a 32-bit address space with shorter pointers, more compressed data structures, etc." Fix up trivial context conflicts in arch/x86/{Kconfig,vdso/vma.c} * 'x86-x32-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (71 commits) x32: Fix alignment fail in struct compat_siginfo x32: Fix stupid ia32/x32 inversion in the siginfo format x32: Add ptrace for x32 x32: Switch to a 64-bit clock_t x32: Provide separate is_ia32_task() and is_x32_task() predicates x86, mtrr: Use explicit sizing and padding for the 64-bit ioctls x86/x32: Fix the binutils auto-detect x32: Warn and disable rather than error if binutils too old x32: Only clear TIF_X32 flag once x32: Make sure TS_COMPAT is cleared for x32 tasks fs: Remove missed ->fds_bits from cessation use of fd_set structs internally fs: Fix close_on_exec pointer in alloc_fdtable x32: Drop non-__vdso weak symbols from the x32 VDSO x32: Fix coding style violations in the x32 VDSO code x32: Add x32 VDSO support x32: Allow x32 to be configured x32: If configured, add x32 system calls to system call tables x32: Handle process creation x32: Signal-related system calls x86: Add #ifdef CONFIG_COMPAT to <asm/sys_ia32.h> ...
2012-03-28Disintegrate asm/system.h for X86David Howells
Disintegrate asm/system.h for X86. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: H. Peter Anvin <hpa@zytor.com> cc: x86@kernel.org
2012-02-21x32: Fix coding style violations in the x32 VDSO codeH. Peter Anvin
Move the prototype for x32_setup_additional_pages() to a header file, and adjust the coding style to match standard. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: H. J. Lu <hjl.tools@gmail.com>
2012-02-20x32: Handle process creationH. Peter Anvin
Allow an x32 process to be started. Originally-by: H. J. Lu <hjl.tools@gmail.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
2012-02-20x86: Factor out TIF_IA32 from 32-bit address spaceH. Peter Anvin
Factor out IA32 (compatibility instruction set) from 32-bit address space in the thread_info flags; this is a precondition patch for x32 support. Originally-by: H. J. Lu <hjl.tools@gmail.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Link: http://lkml.kernel.org/n/tip-4pr1xnnksprt7t0h3w5fw4rv@git.kernel.org
2011-08-05x86, amd: Avoid cache aliasing penalties on AMD family 15hBorislav Petkov
This patch provides performance tuning for the "Bulldozer" CPU. With its shared instruction cache there is a chance of generating an excessive number of cache cross-invalidates when running specific workloads on the cores of a compute module. This excessive amount of cross-invalidations can be observed if cache lines backed by shared physical memory alias in bits [14:12] of their virtual addresses, as those bits are used for the index generation. This patch addresses the issue by clearing all the bits in the [14:12] slice of the file mapping's virtual address at generation time, thus forcing those bits the same for all mappings of a single shared library across processes and, in doing so, avoids instruction cache aliases. It also adds the command line option "align_va_addr=(32|64|on|off)" with which virtual address alignment can be enabled for 32-bit or 64-bit x86 individually, or both, or be completely disabled. This change leaves virtual region address allocation on other families and/or vendors unaffected. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Link: http://lkml.kernel.org/r/1312550110-24160-2-git-send-email-bp@amd64.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-02-16x86: ELF_PLAT_INIT() shouldn't worry about TIF_IA32Oleg Nesterov
The 64-bit version of ELF_PLAT_INIT() clears TIF_IA32, but at this point it has already been cleared by SET_PERSONALITY == set_personality_64bit. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-01-29x86: get rid of the insane TIF_ABI_PENDING bitH. Peter Anvin
Now that the previous commit made it possible to do the personality setting at the point of no return, we do just that for ELF binaries. And suddenly all the reasons for that insane TIF_ABI_PENDING bit go away, and we can just make SET_PERSONALITY() just do the obvious thing for a 32-bit compat process. Everything becomes much more straightforward this way. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16elf: kill USE_ELF_CORE_DUMPChristoph Hellwig
Currently all architectures but microblaze unconditionally define USE_ELF_CORE_DUMP. The microblaze omission seems like an error to me, so let's kill this ifdef and make sure we are the same everywhere. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: <linux-arch@vger.kernel.org> Cc: Michal Simek <michal.simek@petalogix.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-09x86-64: make compat_start_thread() match start_thread()H. Peter Anvin
For no real good reason, compat_start_thread() was embedded inline in <asm/elf.h> whereas the native start_thread() lives in process_*.c. Move compat_start_thread() to process_64.c, remove gratuitious differences, and fix a few items which mostly look like bit rot. In particular, compat_start_thread() didn't do free_thread_xstate(), which means it was hanging on to the xstate store area even when it was not needed. It was also not setting old_rsp, but it looks like that generally shouldn't matter for a 32-bit process. Note: compat_start_thread *has* to be a macro, since it is tested with start_thread_ia32() as the out of line function name. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
2009-09-10x86: Increase MIN_GAP to include randomized stackMichal Hocko
Currently we are not including randomized stack size when calculating mmap_base address in arch_pick_mmap_layout for topdown case. This might cause that mmap_base starts in the stack reserved area because stack is randomized by 1GB for 64b (8MB for 32b) and the minimum gap is 128MB. If the stack really grows down to mmap_base then we can get silent mmap region overwrite by the stack values. Let's include maximum stack randomization size into MIN_GAP which is used as the low bound for the gap in mmap. Signed-off-by: Michal Hocko <mhocko@suse.cz> LKML-Reference: <1252400515-6866-1-git-send-email-mhocko@suse.cz> Acked-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Stable Team <stable@kernel.org>
2009-02-10x86: make lazy %gs optional on x86_32Tejun Heo
Impact: pt_regs changed, lazy gs handling made optional, add slight overhead to SAVE_ALL, simplifies error_code path a bit On x86_32, %gs hasn't been used by kernel and handled lazily. pt_regs doesn't have place for it and gs is saved/loaded only when necessary. In preparation for stack protector support, this patch makes lazy %gs handling optional by doing the followings. * Add CONFIG_X86_32_LAZY_GS and place for gs in pt_regs. * Save and restore %gs along with other registers in entry_32.S unless LAZY_GS. Note that this unfortunately adds "pushl $0" on SAVE_ALL even when LAZY_GS. However, it adds no overhead to common exit path and simplifies entry path with error code. * Define different user_gs accessors depending on LAZY_GS and add lazy_save_gs() and lazy_load_gs() which are noop if !LAZY_GS. The lazy_*_gs() ops are used to save, load and clear %gs lazily. * Define ELF_CORE_COPY_KERNEL_REGS() which always read %gs directly. xen and lguest changes need to be verified. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-10x86: add %gs accessors for x86_32Tejun Heo
Impact: cleanup On x86_32, %gs is handled lazily. It's not saved and restored on kernel entry/exit but only when necessary which usually is during task switch but there are few other places. Currently, it's done by calling savesegment() and loadsegment() explicitly. Define get_user_gs(), set_user_gs() and task_user_gs() and use them instead. While at it, clean up register access macros in signal.c. This cleans up code a bit and will help future changes. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-25[S390] arch_setup_additional_pages argumentsMartin Schwidefsky
arch_setup_additional_pages currently gets two arguments, the binary format descripton and an indication if the process uses an executable stack or not. The second argument is not used by anybody, it could be removed without replacement. What actually does make sense is to pass an indication if the process uses the elf interpreter or not. The glibc code will not use anything from the vdso if the process does not use the dynamic linker, so for statically linked binaries the architecture backend can choose not to map the vdso. Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-10-22x86: Fix ASM_X86__ header guardsH. Peter Anvin
Change header guards named "ASM_X86__*" to "_ASM_X86_*" since: a. the double underscore is ugly and pointless. b. no leading underscore violates namespace constraints. Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-10-22x86, um: ... and asm-x86 moveAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: H. Peter Anvin <hpa@zytor.com>