summaryrefslogtreecommitdiff
path: root/fs/binfmt_elf.c
AgeCommit message (Collapse)Author
2008-04-29elf: fix shadowed variables in fs/binfmt_elf.cWANG Cong
Fix these sparse warings: fs/binfmt_elf.c:1749:29: warning: symbol 'tmp' shadows an earlier one fs/binfmt_elf.c:1734:28: originally declared here fs/binfmt_elf.c:2009:26: warning: symbol 'vma' shadows an earlier one fs/binfmt_elf.c:1892:24: originally declared here [akpm@linux-foundation.org: chose better variable name] Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29BINFMT: fill_elf_header cleanup - use straight memset firstCyrill Gorcunov
This patch does simplify fill_elf_header function by setting to zero the whole elf header first. So we fillup the fields we really need only. before: text data bss dec hex filename 11735 80 0 11815 2e27 fs/binfmt_elf.o after: text data bss dec hex filename 11710 80 0 11790 2e0e fs/binfmt_elf.o viola, 25 bytes of text is freed Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-25[PATCH] sanitize handling of shared descriptor tables in failing execve()Al Viro
* unshare_files() can fail; doing it after irreversible actions is wrong and de_thread() is certainly irreversible. * since we do it unconditionally anyway, we might as well do it in do_execve() and save ourselves the PITA in binfmt handlers, etc. * while we are at it, binfmt_som actually leaked files_struct on failure. As a side benefit, unshare_files(), put_files_struct() and reset_files_struct() become unexported. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-03-04core dump: user_regset writebackRoland McGrath
This makes the user_regset-based core dump code call user_regset writeback hooks when available. This is necessary groundwork to allow IA64 to set CORE_DUMP_USE_REGSET. Cc: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Roland McGrath <roland@redhat.com> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08Remove a.out interpreter support in ELF loaderAndi Kleen
Following the deprecation schedule the a.out ELF interpreter support is removed now with this patch. a.out ELF interpreters were an transition feature for moving a.out systems to ELF, but they're unlikely to be still needed. Pure a.out systems will still work of course. This allows to simplify the hairy ELF loader. Signed-off-by: Andi Kleen <ak@suse.de> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08aout: suppress A.OUT library support if !CONFIG_ARCH_SUPPORTS_AOUTDavid Howells
Suppress A.OUT library support if CONFIG_ARCH_SUPPORTS_AOUT is not set. Not all architectures support the A.OUT binfmt, so the ELF binfmt should not be permitted to go looking for A.OUT libraries to load in such a case. Not only that, but under such conditions A.OUT core dumps are not produced either. To make this work, this patch also does the following: (1) Makes the existence of the contents of linux/a.out.h contingent on CONFIG_ARCH_SUPPORTS_AOUT. (2) Renames dump_thread() to aout_dump_thread() as it's only called by A.OUT core dumping code. (3) Moves aout_dump_thread() into asm/a.out-core.h and makes it inline. This is then included only where needed. This means that this bit of arch code will be stored in the appropriate A.OUT binfmt module rather than the core kernel. (4) Drops A.OUT support for Blackfin (according to Mike Frysinger it's not needed) and FRV. This patch depends on the previous patch to move STACK_TOP[_MAX] out of asm/a.out.h and into asm/processor.h as they're required whether or not A.OUT format is available. [jdike@addtoit.com: uml: re-remove accidentally restored code] Signed-off-by: David Howells <dhowells@redhat.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06brk randomization: introduce CONFIG_COMPAT_BRKIngo Molnar
based on similar patch from: Pavel Machek <pavel@ucw.cz> Introduce CONFIG_COMPAT_BRK. If disabled then the kernel is free (but not obliged to) randomize the brk area. Heap randomization breaks ancient binaries, so we keep COMPAT_BRK enabled by default. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-03fs/binfmt_elf.c: spello fixOhad Ben-Cohen
s/litle/little Signed-off-by: Ohad Ben-Cohen <ohad@bencohen.org> Signed-off-by: Adrian Bunk <bunk@kernel.org>
2008-01-30x86: remove iBCS supportAndi Kleen
ibcs2 support has never been supported on 2.6 kernels as far as I know, and if it has it must have been an external patch. Anyways, if anybody applies an external patch they could as well readd the ibcs checking code to the ELF loader in the same patch. But there is no reason to keep this code running in all Linux kernels. This will save at least two strcmps each ELF execution. No deprecation period because it could not have been used anyway. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30elf core dump: notes user_regsetRoland McGrath
This modifies the ELF core dump code under #ifdef CORE_DUMP_USE_REGSET. It changes nothing when this macro is not defined. When it's #define'd by some arch header (e.g. asm/elf.h), the arch must support the user_regset (linux/regset.h) interface for reading thread state. This provides an alternate version of note segment writing that is based purely on the user_regset interfaces. When CORE_DUMP_USE_REGSET is set, the arch need not define macros such as ELF_CORE_COPY_REGS and ELF_ARCH. All that information is taken from the user_regset data structures. The core dumps come out exactly the same if arch's definitions for its user_regset details are correct. Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30elf core dump: notes reorgRoland McGrath
This pulls out the code for writing the notes segment of an ELF core dump into separate functions. This cleanly isolates into one cluster of functions everything that deals with the note formats and the hooks into arch code to fill them. The top-level elf_core_dump function itself now deals purely with the generic ELF format and the memory segments. This only moves code around into functions that can be inlined away. It should not change any behavior at all. Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30x86: PIE executable randomization, checkpatch fixesAndrew Morton
#39: FILE: arch/ia64/ia32/binfmt_elf32.c:229: +elf32_map (struct file *filep, unsigned long addr, struct elf_phdr *eppnt, int prot, int type, unsigned long unused) WARNING: no space between function name and open parenthesis '(' #39: FILE: arch/ia64/ia32/binfmt_elf32.c:229: +elf32_map (struct file *filep, unsigned long addr, struct elf_phdr *eppnt, int prot, int type, unsigned long unused) WARNING: line over 80 characters #67: FILE: arch/x86/kernel/sys_x86_64.c:80: + new_begin = randomize_range(*begin, *begin + 0x02000000, 0); ERROR: use tabs not spaces #110: FILE: arch/x86/kernel/sys_x86_64.c:185: + ^I mm->cached_hole_size = 0;$ ERROR: use tabs not spaces #111: FILE: arch/x86/kernel/sys_x86_64.c:186: + ^I^Imm->free_area_cache = mm->mmap_base;$ ERROR: use tabs not spaces #112: FILE: arch/x86/kernel/sys_x86_64.c:187: + ^I}$ ERROR: use tabs not spaces #141: FILE: arch/x86/kernel/sys_x86_64.c:216: + ^I^I/* remember the largest hole we saw so far */$ ERROR: use tabs not spaces #142: FILE: arch/x86/kernel/sys_x86_64.c:217: + ^I^Iif (addr + mm->cached_hole_size < vma->vm_start)$ ERROR: use tabs not spaces #143: FILE: arch/x86/kernel/sys_x86_64.c:218: + ^I^I mm->cached_hole_size = vma->vm_start - addr;$ ERROR: use tabs not spaces #157: FILE: arch/x86/kernel/sys_x86_64.c:232: + ^Imm->free_area_cache = TASK_UNMAPPED_BASE;$ ERROR: need a space before the open parenthesis '(' #291: FILE: arch/x86/mm/mmap_64.c:101: + } else if(mmap_is_legacy()) { WARNING: braces {} are not necessary for single statement blocks #302: FILE: arch/x86/mm/mmap_64.c:112: + if (current->flags & PF_RANDOMIZE) { + mm->mmap_base += ((long)rnd) << PAGE_SHIFT; + } WARNING: line over 80 characters #314: FILE: fs/binfmt_elf.c:48: +static unsigned long elf_map (struct file *, unsigned long, struct elf_phdr *, int, int, unsigned long); WARNING: no space between function name and open parenthesis '(' #314: FILE: fs/binfmt_elf.c:48: +static unsigned long elf_map (struct file *, unsigned long, struct elf_phdr *, int, int, unsigned long); WARNING: line over 80 characters #429: FILE: fs/binfmt_elf.c:438: + eppnt, elf_prot, elf_type, total_size); ERROR: need space after that ',' (ctx:VxV) #480: FILE: fs/binfmt_elf.c:939: + elf_prot, elf_flags,0); ^ total: 9 errors, 7 warnings, 461 lines checked Your patch has style problems, please review. If any of these errors are false positives report them to the maintainer, see CHECKPATCH in MAINTAINERS. Please run checkpatch prior to sending patches Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Jakub Jelinek <jakub@redhat.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30x86: PIE executable randomizationJiri Kosina
main executable of (specially compiled/linked -pie/-fpie) ET_DYN binaries onto a random address (in cases in which mmap() is allowed to perform a randomization). The code has been extraced from Ingo's exec-shield patch http://people.redhat.com/mingo/exec-shield/ [akpm@linux-foundation.org: fix used-uninitialsied warning] [kamezawa.hiroyu@jp.fujitsu.com: fixed ia32 ELF on x86_64 handling] Signed-off-by: Jiri Kosina <jkosina@suse.cz> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Roland McGrath <roland@redhat.com> Cc: Jakub Jelinek <jakub@redhat.com> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30x86: randomize brkJiri Kosina
Randomize the location of the heap (brk) for i386 and x86_64. The range is randomized in the range starting at current brk location up to 0x02000000 offset for both architectures. This, together with pie-executable-randomization.patch and pie-executable-randomization-fix.patch, should make the address space randomization on i386 and x86_64 complete. Arjan says: This is known to break older versions of some emacs variants, whose dumper code assumed that the last variable declared in the program is equal to the start of the dynamically allocated memory region. (The dumper is the code where emacs effectively dumps core at the end of it's compilation stage; this coredump is then loaded as the main program during normal use) iirc this was 5 years or so; we found this way back when I was at RH and we first did the security stuff there (including this brk randomization). It wasn't all variants of emacs, and it got fixed as a result (I vaguely remember that emacs already had code to deal with it for other archs/oses, just ifdeffed wrongly). It's a rare and wrong assumption as a general thing, just on x86 it mostly happened to be true (but to be honest, it'll break too if gcc does something fancy or if the linker does a non-standard order). Still its something we should at least document. Note 2: afaik it only broke the emacs *build*. I'm not 100% sure about that (it IS 5 years ago) though. [ akpm@linux-foundation.org: deuglification ] Signed-off-by: Jiri Kosina <jkosina@suse.cz> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Roland McGrath <roland@redhat.com> Cc: Jakub Jelinek <jakub@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-07core dump: real_parent ppidRoland McGrath
The pr_ppid field reported in core dumps should match what getppid() would have returned to that process, regardless of whether a debugger is attached. Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-19pid namespaces: changes to show virtual ids to userPavel Emelyanov
This is the largest patch in the set. Make all (I hope) the places where the pid is shown to or get from user operate on the virtual pids. The idea is: - all in-kernel data structures must store either struct pid itself or the pid's global nr, obtained with pid_nr() call; - when seeking the task from kernel code with the stored id one should use find_task_by_pid() call that works with global pids; - when showing pid's numerical value to the user the virtual one should be used, but however when one shows task's pid outside this task's namespace the global one is to be used; - when getting the pid from userspace one need to consider this as the virtual one and use appropriate task/pid-searching functions. [akpm@linux-foundation.org: build fix] [akpm@linux-foundation.org: nuther build fix] [akpm@linux-foundation.org: yet nuther build fix] [akpm@linux-foundation.org: remove unneeded casts] Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Paul Menage <menage@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-19pid namespaces: round up the APIPavel Emelianov
The set of functions process_session, task_session, process_group and task_pgrp is confusing, as the names can be mixed with each other when looking at the code for a long time. The proposals are to * equip the functions that return the integer with _nr suffix to represent that fact, * and to make all functions work with task (not process) by making the common prefix of the same name. For monotony the routines signal_session() and set_signal_session() are replaced with task_session_nr() and set_task_session(), especially since they are only used with the explicit task->signal dereference. Signed-off-by: Pavel Emelianov <xemul@openvz.org> Acked-by: Serge E. Hallyn <serue@us.ibm.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17Break ELF_PLATFORM and stack pointer randomization dependencyFranck Bui-Huu
Currently arch_align_stack() is used by fs/binfmt_elf.c to randomize stack pointer inside a page. But this happens only if ELF_PLATFORM symbol is defined. ELF_PLATFORM is normally set if the architecture wants ld.so to load implementation specific libraries for optimization. And currently a lot of architectures just yield this symbol to NULL. This is the case for MIPS architecture where ELF_PLATFORM is NULL but arch_align_stack() has been redefined to do stack inside page randomization. So in this case no randomization is actually done. This patch breaks this dependency which seems to be useless and allows platforms such MIPS to do the randomization. Signed-off-by: Franck Bui-Huu <fbuihuu@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17increase AT_VECTOR_SIZE to terminate saved_auxv properlyOlaf Hering
include/asm-powerpc/elf.h has 6 entries in ARCH_DLINFO. fs/binfmt_elf.c has 14 unconditional NEW_AUX_ENT entries and 2 conditional NEW_AUX_ENT entries. So in the worst case, saved_auxv does not get an AT_NULL entry at the end. The saved_auxv array must be terminated with an AT_NULL entry. Make the size of mm_struct->saved_auxv arch dependend, based on the number of ARCH_DLINFO entries. Signed-off-by: Olaf Hering <olh@suse.de> Cc: Roland McGrath <roland@redhat.com> Cc: Jakub Jelinek <jakub@redhat.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17Add MMF_DUMP_ELF_HEADERSRoland McGrath
This adds the MMF_DUMP_ELF_HEADERS option to /proc/pid/coredump_filter. This dumps the first page (only) of a private file mapping if it appears to be a mapping of an ELF file. Including these pages in the core dump may give sufficient identifying information to associate the original DSO and executable file images and their debugging information with a core file in a generic way just from its contents (e.g. when those binaries were built with ld --build-id). I expect this to become the default behavior eventually. Existing versions of gdb can be confused by the core dumps it creates, so it won't enabled by default for some time to come. Soon many people will have systems with a gdb that handle these dumps, so they can arrange to set the bit at boot and have it inherited system-wide. This also cleans up the checking of the MMF_DUMP_* flag bits, which did not need to be using atomic macros. Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17Deprecate a.out ELF interpretersAndi Kleen
The Linux ELF loader is quite complicated and messy code (that could probably need a rewrite, but that's a different chapter). One particular messy part in it is the support for non ELF a.out ld.sos. This was originally added to make transition from a.out to ELF easier because an a.out ELF ld.so could be still build using an older a.out toolkit. But by now that should be fully obsolete and removing it would clean up binfmt_elf.c up a bit. I propose to deprecate this support and remove for 2.6.25. Drawback is that someone still runs their system with a.out ld.so they would need to update the ld.so when updating to a new kernel. This patch just adds an entry to the deprecation file and a printk warning users. [akpm@linux-foundation.org: better warning message] Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17core_pattern: ignore RLIMIT_CORE if core_pattern is a pipeNeil Horman
For some time /proc/sys/kernel/core_pattern has been able to set its output destination as a pipe, allowing a user space helper to receive and intellegently process a core. This infrastructure however has some shortcommings which can be enhanced. Specifically: 1) The coredump code in the kernel should ignore RLIMIT_CORE limitation when core_pattern is a pipe, since file system resources are not being consumed in this case, unless the user application wishes to save the core, at which point the app is restricted by usual file system limits and restrictions. 2) The core_pattern code should be able to parse and pass options to the user space helper as an argv array. The real core limit of the uid of the crashing proces should also be passable to the user space helper (since it is overridden to zero when called). 3) Some miscellaneous bugs need to be cleaned up (specifically the recognition of a recursive core dump, should the user mode helper itself crash. Also, the core dump code in the kernel should not wait for the user mode helper to exit, since the same context is responsible for writing to the pipe, and a read of the pipe by the user mode helper will result in a deadlock. This patch: Remove the check of RLIMIT_CORE if core_pattern is a pipe. In the event that core_pattern is a pipe, the entire core will be fed to the user mode helper. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Cc: <martin.pitt@ubuntu.com> Cc: <wwoods@redhat.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17x86: replace NT_PRXFPREG with ELF_CORE_XFPREG_TYPE #defineMark Nelson
Replace NT_PRXFPREG with ELF_CORE_XFPREG_TYPE in the coredump code which allows for more flexibility in the note type for the state of 'extended floating point' implementations in coredumps. New note types can now be added with an appropriate #define. This does #define ELF_CORE_XFPREG_TYPE to be NT_PRXFPREG in all current users so there's are no change in behaviour. This will let us use different note types on powerpc for the Altivec/VMX state that some PowerPC cpus have (G4, PPC970, POWER6) and for the SPE (signal processing extension) state that some embedded PowerPC cpus from Freescale have. Signed-off-by: Mark Nelson <markn@au1.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Andi Kleen <ak@suse.de> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16remove ZERO_PAGENick Piggin
The commit b5810039a54e5babf428e9a1e89fc1940fabff11 contains the note A last caveat: the ZERO_PAGE is now refcounted and managed with rmap (and thus mapcounted and count towards shared rss). These writes to the struct page could cause excessive cacheline bouncing on big systems. There are a number of ways this could be addressed if it is an issue. And indeed this cacheline bouncing has shown up on large SGI systems. There was a situation where an Altix system was essentially livelocked tearing down ZERO_PAGE pagetables when an HPC app aborted during startup. This situation can be avoided in userspace, but it does highlight the potential scalability problem with refcounting ZERO_PAGE, and corner cases where it can really hurt (we don't want the system to livelock!). There are several broad ways to fix this problem: 1. add back some special casing to avoid refcounting ZERO_PAGE 2. per-node or per-cpu ZERO_PAGES 3. remove the ZERO_PAGE completely I will argue for 3. The others should also fix the problem, but they result in more complex code than does 3, with little or no real benefit that I can see. Why? Inserting a ZERO_PAGE for anonymous read faults appears to be a false optimisation: if an application is performance critical, it would not be doing many read faults of new memory, or at least it could be expected to write to that memory soon afterwards. If cache or memory use is critical, it should not be working with a significant number of ZERO_PAGEs anyway (a more compact representation of zeroes should be used). As a sanity check -- mesuring on my desktop system, there are never many mappings to the ZERO_PAGE (eg. 2 or 3), thus memory usage here should not increase much without it. When running a make -j4 kernel compile on my dual core system, there are about 1,000 mappings to the ZERO_PAGE created per second, but about 1,000 ZERO_PAGE COW faults per second (less than 1 ZERO_PAGE mapping per second is torn down without being COWed). So removing ZERO_PAGE will save 1,000 page faults per second when running kbuild, while keeping it only saves less than 1 page clearing operation per second. 1 page clear is cheaper than a thousand faults, presumably, so there isn't an obvious loss. Neither the logical argument nor these basic tests give a guarantee of no regressions. However, this is a reasonable opportunity to try to remove the ZERO_PAGE from the pagefault path. If it is found to cause regressions, we can reintroduce it and just avoid refcounting it. The /dev/zero ZERO_PAGE usage and TLB tricks also get nuked. I don't see much use to them except on benchmarks. All other users of ZERO_PAGE are converted just to use ZERO_PAGE(0) for simplicity. We can look at replacing them all and maybe ripping out ZERO_PAGE completely when we are more satisfied with this solution. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus "snif" Torvalds <torvalds@linux-foundation.org>
2007-09-19[POWERPC] spufs: Cleanup ELF coredump extra notes logicMichael Ellerman
To start with, arch_notes_size() etc. is a little too ambiguous a name for my liking, so change the function names to be more explicit. Calling through macros is ugly, especially with hidden parameters, so don't do that, call the routines directly. Use ARCH_HAVE_EXTRA_ELF_NOTES as the only flag, and based on it decide whether we want the extern declarations or the empty versions. Since we have empty routines, actually use them in the coredump code to save a few #ifdefs. We want to change the handling of foffset so that the write routine updates foffset as it goes, instead of using file->f_pos (so that writing to a pipe works). So pass foffset to the write routine, and for now just set it to file->f_pos at the end of writing. It should also be possible for the write routine to fail, so change it to return int and treat a non-zero return as failure. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Jeremy Kerr <jk@ozlabs.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-07-21revert "PIE randomization"Andrew Morton
There are reports of this causing userspace failures (http://lkml.org/lkml/2007/7/20/421). Revert. Cc: Jan Kratochvil <honza@jikos.cz> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Ingo Molnar <mingo@elte.hu> Cc: Roland McGrath <roland@redhat.com> Cc: Jakub Jelinek <jakub@redhat.com> Cc: Ulrich Kunitz <kune@deine-taler.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "Bret Towe" <magnade@gmail.com> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19coredump masking: ELF: enable core dump filteringKawai, Hidehiro
This patch enables core dump filtering for ELF-formatted core file. Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: David Howells <dhowells@redhat.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19mm: variable length argument supportOllie Wild
Remove the arg+env limit of MAX_ARG_PAGES by copying the strings directly from the old mm into the new mm. We create the new mm before the binfmt code runs, and place the new stack at the very top of the address space. Once the binfmt code runs and figures out where the stack should be, we move it downwards. It is a bit peculiar in that we have one task with two mm's, one of which is inactive. [a.p.zijlstra@chello.nl: limit stack size] Signed-off-by: Ollie Wild <aaw@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: <linux-arch@vger.kernel.org> Cc: Hugh Dickins <hugh@veritas.com> [bunk@stusta.de: unexport bprm_mm_init] Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16binfmt_elf warning fixAndrew Morton
fs/binfmt_elf.c: In function 'load_elf_binary': fs/binfmt_elf.c:1002: warning: 'interp_map_addr' may be used uninitialized in this function The compiler (gcc-4.1.0) is correct, but it failed to notice that we didn't use the resulting value. Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16PIE randomizationJan Kratochvil
This patch is using mmap()'s randomization functionality in such a way that it maps the main executable of (specially compiled/linked -pie/-fpie) ET_DYN binaries onto a random address (in cases in which mmap() is allowed to perform a randomization). Origin of this patch is in exec-shield (http://people.redhat.com/mingo/exec-shield/) [jkosina@suse.cz: pie randomization: fix BAD_ADDR macro] Signed-off-by: Jan Kratochvil <honza@jikos.cz> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Cc: Ingo Molnar <mingo@elte.hu> Cc: Roland McGrath <roland@redhat.com> Cc: Jakub Jelinek <jakub@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-06Fix elf_core_dump() when writing arch specific notes (spu coredumps)Michael Ellerman
elf_core_dump() supports dumping arch specific ELF notes, via the #define ELF_CORE_WRITE_EXTRA_NOTES. Currently the only user of this is the powerpc spu coredump code. There is a bug in the handling of foffset WRT the arch notes, which causes us to erroneously increment foffset by the size of the arch notes, leaving a block of zeroes in the file, and causing all subsequent data in the file to be at <supposed position> + <arch note size>. eg: LOAD 0x050000 0x00100000 0x00000000 0x20000 0x20000 R E 0x10000 Tells us we should have a chunk of data at 0x50000. The truth is the data is at 0x90dbc = 0x50000 + 0x40dbc (the size of the arch notes). This bug prevents gdb from reading the core file correctly. The simplest fix is to simply remember the size of the arch notes, and add it to foffset after we've written the arch notes. The only drawback is that if the arch code doesn't write as many bytes as it said it would, we end up with a broken core dump again. For now I think that's a reasonable requirement. Tested on a Cell blade, gdb no longer complains about the core file being bogus. While I'm here I should point out that the spu coredump code does not work if we're dumping to a pipe - we'll have to wait for 23 to fix that. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08Invalid return value of execve() resulting in oopsesAlexey Kuznetsov
When elf loader fails to map executable (due to memory shortage or because binary is malformed), it can return 0. Normally, this is invisible because process is killed with SIGKILL and it never returns to user space. But if exec() is called from kernel thread (hotplug, whatever) consequences are more interesting and vary depending on architecture. i386. Nothing especially interesting, execve() just returns with "success" :-) x86_64. Fake zero frame is used on way to caller, RSP/RIP are loaded with zeros, ergo... double fault. ia64. Similar to i386, but r32...r95 are corrupted. Sometimes it oopses due to return to zero PC, sometimes it sees NaT in rXX and oopses due to NaT consumption. Signed-off-by: Alexey Kuznetsov <alexey@openvz.org> Signed-off-by: Kirill Korotaev <dev@openvz.org> Signed-off-by: Pavel Emelianov <xemul@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08i386: sched.h inclusion from module.h is baackAlexey Dobriyan
linux/module.h -> linux/elf.h -> asm-i386/elf.h -> linux/utsname.h -> linux/sched.h Noticeably cut the number of files which are rebuild upon touching sched.h and cut down pulled junk from every module.h inclusion. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08header cleaning: don't include smp_lock.h when not usedRandy Dunlap
Remove includes of <linux/smp_lock.h> where it is not used/needed. Suggested by Al Viro. Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc, sparc64, and arm (all 59 defconfigs). Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-02[PATCH] fix page leak during core dumpBrian Pomerantz
When the dump cannot occur most likely because of a full file system and the page to be written is the zero page, the call to page_cache_release() is missed. Signed-off-by: Brian Pomerantz <bapper@mvista.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: David Howells <dhowells@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-16[PATCH] fix process crash caused by randomisation and 64k pagesJames Bottomley
This bug was seen on ppc64, but it could have occurred on any architecture with a page size of 64k or above. The problem is that in fs/binfmt_elf.c:randomize_stack_top() randomizes the stack to within 0x7ff pages. On 4k page machines, this is 8MB; on 64k page boxes, this is 128MB. The problem is that the new binary layout (selected in arch_pick_mmap_layout) places the mapping segment 128MB or the stack rlimit away from the top of the process memory, whichever is larger. If you chose an rlimit of less than 128MB (most defaults are in the 8Mb range) then you can end up having your entire stack randomized away. The fix is to make randomize_stack_top() only steal at most 8MB, which this patch does. However, I have to point out that even with this, your stack rlimit might not be exactly what you get if it's > 128MB, because you're still losing the random offset of up to 8MB. The true fix should be to leave an explicit gap for the randomization plus a buffer when determining mmap_base, but that would involve fixing all the architectures. Cc: Arjan van de Ven <arjan@infradead.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-13[PATCH] x86: Don't require the vDSO for handling a.out signalsAndi Kleen
and in other strange binfmts. vDSO is not necessarily mapped there. Signed-off-by: Andi Kleen <ak@suse.de>
2007-01-26[PATCH] core-dumping unreadable binaries via PT_INTERPAlexey Dobriyan
Proposed patch to fix #5 in http://www.isec.pl/vulnerabilities/isec-0017-binfmt_elf.txt aka http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2004-1073 To reproduce, do * grab poc at the end of advisory. * add line "eph.p_memsz = 4096;" after "eph.p_filesz = 4096;" where first "4096" is something equal to or greater than 4096. * ./poc /usr/bin/sudo && ls -l Here I get with 2.6.20-rc5: -rw------- 1 ad ad 102400 2007-01-15 19:17 core ---s--x--x 2 root root 101820 2007-01-15 19:15 /usr/bin/sudo Check for MAY_READ like binfmt_misc.c does. Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-26[PATCH] i386 vDSO: use VM_ALWAYSDUMPRoland McGrath
This patch fixes core dumps to include the vDSO vma, which is left out now. It removes the special-case core writing macros, which were not doing the right thing for the vDSO vma anyway. Instead, it uses VM_ALWAYSDUMP in the vma; there is no need for the fixmap page to be installed. It handles the CONFIG_COMPAT_VDSO case by making elf_core_dump use the fake vma from get_gate_vma after real vmas in the same way the /proc/PID/maps code does. This changes core dumps so they no longer include the non-PT_LOAD phdrs from the vDSO. I made the change to add them in the first place, but in turned out that nothing ever wanted them there since the advent of NT_AUXV. It's cleaner to leave them out, and just let the phdrs inside the vDSO image speak for themselves. Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-26[PATCH] Add VM_ALWAYSDUMPRoland McGrath
This patch adds the VM_ALWAYSDUMP flag for vm_flags in vm_area_struct. This provides a clean explicit way to have a vma always included in core dumps, as is needed for vDSO's. Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-06Revert "[PATCH] binfmt_elf: randomize PIE binaries (2nd try)"Linus Torvalds
This reverts commit 59287c0913cc9a6c75712a775f6c1c1ef418ef3b. Hugh Dickins reports that it causes random failures on x86 with SuSE 10.2, and points out "Isn't that randomization, anywhere from 0x10000 to ELF_ET_DYN_BASE, sure to place the ET_DYN from time to time just where the comment says it's trying to avoid? I assume that somehow results in the error reported." (where the comment in question is the existing comment in the source code about mmap/brk clashes). Suggested-by: Hugh Dickins <hugh@veritas.com> Acked-by: Marcus Meissner <meissner@suse.de> Cc: Andrew Morton <akpm@osdl.org> Cc: Andi Kleen <ak@suse.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08[PATCH] add process_session() helper routineCedric Le Goater
Replace occurences of task->signal->session by a new process_session() helper routine. It will be useful for pid namespaces to abstract the session pid number. Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08[PATCH] VFS: change struct file to use struct pathJosef "Jeff" Sipek
This patch changes struct file to use struct path instead of having independent pointers to struct dentry and struct vfsmount, and converts all users of f_{dentry,vfsmnt} in fs/ to use f_path.{dentry,mnt}. Additionally, it adds two #define's to make the transition easier for users of the f_dentry and f_vfsmnt. Signed-off-by: Josef "Jeff" Sipek <jsipek@cs.sunysb.edu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07[PATCH] fs: remove unused variableDavid Rientjes
Removed unused 'have_pt_gnu_stack' variable. Reported by David Binderman <dcb314@hotmail.com> Signed-off-by: David Rientjes <rientjes@cs.washington.edu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07[PATCH] elf: Always define elf_addr_t in linux/elf.hMagnus Damm
Define elf_addr_t in linux/elf.h. The size of the type is determined using ELF_CLASS. This allows us to remove the defines that today are spread all over .c and .h files. Signed-off-by: Magnus Damm <magnus@valinux.co.jp> Cc: Daniel Jacobowitz <drow@false.org> Cc: Roland McGrath <roland@redhat.com> Cc: Jakub Jelinek <jakub@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07[PATCH] binfmt: fix uaccess handlingHeiko Carstens
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07[PATCH] binfmt_elf: randomize PIE binaries (2nd try)Marcus Meissner
Randomizes -pie compiled binaries from 64k (0x10000) up to ELF_ET_DYN_BASE. 0 -> 64k is excluded to allow NULL ptr accesses to fail. Signed-off-by: Marcus Meissner <meissner@suse.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-04[POWERPC] coredump: Add SPU elf notes to coredump.Dwayne Grant McConnell
This patch adds SPU elf notes to the coredump. It creates a separate note for each of /regs, /fpcr, /lslr, /decr, /decr_status, /mem, /signal1, /signal1_type, /signal2, /signal2_type, /event_mask, /event_status, /mbox_info, /ibox_info, /wbox_info, /dma_info, /proxydma_info, /object-id. A new macro, ARCH_HAVE_EXTRA_NOTES, was created for architectures to specify they have extra elf core notes. A new macro, ELF_CORE_EXTRA_NOTES_SIZE, was created so the size of the additional notes could be calculated and added to the notes phdr entry. A new macro, ELF_CORE_WRITE_EXTRA_NOTES, was created so the new notes would be written after the existing notes. The SPU coredump code resides in spufs. Stub functions are provided in the kernel which are hooked into the spufs code which does the actual work via register_arch_coredump_calls(). A new set of __spufs_<file>_read/get() functions was provided to allow the coredump code to read from the spufs files without having to lock the SPU context for each file read from. Cc: <linux-arch@vger.kernel.org> Signed-off-by: Dwayne Grant McConnell <decimal@us.ibm.com> Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
2006-10-15[PATCH] Fix core files so they make sense to gdb...Petr Vandrovec
It is silly to use non-static variable for writting zeroes to the file. And more seriously, foffset in core dump file dump function was incremented too much, so some parts of core dump were shifted by size of few phdrs and notes down, so although gdb was able to load that file, it did not make lot of sense - in my test case data pages were shifted down by about 900 bytes. Signed-off-by: Petr Vandrovec <petr@vandrovec.name> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-13[PATCH] Get core dump code to work...Petr Vandrovec
The file based core dump code was broken by pipe changes - a relative llseek returns the absolute file position on success, not the relative one, so dump_seek() always failed when invoked with non-zero current position. Only success/failure can be tested with relative lseek, we have to trust kernel that on success we've got right file offset. With this fix in place I have finally real core files instead of 1KB fragments... Signed-off-by: Petr Vandrovec <petr@vandrovec.name> [ Cleaned it up a bit while here - use SEEK_CUR instead of hardcoding 1 ] Signed-off-by: Linus Torvalds <torvalds@osdl.org>