summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2009-09-23Blackfin: override text/data checking functionsMike Frysinger
Signed-off-by: Mike Frysinger <vapier@gentoo.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Robin Getz <rgetz@blackfin.uclinux.org> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-23lockdep: use new arch_is_kernel_data()Mike Frysinger
This allows lockdep to locate symbols that are in arch-specific data sections (such as data in Blackfin on-chip SRAM regions). Signed-off-by: Mike Frysinger <vapier@gentoo.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Robin Getz <rgetz@blackfin.uclinux.org> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-23kallsyms: use new arch_is_kernel_text()Mike Frysinger
This allows kallsyms to locate symbols that are in arch-specific text sections (such as text in Blackfin on-chip SRAM regions). Signed-off-by: Mike Frysinger <vapier@gentoo.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Robin Getz <rgetz@blackfin.uclinux.org> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-23asm/sections: add text/data checking functions for arches to overrideMike Frysinger
Some ports (like the Blackfin arch) have a discontiguous memory map which means there may be text or data that falls outside of the standard range of the start/end text/data symbols. Creating some helper functions allows these non-standard ports to declare these regions without adversely affecting anyone else. Signed-off-by: Mike Frysinger <vapier@gentoo.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Robin Getz <rgetz@blackfin.uclinux.org> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-23kallsyms: fix segfault in prefix_underscores_count()Paul Mundt
Commit b478b782e110fdb4135caa3062b6d687e989d994 "kallsyms, tracing: output more proper symbol name" introduces a "bugfix" that introduces a segfault in kallsyms in my configurations. The cause is the introduction of prefix_underscores_count() which attempts to count underscores, even in symbols that do not have them. As a result, it just uselessly runs past the end of the buffer until it crashes: CC init/version.o LD init/built-in.o LD .tmp_vmlinux1 KSYM .tmp_kallsyms1.S /bin/sh: line 1: 16934 Done sh-linux-gnu-nm -n .tmp_vmlinux1 16935 Segmentation fault | scripts/kallsyms > .tmp_kallsyms1.S make: *** [.tmp_kallsyms1.S] Error 139 This simplifies the logic and just does a straightforward count. Signed-off-by: Paul Mundt <lethal@linux-sh.org> Reviewed-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Paulo Marques <pmarques@grupopie.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: <stable@kernel.org> [2.6.30.x, 2.6.31.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-23getrusage: fill ru_maxrss valueJiri Pirko
Make ->ru_maxrss value in struct rusage filled accordingly to rss hiwater mark. This struct is filled as a parameter to getrusage syscall. ->ru_maxrss value is set to KBs which is the way it is done in BSD systems. /usr/bin/time (gnu time) application converts ->ru_maxrss to KBs which seems to be incorrect behavior. Maintainer of this util was notified by me with the patch which corrects it and cc'ed. To make this happen we extend struct signal_struct by two fields. The first one is ->maxrss which we use to store rss hiwater of the task. The second one is ->cmaxrss which we use to store highest rss hiwater of all task childs. These values are used in k_getrusage() to actually fill ->ru_maxrss. k_getrusage() uses current rss hiwater value directly if mm struct exists. Note: exec() clear mm->hiwater_rss, but doesn't clear sig->maxrss. it is intetionally behavior. *BSD getrusage have exec() inheriting. test programs ======================================================== getrusage.c =========== #include <stdio.h> #include <stdlib.h> #include <string.h> #include <sys/types.h> #include <sys/time.h> #include <sys/resource.h> #include <sys/types.h> #include <sys/wait.h> #include <unistd.h> #include <signal.h> #include <sys/mman.h> #include "common.h" #define err(str) perror(str), exit(1) int main(int argc, char** argv) { int status; printf("allocate 100MB\n"); consume(100); printf("testcase1: fork inherit? \n"); printf(" expect: initial.self ~= child.self\n"); show_rusage("initial"); if (__fork()) { wait(&status); } else { show_rusage("fork child"); _exit(0); } printf("\n"); printf("testcase2: fork inherit? (cont.) \n"); printf(" expect: initial.children ~= 100MB, but child.children = 0\n"); show_rusage("initial"); if (__fork()) { wait(&status); } else { show_rusage("child"); _exit(0); } printf("\n"); printf("testcase3: fork + malloc \n"); printf(" expect: child.self ~= initial.self + 50MB\n"); show_rusage("initial"); if (__fork()) { wait(&status); } else { printf("allocate +50MB\n"); consume(50); show_rusage("fork child"); _exit(0); } printf("\n"); printf("testcase4: grandchild maxrss\n"); printf(" expect: post_wait.children ~= 300MB\n"); show_rusage("initial"); if (__fork()) { wait(&status); show_rusage("post_wait"); } else { system("./child -n 0 -g 300"); _exit(0); } printf("\n"); printf("testcase5: zombie\n"); printf(" expect: pre_wait ~= initial, IOW the zombie process is not accounted.\n"); printf(" post_wait ~= 400MB, IOW wait() collect child's max_rss. \n"); show_rusage("initial"); if (__fork()) { sleep(1); /* children become zombie */ show_rusage("pre_wait"); wait(&status); show_rusage("post_wait"); } else { system("./child -n 400"); _exit(0); } printf("\n"); printf("testcase6: SIG_IGN\n"); printf(" expect: initial ~= after_zombie (child's 500MB alloc should be ignored).\n"); show_rusage("initial"); signal(SIGCHLD, SIG_IGN); if (__fork()) { sleep(1); /* children become zombie */ show_rusage("after_zombie"); } else { system("./child -n 500"); _exit(0); } printf("\n"); signal(SIGCHLD, SIG_DFL); printf("testcase7: exec (without fork) \n"); printf(" expect: initial ~= exec \n"); show_rusage("initial"); execl("./child", "child", "-v", NULL); return 0; } child.c ======= #include <sys/types.h> #include <unistd.h> #include <sys/types.h> #include <sys/wait.h> #include <stdio.h> #include <stdlib.h> #include <string.h> #include <sys/types.h> #include <sys/time.h> #include <sys/resource.h> #include "common.h" int main(int argc, char** argv) { int status; int c; long consume_size = 0; long grandchild_consume_size = 0; int show = 0; while ((c = getopt(argc, argv, "n:g:v")) != -1) { switch (c) { case 'n': consume_size = atol(optarg); break; case 'v': show = 1; break; case 'g': grandchild_consume_size = atol(optarg); break; default: break; } } if (show) show_rusage("exec"); if (consume_size) { printf("child alloc %ldMB\n", consume_size); consume(consume_size); } if (grandchild_consume_size) { if (fork()) { wait(&status); } else { printf("grandchild alloc %ldMB\n", grandchild_consume_size); consume(grandchild_consume_size); exit(0); } } return 0; } common.c ======== #include <stdio.h> #include <stdlib.h> #include <string.h> #include <sys/types.h> #include <sys/time.h> #include <sys/resource.h> #include <sys/types.h> #include <sys/wait.h> #include <unistd.h> #include <signal.h> #include <sys/mman.h> #include "common.h" #define err(str) perror(str), exit(1) void show_rusage(char *prefix) { int err, err2; struct rusage rusage_self; struct rusage rusage_children; printf("%s: ", prefix); err = getrusage(RUSAGE_SELF, &rusage_self); if (!err) printf("self %ld ", rusage_self.ru_maxrss); err2 = getrusage(RUSAGE_CHILDREN, &rusage_children); if (!err2) printf("children %ld ", rusage_children.ru_maxrss); printf("\n"); } /* Some buggy OS need this worthless CPU waste. */ void make_pagefault(void) { void *addr; int size = getpagesize(); int i; for (i=0; i<1000; i++) { addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON, -1, 0); if (addr == MAP_FAILED) err("make_pagefault"); memset(addr, 0, size); munmap(addr, size); } } void consume(int mega) { size_t sz = mega * 1024 * 1024; void *ptr; ptr = malloc(sz); memset(ptr, 0, sz); make_pagefault(); } pid_t __fork(void) { pid_t pid; pid = fork(); make_pagefault(); return pid; } common.h ======== void show_rusage(char *prefix); void make_pagefault(void); void consume(int mega); pid_t __fork(void); FreeBSD result (expected result) ======================================================== allocate 100MB testcase1: fork inherit? expect: initial.self ~= child.self initial: self 103492 children 0 fork child: self 103540 children 0 testcase2: fork inherit? (cont.) expect: initial.children ~= 100MB, but child.children = 0 initial: self 103540 children 103540 child: self 103564 children 0 testcase3: fork + malloc expect: child.self ~= initial.self + 50MB initial: self 103564 children 103564 allocate +50MB fork child: self 154860 children 0 testcase4: grandchild maxrss expect: post_wait.children ~= 300MB initial: self 103564 children 154860 grandchild alloc 300MB post_wait: self 103564 children 308720 testcase5: zombie expect: pre_wait ~= initial, IOW the zombie process is not accounted. post_wait ~= 400MB, IOW wait() collect child's max_rss. initial: self 103564 children 308720 child alloc 400MB pre_wait: self 103564 children 308720 post_wait: self 103564 children 411312 testcase6: SIG_IGN expect: initial ~= after_zombie (child's 500MB alloc should be ignored). initial: self 103564 children 411312 child alloc 500MB after_zombie: self 103624 children 411312 testcase7: exec (without fork) expect: initial ~= exec initial: self 103624 children 411312 exec: self 103624 children 411312 Linux result (actual test result) ======================================================== allocate 100MB testcase1: fork inherit? expect: initial.self ~= child.self initial: self 102848 children 0 fork child: self 102572 children 0 testcase2: fork inherit? (cont.) expect: initial.children ~= 100MB, but child.children = 0 initial: self 102876 children 102644 child: self 102572 children 0 testcase3: fork + malloc expect: child.self ~= initial.self + 50MB initial: self 102876 children 102644 allocate +50MB fork child: self 153804 children 0 testcase4: grandchild maxrss expect: post_wait.children ~= 300MB initial: self 102876 children 153864 grandchild alloc 300MB post_wait: self 102876 children 307536 testcase5: zombie expect: pre_wait ~= initial, IOW the zombie process is not accounted. post_wait ~= 400MB, IOW wait() collect child's max_rss. initial: self 102876 children 307536 child alloc 400MB pre_wait: self 102876 children 307536 post_wait: self 102876 children 410076 testcase6: SIG_IGN expect: initial ~= after_zombie (child's 500MB alloc should be ignored). initial: self 102876 children 410076 child alloc 500MB after_zombie: self 102880 children 410076 testcase7: exec (without fork) expect: initial ~= exec initial: self 102880 children 410076 exec: self 102880 children 410076 Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-23kmap_types.h: rename D macroAndi Kleen
I tend to use a 'D' debugging macro a lot during debugging. When I define it before includes I often get conflicts with kmap_types.h's use of 'D' too. It's not very nice when a global include pollutes the name space like this. Rename the kmap_types.h D to KMAP_D. It is only used temporarily in the header so has no effect on anything else. Signed-off-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-23Make sure the value in abs() does not get truncated if it is greater than 2^32Rolf Eike Beer
abs() will truncate the input if is it outside the 2^32 range. Fix that by assuming `long' input. This might generate worse code in the common case. Signed-off-by: Rolf Eike Beer <eike-kernel@sf-tec.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-23fix compat_sys_utimensat()Suzuki Poulose
Compat utimensat() returns EINVAL when the tv_nsec is one of UTIME_OMIT or UTIME_NOW and the tv_sec is set to non-zero. As per man pages, the tv_sec field should be ignored. sys_utimensat() works fine in this case. Test case: #define _GNU_SOURCE #define _ATFILE_SOURCE #include <stdio.h> #include <fcntl.h> #include <unistd.h> #include <sys/stat.h> #include <stdlib.h> main(int argc, char *argv[]) { struct timespec ts[2]; struct timespec *tsp; if (argc < 2) { fprintf(stderr, "Usage : %s filename\n", argv[0]); exit (-1); } ts[0].tv_nsec = ts[1].tv_nsec = UTIME_NOW; ts[0].tv_sec = ts[1].tv_sec = 1; tsp = ts; if (utimensat(AT_FDCWD, argv[1],tsp,0) == -1) perror("utimensat"); else fprintf(stdout, "utimensat success\n"); return 0; } mjs22lp5:~ # cc -m64 utimensat-test.c -o utimensat_test64 mjs22lp5:~ # cc -m32 utimensat-test.c -o utimensat_test32 mjs22lp5:~ # ./utimensat_test32 /tmp/utimensat_test utimensat: Invalid argument mjs22lp5:~ # ./utimensat_test64 /tmp/utimensat_test utimensat success mjs22lp5:~ # uname -r 2.6.31-rc8 With the patch : mjs22lp5:~ # ./utimensat_test64 /tmp/utimensat_test utimensat success mjs22lp5:~ # ./utimensat_test32 /tmp/utimensat_test utimensat success mjs22lp5:~ # uname -r 2.6.31-rc8utimensat Signed-off-by: Suzuki K P <suzuki@in.ibm.com> Cc: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-23vlynq: includecheck fix: drivers/vlynq/vlynq.cJaswinder Singh Rajput
Fix the following 'make includecheck' warning: drivers/vlynq/vlynq.c: linux/device.h is included more than once. Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Signed-off-by: Florian Fainelli <florian@openwrt.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-23qnx4: remove write supportChristoph Hellwig
qnx4 wrte support has never been fully implement, is broken since the dawn of time and hasn't been actively developed since before git history started. Instead of letting it further bitrot and complicate API transition (like the new truncate code) remove it. Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Anders Larsen <al@alarsen.net> Cc: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-23ntfs: remove ntfs_file_writeChristoph Hellwig
do_sync_write() does the right thing for turning the aio_writev method into a normal non-vectored synchronous write, no need to duplicate it in ntfs. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Anton Altaparmakov <aia21@cantab.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-23anonfd: split interface into file creation and installDavide Libenzi
Split the anonfd interface into a bare file pointer creation one, and a file pointer creation plus install one. There are cases, like the usage of eventfds inside other kernel interfaces, where the file pointer created by anonfd needs to be used inside the initialization of other structures. As it is right now, as soon as anon_inode_getfd() returns, the kenrle can race with userspace closing the newly installed file descriptor. This patch, while keeping the old anon_inode_getfd(), introduces a new anon_inode_getfile() (whose services are reused in anon_inode_getfd()) that allows to split the file creation phase and the fd install one. Once all the kernel structures are initialized, the code can call the proper fd_install(). Gregory manifested the need for something like this inside KVM. Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: James Morris <jmorris@namei.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Gregory Haskins <ghaskins@novell.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Roland Dreier <rolandd@cisco.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-23MAINTAINERS: remove dead ncpfs listBartlomiej Zolnierkiewicz
On Saturday 01 August 2009 00:30:39 Mail Delivery Subsystem wrote: > Delivery to the following recipient failed permanently: > > linware@sh.cvut.cz > > Technical details of permanent failure: > Google tried to deliver your message, but it was rejected by the recipient > domain. We recommend contacting the other email provider for further > information about the cause of this error. The error that the other server > returned was: 450 450 <linware@sh.cvut.cz>: Recipient address rejected: > undeliverable address: unknown user: "linware" (state 14). Cc: Petr Vandrovec <vandrove@vc.cvut.cz> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-23aio.c: move EXPORT* macros to line after functionH Hartley Sweeten
As mentioned in Documentation/CodingStyle, move EXPORT* macro's to the line immediately after the closing function brace line. Also, move the __initcall() similarly. Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Cc: Zach Brown <zach.brown@oracle.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-23BUILD_BUG_ON(): fix it and a couple of bogus uses of itJan Beulich
gcc permitting variable length arrays makes the current construct used for BUILD_BUG_ON() useless, as that doesn't produce any diagnostic if the controlling expression isn't really constant. Instead, this patch makes it so that a bit field gets used here. Consequently, those uses where the condition isn't really constant now also need fixing. Note that in the gfp.h, kmemcheck.h, and virtio_config.h cases MAYBE_BUILD_BUG_ON() really just serves documentation purposes - even if the expression is compile time constant (__builtin_constant_p() yields true), the array is still deemed of variable length by gcc, and hence the whole expression doesn't have the intended effect. [akpm@linux-foundation.org: make arch/sparc/include/asm/vio.h compile] [akpm@linux-foundation.org: more nonsensical assertions in tpm.c..] Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Rajiv Andrade <srajiv@linux.vnet.ibm.com> Cc: Mimi Zohar <zohar@us.ibm.com> Cc: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-23fs/buffer.c: clean up EXPORT* macrosH Hartley Sweeten
According to Documentation/CodingStyle the EXPORT* macro should follow immediately after the closing function brace line. Also, mark_buffer_async_write_endio() and do_thaw_all() are not used elsewhere so they should be marked as static. In addition, file_fsync() is actually in fs/sync.c so move the EXPORT* to that file. Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-23fs: turn iprune_mutex into rwsemNick Piggin
We have had a report of bad memory allocation latency during DVD-RAM (UDF) writing. This is causing the user's desktop session to become unusable. Jan tracked the cause of this down to UDF inode reclaim blocking: gnome-screens D ffff810006d1d598 0 20686 1 ffff810006d1d508 0000000000000082 ffff810037db6718 0000000000000800 ffff810006d1d488 ffffffff807e4280 ffffffff807e4280 ffff810006d1a580 ffff8100bccbc140 ffff810006d1a8c0 0000000006d1d4e8 ffff810006d1a8c0 Call Trace: [<ffffffff804477f3>] io_schedule+0x63/0xa5 [<ffffffff802c2587>] sync_buffer+0x3b/0x3f [<ffffffff80447d2a>] __wait_on_bit+0x47/0x79 [<ffffffff80447dc6>] out_of_line_wait_on_bit+0x6a/0x77 [<ffffffff802c24f6>] __wait_on_buffer+0x1f/0x21 [<ffffffff802c442a>] __bread+0x70/0x86 [<ffffffff88de9ec7>] :udf:udf_tread+0x38/0x3a [<ffffffff88de0fcf>] :udf:udf_update_inode+0x4d/0x68c [<ffffffff88de26e1>] :udf:udf_write_inode+0x1d/0x2b [<ffffffff802bcf85>] __writeback_single_inode+0x1c0/0x394 [<ffffffff802bd205>] write_inode_now+0x7d/0xc4 [<ffffffff88de2e76>] :udf:udf_clear_inode+0x3d/0x53 [<ffffffff802b39ae>] clear_inode+0xc2/0x11b [<ffffffff802b3ab1>] dispose_list+0x5b/0x102 [<ffffffff802b3d35>] shrink_icache_memory+0x1dd/0x213 [<ffffffff8027ede3>] shrink_slab+0xe3/0x158 [<ffffffff8027fbab>] try_to_free_pages+0x177/0x232 [<ffffffff8027a578>] __alloc_pages+0x1fa/0x392 [<ffffffff802951fa>] alloc_page_vma+0x176/0x189 [<ffffffff802822d8>] __do_fault+0x10c/0x417 [<ffffffff80284232>] handle_mm_fault+0x466/0x940 [<ffffffff8044b922>] do_page_fault+0x676/0xabf This blocks with iprune_mutex held, which then blocks other reclaimers: X D ffff81009d47c400 0 17285 14831 ffff8100844f3728 0000000000000086 0000000000000000 ffff81000000e288 ffff81000000da00 ffffffff807e4280 ffffffff807e4280 ffff81009d47c400 ffffffff805ff890 ffff81009d47c740 00000000844f3808 ffff81009d47c740 Call Trace: [<ffffffff80447f8c>] __mutex_lock_slowpath+0x72/0xa9 [<ffffffff80447e1a>] mutex_lock+0x1e/0x22 [<ffffffff802b3ba1>] shrink_icache_memory+0x49/0x213 [<ffffffff8027ede3>] shrink_slab+0xe3/0x158 [<ffffffff8027fbab>] try_to_free_pages+0x177/0x232 [<ffffffff8027a578>] __alloc_pages+0x1fa/0x392 [<ffffffff8029507f>] alloc_pages_current+0xd1/0xd6 [<ffffffff80279ac0>] __get_free_pages+0xe/0x4d [<ffffffff802ae1b7>] __pollwait+0x5e/0xdf [<ffffffff8860f2b4>] :nvidia:nv_kern_poll+0x2e/0x73 [<ffffffff802ad949>] do_select+0x308/0x506 [<ffffffff802adced>] core_sys_select+0x1a6/0x254 [<ffffffff802ae0b7>] sys_select+0xb5/0x157 Now I think the main problem is having the filesystem block (and do IO) in inode reclaim. The problem is that this doesn't get accounted well and penalizes a random allocator with a big latency spike caused by work generated from elsewhere. I think the best idea would be to avoid this. By design if possible, or by deferring the hard work to an asynchronous context. If the latter, then the fs would probably want to throttle creation of new work with queue size of the deferred work, but let's not get into those details. Anyway, the other obvious thing we looked at is the iprune_mutex which is causing the cascading blocking. We could turn this into an rwsem to improve concurrency. It is unreasonable to totally ban all potentially slow or blocking operations in inode reclaim, so I think this is a cheap way to get a small improvement. This doesn't solve the whole problem of course. The process doing inode reclaim will still take the latency hit, and concurrent processes may end up contending on filesystem locks. So fs developers should keep these problems in mind. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Jan Kara <jack@ucw.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-23printk_once(): use bool for boolean flagRoland Dreier
Using the type bool (instead of int) for the __print_once flag in the printk_once() macro matches the intent of the code better, and allows the compiler to generate smaller code; eg a typical callsite with gcc 4.3.3 on i386: add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-6 (-6) function old new delta static.__print_once 4 1 -3 get_cpu_vendor 146 143 -3 Saving 6 bytes of object size per callsite by slightly improving the readability of the source seems like a win to me. Signed-off-by: Roland Dreier <rolandd@cisco.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-23proc connector: add event for process becoming session leaderScott James Remnant
The act of a process becoming a session leader is a useful signal to a supervising init daemon such as Upstart. While a daemon will normally do this as part of the process of becoming a daemon, it is rare for its children to do so. When the children do, it is nearly always a sign that the child should be considered detached from the parent and not supervised along with it. The poster-child example is OpenSSH; the per-login children call setsid() so that they may control the pty connected to them. If the primary daemon dies or is restarted, we do not want to consider the per-login children and want to respawn the primary daemon without killing the children. This patch adds a new PROC_SID_EVENT and associated structure to the proc_event event_data union, it arranges for this to be emitted when the special PIDTYPE_SID pid is set. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Scott James Remnant <scott@ubuntu.com> Acked-by: Matt Helsley <matthltc@us.ibm.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Evgeniy Polyakov <johnpol@2ka.mipt.ru> Acked-by: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-23seq_file: constify seq_operationsJames Morris
Make all seq_operations structs const, to help mitigate against revectoring user-triggerable function pointers. This is derived from the grsecurity patch, although generated from scratch because it's simpler than extracting the changes from there. Signed-off-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-23Documentation/: fix warnings from -Wmissing-prototypes in HOSTCFLAGSLadinu Chandrasinghe
Fix up -Wmissing-prototypes in compileable userspace code, mainly under Documentation/. Signed-off-by: Ladinu Chandrasinghe <ladinu.pub@gmail.com> Signed-off-by: Trevor Keith <tsrk@tsrk.net> Cc: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-23dme1737: Keep index within pwm_config[]Roel Kluin
The static code scanner "Parfait" reported this because pwm_config is only 3 bytes - pwm_config[3] is out of range. Since this code path is never called with ix == 3 (the device has no PWM4 output) this doesn't change anything in practice. But to encourage testing with Parfait, lets make the warning go away... Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Acked-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-23generic-ipi: make struct call_function_data locklessXiao Guangrong
This patch can remove spinlock from struct call_function_data, the reasons are below: 1: add a new interface for cpumask named cpumask_test_and_clear_cpu(), it can atomically test and clear specific cpu, we can use it instead of cpumask_test_cpu() and cpumask_clear_cpu() and no need data->lock to protect those in generic_smp_call_function_interrupt(). 2: in smp_call_function_many(), after csd_lock() return, the current's cfd_data is deleted from call_function list, so it not have race between other cpus, then cfs_data is only used in smp_call_function_many() that must disable preemption and not from a hardware interrupthandler or from a bottom half handler to call, only the correspond cpu can use it, so it not have race in current cpu, no need cfs_data->lock to protect it. 3: after 1 and 2, cfs_data->lock is only use to protect cfs_data->refs in generic_smp_call_function_interrupt(), so we can define cfs_data->refs to atomic_t, and no need cfs_data->lock any more. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Peter Zijlstra <peterz@infradead.org> Acked-by: Rusty Russell <rusty@rustcorp.com.au> [akpm@linux-foundation.org: use atomic_dec_return()] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-23Fix all -Wmissing-prototypes warnings in x86 defconfigTrevor Keith
Signed-off-by: Trevor Keith <tsrk@tsrk.net> Cc: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-23dac960: fix undefined behavior on empty stringMichael Buesch
Fix undefined behavior due to a buffer underrun if an empty string is written to the proc file. Signed-off-by: Michael Buesch <mb@bu3sch.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-23kmod: fix race in usermodehelper codeNeil Horman
The user mode helper code has a race in it. call_usermodehelper_exec() takes an allocated subprocess_info structure, which it passes to a workqueue, and then passes it to a kernel thread which it creates, after which it calls complete to signal to the caller of call_usermodehelper_exec() that it can free the subprocess_info struct. But since we use that structure in the created thread, we can't call complete from __call_usermodehelper(), which is where we create the kernel thread. We need to call complete() from within the kernel thread and then not use subprocess_info afterward in the case of UMH_WAIT_EXEC. Tested successfully by me. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-23Move magic numbers into magic.hNick Black
Move various magic-number definitions into magic.h. Signed-off-by: Nick Black <dank@qemfd.net> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "David S. Miller" <davem@davemloft.net> Cc: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-23printk: add printk_delay to make messages readable for some scenariosDave Young
When syslog is not possible, at the same time there's no serial/net console available, it will be hard to read the printk messages. For example oops/panic/warning messages in shutdown phase. Add a printk delay feature, we can make each printk message delay some milliseconds. Setting the delay by proc/sysctl interface: /proc/sys/kernel/printk_delay The value range from 0 - 10000, default value is 0 [akpm@linux-foundation.org: fix a few things] Signed-off-by: Dave Young <hidave.darkstar@gmail.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-23printk boot_delay: rename printk_delay_msec to loops_per_msecDave Young
Rename `printk_delay_msec' to `loops_per_msec', because the patch "printk: add printk_delay to make messages readable for some scenarios" wishes to more appropriately use the `printk_delay_msec' identifier. [akpm@linux-foundation.org: add a comment] Signed-off-by: Dave Young <hidave.darkstar@gmail.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-23poll/select: avoid arithmetic overflow in __estimate_accuracy()Guillaume Knispel
__estimate_accuracy() was prone to integer overflow, for example if *tv == {2147, 483648000} on a 32 bit computer (or even for delays as small as {429, 500000000} if the task is niced). Because the result was already forced between 0 and 100ms, the effect of the overflow was not too problematic, but the use of the hrtimer range feature was not optimal in overflow cases. This patch ensures that there can not be an integer overflow in this function. Signed-off-by: Guillaume Knispel <gknispel@proformatique.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-23kprobes: use do_IRQ() in lkdtmM. Mohan Kumar
Current lkdtm code puts a probe on __do_IRQ for some of the kdump test cases. Since __do_IRQ is deprecated, change lkdtm code to use do_IRQ function. Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Cc: Ankita Garg <ankita@in.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Masami Hiramatsu <mhiramat@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-23smbfs: read buffer overflowRoel Kluin
This function uses signed integers for the unix_date and local variables - if a negative number is supplied and the leap-year condition is not met, month will be 0, leading to a read of day_n[-1] Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-23include/linux/kmemcheck.h: fix a trillion warningsAndrew Morton
of the form include/net/inet_sock.h:208: warning: ISO C90 forbids mixed declarations and code Cc: Johannes Berg <johannes@sipsolutions.net> Acked-by: Vegard Nossum <vegard.nossum@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-23eCryptfs: Prevent lower dentry from going negative during unlinkTyler Hicks
When calling vfs_unlink() on the lower dentry, d_delete() turns the dentry into a negative dentry when the d_count is 1. This eventually caused a NULL pointer deref when a read() or write() was done and the negative dentry's d_inode was dereferenced in ecryptfs_read_update_atime() or ecryptfs_getxattr(). Placing mutt's tmpdir in an eCryptfs mount is what initially triggered the oops and I was able to reproduce it with the following sequence: open("/tmp/upper/foo", O_RDWR|O_CREAT|O_EXCL|O_NOFOLLOW, 0600) = 3 link("/tmp/upper/foo", "/tmp/upper/bar") = 0 unlink("/tmp/upper/foo") = 0 open("/tmp/upper/bar", O_RDWR|O_CREAT|O_NOFOLLOW, 0600) = 4 unlink("/tmp/upper/bar") = 0 write(4, "eCryptfs test\n"..., 14 <unfinished ...> +++ killed by SIGKILL +++ https://bugs.launchpad.net/ecryptfs/+bug/387073 Reported-by: Loïc Minier <loic.minier@canonical.com> Cc: Serge Hallyn <serue@us.ibm.com> Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Cc: ecryptfs-devel@lists.launchpad.net Cc: stable <stable@kernel.org> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2009-09-23eCryptfs: Propagate vfs_read and vfs_write return codesTyler Hicks
Errors returned from vfs_read() and vfs_write() calls to the lower filesystem were being masked as -EINVAL. This caused some confusion to users who saw EINVAL instead of ENOSPC when the disk was full, for instance. Also, the actual bytes read or written were not accessible by callers to ecryptfs_read_lower() and ecryptfs_write_lower(), which may be useful in some cases. This patch updates the error handling logic where those functions are called in order to accept positive return codes indicating success. Cc: Eric Sandeen <esandeen@redhat.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: ecryptfs-devel@lists.launchpad.net Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2009-09-23eCryptfs: Validate global auth tok keysTyler Hicks
When searching through the global authentication tokens for a given key signature, verify that a matching key has not been revoked and has not expired. This allows the `keyctl revoke` command to be properly used on keys in use by eCryptfs. Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: ecryptfs-devel@lists.launchpad.net Cc: stable <stable@kernel.org> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2009-09-23eCryptfs: Filename encryption only supports password auth tokensTyler Hicks
Returns -ENOTSUPP when attempting to use filename encryption with something other than a password authentication token, such as a private token from openssl. Using filename encryption with a userspace eCryptfs key module is a future goal. Until then, this patch handles the situation a little better than simply using a BUG_ON(). Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: ecryptfs-devel@lists.launchpad.net Cc: stable <stable@kernel.org> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2009-09-23eCryptfs: Check for O_RDONLY lower inodes when opening lower filesTyler Hicks
If the lower inode is read-only, don't attempt to open the lower file read/write and don't hand off the open request to the privileged eCryptfs kthread for opening it read/write. Instead, only try an unprivileged, read-only open of the file and give up if that fails. This patch fixes an oops when eCryptfs is mounted on top of a read-only mount. Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Eric Sandeen <esandeen@redhat.com> Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Cc: ecryptfs-devel@lists.launchpad.net Cc: stable <stable@kernel.org> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2009-09-23eCryptfs: Handle unrecognized tag 3 cipher codesTyler Hicks
Returns an error when an unrecognized cipher code is present in a tag 3 packet or an ecryptfs_crypt_stat cannot be initialized. Also sets an crypt_stat->tfm error pointer to NULL to ensure that it will not be incorrectly freed in ecryptfs_destroy_crypt_stat(). Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: ecryptfs-devel@lists.launchpad.net Cc: stable <stable@kernel.org> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2009-09-23ecryptfs: improved dependency checking and reportingDave Hansen
So, I compiled a 2.6.31-rc5 kernel with ecryptfs and loaded its module. When it came time to mount my filesystem, I got this in dmesg, and it refused to mount: [93577.776637] Unable to allocate crypto cipher with name [aes]; rc = [-2] [93577.783280] Error attempting to initialize key TFM cipher with name = [aes]; rc = [-2] [93577.791183] Error attempting to initialize cipher with name = [aes] and key size = [32]; rc = [-2] [93577.800113] Error parsing options; rc = [-22] I figured from the error message that I'd either forgotten to load "aes" or that my key size was bogus. Neither one of those was the case. In fact, I was missing the CRYPTO_ECB config option and the 'ecb' module. Unfortunately, there's no trace of 'ecb' in that error message. I've done two things to fix this. First, I've modified ecryptfs's Kconfig entry to select CRYPTO_ECB and CRYPTO_CBC. I also took CRYPTO out of the dependencies since the 'select' will take care of it for us. I've also modified the error messages to print a string that should contain both 'ecb' and 'aes' in my error case. That will give any future users a chance of finding the right modules and Kconfig options. I also wonder if we should: select CRYPTO_AES if !EMBEDDED since I think most ecryptfs users are using AES like me. Cc: ecryptfs-devel@lists.launchpad.net Cc: linux-fsdevel@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: Dustin Kirkland <kirkland@canonical.com> Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> [tyhicks@linux.vnet.ibm.com: Removed extra newline, 80-char violation] Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2009-09-23eCryptfs: Fix lockdep-reported AB-BA mutex issueRoland Dreier
Lockdep reports the following valid-looking possible AB-BA deadlock with global_auth_tok_list_mutex and keysig_list_mutex: ecryptfs_new_file_context() -> ecryptfs_copy_mount_wide_sigs_to_inode_sigs() -> mutex_lock(&mount_crypt_stat->global_auth_tok_list_mutex); -> ecryptfs_add_keysig() -> mutex_lock(&crypt_stat->keysig_list_mutex); vs ecryptfs_generate_key_packet_set() -> mutex_lock(&crypt_stat->keysig_list_mutex); -> ecryptfs_find_global_auth_tok_for_sig() -> mutex_lock(&mount_crypt_stat->global_auth_tok_list_mutex); ie the two mutexes are taken in opposite orders in the two different code paths. I'm not sure if this is a real bug where two threads could actually hit the two paths in parallel and deadlock, but it at least makes lockdep impossible to use with ecryptfs since this report triggers every time and disables future lockdep reporting. Since ecryptfs_add_keysig() is called only from the single callsite in ecryptfs_copy_mount_wide_sigs_to_inode_sigs(), the simplest fix seems to be to move the lock of keysig_list_mutex back up outside of the where global_auth_tok_list_mutex is taken. This patch does that, and fixes the lockdep report on my system (and ecryptfs still works OK). The full output of lockdep fixed by this patch is: ======================================================= [ INFO: possible circular locking dependency detected ] 2.6.31-2-generic #14~rbd2 ------------------------------------------------------- gdm/2640 is trying to acquire lock: (&mount_crypt_stat->global_auth_tok_list_mutex){+.+.+.}, at: [<ffffffff8121591e>] ecryptfs_find_global_auth_tok_for_sig+0x2e/0x90 but task is already holding lock: (&crypt_stat->keysig_list_mutex){+.+.+.}, at: [<ffffffff81217728>] ecryptfs_generate_key_packet_set+0x58/0x2b0 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&crypt_stat->keysig_list_mutex){+.+.+.}: [<ffffffff8108c897>] check_prev_add+0x2a7/0x370 [<ffffffff8108cfc1>] validate_chain+0x661/0x750 [<ffffffff8108d2e7>] __lock_acquire+0x237/0x430 [<ffffffff8108d585>] lock_acquire+0xa5/0x150 [<ffffffff815526cd>] __mutex_lock_common+0x4d/0x3d0 [<ffffffff81552b56>] mutex_lock_nested+0x46/0x60 [<ffffffff8121526a>] ecryptfs_add_keysig+0x5a/0xb0 [<ffffffff81213299>] ecryptfs_copy_mount_wide_sigs_to_inode_sigs+0x59/0xb0 [<ffffffff81214b06>] ecryptfs_new_file_context+0xa6/0x1a0 [<ffffffff8120e42a>] ecryptfs_initialize_file+0x4a/0x140 [<ffffffff8120e54d>] ecryptfs_create+0x2d/0x60 [<ffffffff8113a7d4>] vfs_create+0xb4/0xe0 [<ffffffff8113a8c4>] __open_namei_create+0xc4/0x110 [<ffffffff8113d1c1>] do_filp_open+0xa01/0xae0 [<ffffffff8112d8d9>] do_sys_open+0x69/0x140 [<ffffffff8112d9f0>] sys_open+0x20/0x30 [<ffffffff81013132>] system_call_fastpath+0x16/0x1b [<ffffffffffffffff>] 0xffffffffffffffff -> #0 (&mount_crypt_stat->global_auth_tok_list_mutex){+.+.+.}: [<ffffffff8108c675>] check_prev_add+0x85/0x370 [<ffffffff8108cfc1>] validate_chain+0x661/0x750 [<ffffffff8108d2e7>] __lock_acquire+0x237/0x430 [<ffffffff8108d585>] lock_acquire+0xa5/0x150 [<ffffffff815526cd>] __mutex_lock_common+0x4d/0x3d0 [<ffffffff81552b56>] mutex_lock_nested+0x46/0x60 [<ffffffff8121591e>] ecryptfs_find_global_auth_tok_for_sig+0x2e/0x90 [<ffffffff812177d5>] ecryptfs_generate_key_packet_set+0x105/0x2b0 [<ffffffff81212f49>] ecryptfs_write_headers_virt+0xc9/0x120 [<ffffffff8121306d>] ecryptfs_write_metadata+0xcd/0x200 [<ffffffff8120e44b>] ecryptfs_initialize_file+0x6b/0x140 [<ffffffff8120e54d>] ecryptfs_create+0x2d/0x60 [<ffffffff8113a7d4>] vfs_create+0xb4/0xe0 [<ffffffff8113a8c4>] __open_namei_create+0xc4/0x110 [<ffffffff8113d1c1>] do_filp_open+0xa01/0xae0 [<ffffffff8112d8d9>] do_sys_open+0x69/0x140 [<ffffffff8112d9f0>] sys_open+0x20/0x30 [<ffffffff81013132>] system_call_fastpath+0x16/0x1b [<ffffffffffffffff>] 0xffffffffffffffff other info that might help us debug this: 2 locks held by gdm/2640: #0: (&sb->s_type->i_mutex_key#11){+.+.+.}, at: [<ffffffff8113cb8b>] do_filp_open+0x3cb/0xae0 #1: (&crypt_stat->keysig_list_mutex){+.+.+.}, at: [<ffffffff81217728>] ecryptfs_generate_key_packet_set+0x58/0x2b0 stack backtrace: Pid: 2640, comm: gdm Tainted: G C 2.6.31-2-generic #14~rbd2 Call Trace: [<ffffffff8108b988>] print_circular_bug_tail+0xa8/0xf0 [<ffffffff8108c675>] check_prev_add+0x85/0x370 [<ffffffff81094912>] ? __module_text_address+0x12/0x60 [<ffffffff8108cfc1>] validate_chain+0x661/0x750 [<ffffffff81017275>] ? print_context_stack+0x85/0x140 [<ffffffff81089c68>] ? find_usage_backwards+0x38/0x160 [<ffffffff8108d2e7>] __lock_acquire+0x237/0x430 [<ffffffff8108d585>] lock_acquire+0xa5/0x150 [<ffffffff8121591e>] ? ecryptfs_find_global_auth_tok_for_sig+0x2e/0x90 [<ffffffff8108b0b0>] ? check_usage_backwards+0x0/0xb0 [<ffffffff815526cd>] __mutex_lock_common+0x4d/0x3d0 [<ffffffff8121591e>] ? ecryptfs_find_global_auth_tok_for_sig+0x2e/0x90 [<ffffffff8121591e>] ? ecryptfs_find_global_auth_tok_for_sig+0x2e/0x90 [<ffffffff8108c02c>] ? mark_held_locks+0x6c/0xa0 [<ffffffff81125b0d>] ? kmem_cache_alloc+0xfd/0x1a0 [<ffffffff8108c34d>] ? trace_hardirqs_on_caller+0x14d/0x190 [<ffffffff81552b56>] mutex_lock_nested+0x46/0x60 [<ffffffff8121591e>] ecryptfs_find_global_auth_tok_for_sig+0x2e/0x90 [<ffffffff812177d5>] ecryptfs_generate_key_packet_set+0x105/0x2b0 [<ffffffff81212f49>] ecryptfs_write_headers_virt+0xc9/0x120 [<ffffffff8121306d>] ecryptfs_write_metadata+0xcd/0x200 [<ffffffff81210240>] ? ecryptfs_init_persistent_file+0x60/0xe0 [<ffffffff8120e44b>] ecryptfs_initialize_file+0x6b/0x140 [<ffffffff8120e54d>] ecryptfs_create+0x2d/0x60 [<ffffffff8113a7d4>] vfs_create+0xb4/0xe0 [<ffffffff8113a8c4>] __open_namei_create+0xc4/0x110 [<ffffffff8113d1c1>] do_filp_open+0xa01/0xae0 [<ffffffff8129a93e>] ? _raw_spin_unlock+0x5e/0xb0 [<ffffffff8155410b>] ? _spin_unlock+0x2b/0x40 [<ffffffff81139e9b>] ? getname+0x3b/0x240 [<ffffffff81148a5a>] ? alloc_fd+0xfa/0x140 [<ffffffff8112d8d9>] do_sys_open+0x69/0x140 [<ffffffff81553b8f>] ? trace_hardirqs_on_thunk+0x3a/0x3f [<ffffffff8112d9f0>] sys_open+0x20/0x30 [<ffffffff81013132>] system_call_fastpath+0x16/0x1b Signed-off-by: Roland Dreier <rolandd@cisco.com> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2009-09-23ecryptfs: Remove unneeded locking that triggers lockdep false positivesRoland Dreier
In ecryptfs_destroy_inode(), inode_info->lower_file_mutex is locked, and just after the mutex is unlocked, the code does: kmem_cache_free(ecryptfs_inode_info_cache, inode_info); This means that if another context could possibly try to take the same mutex as ecryptfs_destroy_inode(), then it could end up getting the mutex just before the data structure containing the mutex is freed. So any such use would be an obvious use-after-free bug (catchable with slab poisoning or mutex debugging), and therefore the locking in ecryptfs_destroy_inode() is not needed and can be dropped. Similarly, in ecryptfs_destroy_crypt_stat(), crypt_stat->keysig_list_mutex is locked, and then the mutex is unlocked just before the code does: memset(crypt_stat, 0, sizeof(struct ecryptfs_crypt_stat)); Therefore taking this mutex is similarly not necessary. Removing this locking fixes false-positive lockdep reports such as the following (and they are false-positives for exactly the same reason that the locking is not needed): ================================= [ INFO: inconsistent lock state ] 2.6.31-2-generic #14~rbd3 --------------------------------- inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage. kswapd0/323 [HC0[0]:SC0[0]:HE1:SE1] takes: (&inode_info->lower_file_mutex){+.+.?.}, at: [<ffffffff81210d34>] ecryptfs_destroy_inode+0x34/0x100 {RECLAIM_FS-ON-W} state was registered at: [<ffffffff8108c02c>] mark_held_locks+0x6c/0xa0 [<ffffffff8108c10f>] lockdep_trace_alloc+0xaf/0xe0 [<ffffffff81125a51>] kmem_cache_alloc+0x41/0x1a0 [<ffffffff8113117a>] get_empty_filp+0x7a/0x1a0 [<ffffffff8112dd46>] dentry_open+0x36/0xc0 [<ffffffff8121a36c>] ecryptfs_privileged_open+0x5c/0x2e0 [<ffffffff81210283>] ecryptfs_init_persistent_file+0xa3/0xe0 [<ffffffff8120e838>] ecryptfs_lookup_and_interpose_lower+0x278/0x380 [<ffffffff8120f97a>] ecryptfs_lookup+0x12a/0x250 [<ffffffff8113930a>] real_lookup+0xea/0x160 [<ffffffff8113afc8>] do_lookup+0xb8/0xf0 [<ffffffff8113b518>] __link_path_walk+0x518/0x870 [<ffffffff8113bd9c>] path_walk+0x5c/0xc0 [<ffffffff8113be5b>] do_path_lookup+0x5b/0xa0 [<ffffffff8113bfe7>] user_path_at+0x57/0xa0 [<ffffffff811340dc>] vfs_fstatat+0x3c/0x80 [<ffffffff8113424b>] vfs_stat+0x1b/0x20 [<ffffffff81134274>] sys_newstat+0x24/0x50 [<ffffffff81013132>] system_call_fastpath+0x16/0x1b [<ffffffffffffffff>] 0xffffffffffffffff irq event stamp: 7811 hardirqs last enabled at (7811): [<ffffffff810c037f>] call_rcu+0x5f/0x90 hardirqs last disabled at (7810): [<ffffffff810c0353>] call_rcu+0x33/0x90 softirqs last enabled at (3764): [<ffffffff810631da>] __do_softirq+0x14a/0x220 softirqs last disabled at (3751): [<ffffffff8101440c>] call_softirq+0x1c/0x30 other info that might help us debug this: 2 locks held by kswapd0/323: #0: (shrinker_rwsem){++++..}, at: [<ffffffff810f67ed>] shrink_slab+0x3d/0x190 #1: (&type->s_umount_key#35){.+.+..}, at: [<ffffffff811429a1>] prune_dcache+0xd1/0x1b0 stack backtrace: Pid: 323, comm: kswapd0 Tainted: G C 2.6.31-2-generic #14~rbd3 Call Trace: [<ffffffff8108ad6c>] print_usage_bug+0x18c/0x1a0 [<ffffffff8108aff0>] ? check_usage_forwards+0x0/0xc0 [<ffffffff8108bac2>] mark_lock_irq+0xf2/0x280 [<ffffffff8108bd87>] mark_lock+0x137/0x1d0 [<ffffffff81164710>] ? fsnotify_clear_marks_by_inode+0x30/0xf0 [<ffffffff8108bee6>] mark_irqflags+0xc6/0x1a0 [<ffffffff8108d337>] __lock_acquire+0x287/0x430 [<ffffffff8108d585>] lock_acquire+0xa5/0x150 [<ffffffff81210d34>] ? ecryptfs_destroy_inode+0x34/0x100 [<ffffffff8108d2e7>] ? __lock_acquire+0x237/0x430 [<ffffffff815526ad>] __mutex_lock_common+0x4d/0x3d0 [<ffffffff81210d34>] ? ecryptfs_destroy_inode+0x34/0x100 [<ffffffff81164710>] ? fsnotify_clear_marks_by_inode+0x30/0xf0 [<ffffffff81210d34>] ? ecryptfs_destroy_inode+0x34/0x100 [<ffffffff8129a91e>] ? _raw_spin_unlock+0x5e/0xb0 [<ffffffff81552b36>] mutex_lock_nested+0x46/0x60 [<ffffffff81210d34>] ecryptfs_destroy_inode+0x34/0x100 [<ffffffff81145d27>] destroy_inode+0x87/0xd0 [<ffffffff81146b4c>] generic_delete_inode+0x12c/0x1a0 [<ffffffff81145832>] iput+0x62/0x70 [<ffffffff811423c8>] dentry_iput+0x98/0x110 [<ffffffff81142550>] d_kill+0x50/0x80 [<ffffffff81142623>] prune_one_dentry+0xa3/0xc0 [<ffffffff811428b1>] __shrink_dcache_sb+0x271/0x290 [<ffffffff811429d9>] prune_dcache+0x109/0x1b0 [<ffffffff81142abf>] shrink_dcache_memory+0x3f/0x50 [<ffffffff810f68dd>] shrink_slab+0x12d/0x190 [<ffffffff810f9377>] balance_pgdat+0x4d7/0x640 [<ffffffff8104c4c0>] ? finish_task_switch+0x40/0x150 [<ffffffff810f63c0>] ? isolate_pages_global+0x0/0x60 [<ffffffff810f95f7>] kswapd+0x117/0x170 [<ffffffff810777a0>] ? autoremove_wake_function+0x0/0x40 [<ffffffff810f94e0>] ? kswapd+0x0/0x170 [<ffffffff810773be>] kthread+0x9e/0xb0 [<ffffffff8101430a>] child_rip+0xa/0x20 [<ffffffff81013c90>] ? restore_args+0x0/0x30 [<ffffffff81077320>] ? kthread+0x0/0xb0 [<ffffffff81014300>] ? child_rip+0x0/0x20 Signed-off-by: Roland Dreier <roland@digitalvampire.org> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2009-09-23USB: Fix sysfs paths in documentationJean Delvare
Neither /sys/usb/devices nor /sys/bus/devices exist. The correct path is /sys/bus/usb/devices. Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-23USB: skeleton: fix coding style issues.Greg Kroah-Hartman
This fixes up the majority of the coding style issues in the usb-skeleton driver. Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-23USB: O_NONBLOCK in read path of skeletonOliver Neukum
Non blocking IO is supported in the read path of usb-skeleton. This is done by just not blocking. As support for handling signals without stopping IO is already there, it can be used for O_NONBLOCK, too. Signed-off-by: Oliver Neukum <oliver@neukum.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-23USB: make usb-skeleton honor O_NONBLOCK in write pathOliver Neukum
usb:usb-skeleton: honor O_NONBLOCK in write path nonblocking writes are allowed by using down_trylock if necessary to reserve an URB Signed-off-by: Oliver Neukum <oliver@neukum.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-23USB: skel_read really sucks royallyOliver Neukum
The read code path of the skeleton driver really sucks - skel_read works only for devices which always send data - the timeout comes out of thin air - it blocks signals for the duration of the timeout - it disallows nonblocking IO by design This patch fixes it by using a real urb, a completion and interruptible waits. Signed-off-by: Oliver Neukum <oliver@neukum.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-23USB: Add hub descriptor update hook for xHCISarah Sharp
Add a hook for updating xHCI internal structures after khubd fetches the hub descriptor and sets up the hub's TT information. The xHCI driver must update the internal structures before devices under the hub can be enumerated. Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-23USB: xhci: Support USB hubs.Sarah Sharp
For a USB hub to work under an xHCI host controller, the xHC's internal scheduler must be made aware of the hub's characteristics. Add an xHCI hook that the USB core will call after it fetches the hub descriptor. This hook will add hub information to the slot context for that device, including whether it has multiple TTs or a single TT, the number of ports on the hub, and TT think time. Setting up the slot context for the device is different for 0.95 and 0.96 xHCI host controllers. Some of the slot context reserved fields in the 0.95 specification were changed into hub fields in the 0.96 specification. Don't set the TT think time or number of ports for a hub if we're dealing with a 0.95-compliant xHCI host controller. The 0.95 xHCI specification says that to modify the hub flag, we need to issue an evaluate context command. The 0.96 specification says that flag can be set with a configure endpoint command. Issue the correct command based on the version reported by the hardware. This patch does not add support for multi-TT hubs. Multi-TT hubs expose a single TT on alt setting 0, and multi-TT on alt setting 1. The xHCI driver can't handle setting alternate interfaces yet. Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>