Age Commit message Author
2009-10-29vmscan: order evictable rescue in LRU putbackJohannes Weiner
Isolators putting a page back to the LRU do not hold the page lock, and if the page is mlocked, another thread might munlock it concurrently.

Expecting this, the putback code re-checks the evictability of a page when it just moved it to the unevictable list in order to correct its decision.

The problem, however, is that ordering is not guaranteed between setting PG_lru when moving the page to the list and checking PG_mlocked afterwards:

    #0:                             #1

    spin_lock()
                                    if (TestClearPageMlocked())
                                      if (PageLRU())
                                        move to evictable list
    SetPageLRU()
    spin_unlock()
    if (!PageMlocked())
      move to evictable list

The PageMlocked() check may get reordered before SetPageLRU() in #0, resulting in #0 not moving the still mlocked page, and in #1 failing to isolate and move the page as well. The page is now stranded on the unevictable list.

The race condition is very unlikely. The consequence currently is one page falling off the reclaim grid and eventually getting freed with PG_unevictable set, which triggers a warning in the page allocator.

TestClearPageMlocked() in #1 already provides full memory barrier semantics.

This patch adds an explicit full barrier to force ordering between SetPageLRU() and PageMlocked() so that either one of the competitors rescues the page.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
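The ordering argument in the vmscan putback message can be modelled in userspace. The following is a minimal sketch using C11 atomics, not the kernel code: page_lru, page_mlocked and all function names are hypothetical stand-ins for the page flags and the two racing paths, and atomic_thread_fence() stands in for whatever full barrier primitive the patch actually adds. With a full fence between the store and the load on both sides, at least one side is guaranteed to observe the other's update.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the PG_lru and PG_mlocked page flags. */
static atomic_bool page_lru;
static atomic_bool page_mlocked;

/* Reset the model page: mlocked and not yet on any LRU list. */
void model_reset(void)
{
	atomic_store(&page_lru, false);
	atomic_store(&page_mlocked, true);
}

/*
 * Side #0 (putback): publish the LRU flag, then re-check mlocked.
 * Returns true if this side sees the page unlocked and must rescue it.
 */
bool putback_side(void)
{
	atomic_store_explicit(&page_lru, true, memory_order_relaxed);
	/* Models the explicit full barrier the patch describes. */
	atomic_thread_fence(memory_order_seq_cst);
	return !atomic_load_explicit(&page_mlocked, memory_order_relaxed);
}

/*
 * Side #1 (munlock): test-and-clear mlocked, then check the LRU flag.
 * Returns true if this side finds the page on the LRU and must rescue it.
 */
bool munlock_side(void)
{
	bool was_mlocked = atomic_exchange_explicit(&page_mlocked, false,
						    memory_order_relaxed);
	/* Models the full barrier implied by TestClearPageMlocked(). */
	atomic_thread_fence(memory_order_seq_cst);
	if (!was_mlocked)
		return false;
	return atomic_load_explicit(&page_lru, memory_order_relaxed);
}
```

Calling the two functions from separate threads after model_reset() should never leave both returning false; removing the fence in putback_side() is what would allow the stranded-page outcome on weakly ordered hardware.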
2009-10-29do_mbind(): fix memory leakKOSAKI Motohiro
If migrate_prep() fails, the variable 'new' is leaked. This patch fixes it. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Christoph Lameter <cl@linux-foundation.org> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-29mbind(): fix leak of never putback pagesKOSAKI Motohiro
If mbind() receives an invalid address, do_mbind leaks a page. The following test program detects this leak.

This patch fixes it.

migrate_efault.c
=======================================
#include <numaif.h>
#include <numa.h>
#include <sys/mman.h>
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>

static unsigned long pagesize;

static void* make_hole_mapping(void)
{
	void* addr;

	addr = mmap(NULL, pagesize*3, PROT_READ|PROT_WRITE,
		    MAP_ANON|MAP_PRIVATE, 0, 0);
	if (addr == MAP_FAILED)
		return NULL;

	/* make page populate */
	memset(addr, 0, pagesize*3);

	/* make memory hole */
	munmap(addr+pagesize, pagesize);

	return addr;
}

int main(int argc, char** argv)
{
	void* addr;
	int ch;
	int node;
	struct bitmask *nmask = numa_allocate_nodemask();
	int err;
	int node_set = 0;

	while ((ch = getopt(argc, argv, "n:")) != -1){
		switch (ch){
		case 'n':
			node = strtol(optarg, NULL, 0);
			numa_bitmask_setbit(nmask, node);
			node_set = 1;
			break;
		default:
			;
		}
	}
	argc -= optind;
	argv += optind;

	if (!node_set)
		numa_bitmask_setbit(nmask, 0);

	pagesize = getpagesize();

	addr = make_hole_mapping();

	err = mbind(addr, pagesize*3, MPOL_BIND, nmask->maskp, nmask->size, MPOL_MF_MOVE_ALL);
	if (err)
		perror("mbind ");

	return 0;
}
=======================================

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Christoph Lameter <cl@linux-foundation.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-29hfs: fix oops on mount with corrupted btree extent recordsJeff Mahoney
A particular fsfuzzer run caused an hfs file system to crash on mount. This is due to a corrupted MDB extent record causing a miscalculation of HFS_I(inode)->first_blocks for the extent tree. If the extent records are zeroed out, it won't trigger the first_blocks special case. Instead it falls through to the extent code which we're still in the middle of initializing. This patch catches the 0 size extent records, reports the corruption, and fails the mount. Reported-by: Ramon de Carvalho Valle <rcvalle@linux.vnet.ibm.com> Signed-off-by: Jeff Mahoney <jeffm@suse.com> Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-29loop: fix NULL dereference if mount failsAlexey Dobriyan
Commit bb21488482bd36eae6b30b014d93619063773fd4 ("[PATCH] switch loop") started to pass NULL bdev to ioctl hook.

Steps to reproduce:
[boot with loop.max_part=1]
[mount -o loop something so mount fails]

BUG: unable to handle kernel NULL pointer dereference at 00000000000000b8
IP: [<ffffffff811486ee>] blkdev_ioctl+0x2e/0xa30
PGD 0
Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
last sysfs file: /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:35/ACPI0003:00/power_supply/ACAD/online
CPU 0
Modules linked in: zfs nvidia(P) [last unloaded: zfs]
Pid: 15177, comm: mount Tainted: P 2.6.32-rc4-zfs #2 Satellite X200
RIP: 0010:[<ffffffff811486ee>] [<ffffffff811486ee>] blkdev_ioctl+0x2e/0xa30
RSP: 0018:ffff88003b3d5bb8 EFLAGS: 00010286
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 000000000000125f RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffff88003b3d5ce8 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 00007ffffffff000
R13: 0000000000000000 R14: ffff880071cef280 R15: 00000000000200da
FS: 00007fd77cfe7740(0000) GS:ffff880001600000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000000000b8 CR3: 0000000001001000 CR4: 00000000000026f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process mount (pid: 15177, threadinfo ffff88003b3d4000, task ffff88007572f920)
Stack:
 ffff88003b3d5c38 ffffffff812f95f5 ffff88007eeb6600 0000000000000000
<0> 0000000000000000 ffff88003b3d5c18 ffffffff811547d9 ffff88001bf11ef0
<0> 7fffffffffffffff ffff88001bf11ee8 ffff88001bf11ef0 0000000000000000
Call Trace:
 [<ffffffff812f95f5>] ? schedule_timeout+0x1f5/0x250
 [<ffffffff811547d9>] ? rb_insert_color+0x109/0x140
 [<ffffffff812fb754>] ? _spin_unlock_irq+0x14/0x40
 [<ffffffff812f84c6>] ? wait_for_common+0x66/0x170
 [<ffffffff8105a280>] ? default_wake_function+0x0/0x10
 [<ffffffff810f8258>] ioctl_by_bdev+0x38/0x50
 [<ffffffff811d2481>] loop_clr_fd+0x1e1/0x210
 [<ffffffff811d2522>] lo_release+0x72/0x80
 [<ffffffff810f934c>] __blkdev_put+0x1ac/0x1d0
 [<ffffffff810f937b>] blkdev_put+0xb/0x10
 [<ffffffff810f93b9>] blkdev_close+0x39/0x60
 [<ffffffff810ccef3>] __fput+0xd3/0x230
 [<ffffffff810cd06d>] fput+0x1d/0x30
 [<ffffffff810c9680>] filp_close+0x50/0x80
 [<ffffffff81061f11>] put_files_struct+0x81/0x100
 [<ffffffff81061fde>] exit_files+0x4e/0x60
 [<ffffffff81063ec5>] do_exit+0x6b5/0x730
 [<ffffffff8107b279>] ? up_read+0x9/0x10
 [<ffffffff8104c86e>] ? do_page_fault+0x18e/0x2a0
 [<ffffffff81063f81>] do_group_exit+0x41/0xc0
 [<ffffffff81064012>] sys_exit_group+0x12/0x20
 [<ffffffff81030deb>] system_call_fastpath+0x16/0x1b
Code: f8 48 89 e5 48 81 ec 30 01 00 00 48 89 5d d8 4c 89 6d e8 4c 89 65 e0 4c 89 75 f0 4c 89 7d f8 48 89 bd e8 fe ff ff 49 89 cd 89 f3 <49> 8b 88 b8 00 00 00 81 fa 68 12 00 00 0f 84 57 05 00 00 0f 86
RIP [<ffffffff811486ee>] blkdev_ioctl+0x2e/0xa30
 RSP <ffff88003b3d5bb8>
CR2: 00000000000000b8
---[ end trace c0b4d3c3118d1427 ]---
Fixing recursive fault but reboot is needed!

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-29vmscan: limit VM_EXEC protection to file pagesWu Fengguang
It is possible to have !Anon but SwapBacked pages, and some apps could create a huge number of such pages with MAP_SHARED|MAP_ANONYMOUS. These pages go into the ANON lru list, and hence shall not be protected: we only care about mapped executable files. Failing to do so may trigger OOM. Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Rik van Riel <riel@redhat.com> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-29revert "mm: oom analysis: add buffer cache information to show_free_areas()"Andrew Morton
Revert commit 71de1ccbe1fb40203edd3beb473f8580d917d2ca Author: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> AuthorDate: Mon Sep 21 17:01:31 2009 -0700 Commit: Linus Torvalds <torvalds@linux-foundation.org> CommitDate: Tue Sep 22 07:17:27 2009 -0700 mm: oom analysis: add buffer cache information to show_free_areas() show_free_areas() is called during page allocation failures, and page allocation failures can occur in any calling context. But nr_blockdev_pages() takes VFS locks which should not be taken from hard IRQ context (at least). The result is lockdep warnings (and deadlockability) during page allocation failures. Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Rik van Riel <riel@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-29hfsplus: refuse to mount volumes larger than 2TBBen Hutchings
As found in <http://bugs.debian.org/550010>, hfsplus is using type u32 rather than sector_t for some sector number calculations. In particular, hfsplus_get_block() does: u32 ablock, dblock, mask; ... map_bh(bh_result, sb, (dblock << HFSPLUS_SB(sb).fs_shift) + HFSPLUS_SB(sb).blockoffset + (iblock & mask)); I am not confident that I can find and fix all cases where a sector number may be truncated. For now, avoid data loss by refusing to mount HFS+ volumes with more than 2^32 sectors (2TB). [akpm@linux-foundation.org: fix 32 and 64-bit issues] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Eric Sesterhenn <snakebyte@gmx.de> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
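The truncation described in the hfsplus message can be reproduced in isolation. This is a minimal sketch, not the hfsplus code: sector_u32() and sector_u64() are hypothetical helpers that mirror the shape of the quoted expression, once with 32-bit arithmetic and once widened to 64 bits before shifting.

```c
#include <stdint.h>

/*
 * Hypothetical model of the sector calculation done entirely in
 * 32-bit arithmetic: bits shifted past bit 31 are silently lost.
 */
uint32_t sector_u32(uint32_t dblock, unsigned int fs_shift,
		    uint32_t blockoffset)
{
	return (dblock << fs_shift) + blockoffset;
}

/*
 * Same calculation with the block number widened first, so the
 * shift cannot discard high bits for realistic shift counts.
 */
uint64_t sector_u64(uint32_t dblock, unsigned int fs_shift,
		    uint32_t blockoffset)
{
	return ((uint64_t)dblock << fs_shift) + blockoffset;
}
```

With dblock = 0x20000000 and fs_shift = 3 the true sector number is 2^32, which the 32-bit version wraps to 0. Assuming 512-byte sectors, 2^32 sectors is exactly the 2TB limit named above.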
2009-10-29MAINTAINERS: rt2x00 list is moderatedBartlomiej Zolnierkiewicz
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-29MAINTAINERS: add Open Firmware / Flattened Device Tree entryGrant Likely
Signed-off-by: Grant Likely <grant.likely@secretlab.ca> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-29MAINTAINERS: document new "K:" entry typeJoe Perches
K: is for keyword. Syntax is perl extended regex. Reorganized header documentation and indent the section entry descriptions so that the first K: would not be considered a regex to match by get_maintainer.pl Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-29scripts/get_maintainer.pl: add patch/file search for keywordsJoe Perches
Based on an idea from Wolfram Sang. Add search for MAINTAINERS line "K:" regex pattern match in a patch or file Matches are added after file pattern matches Add --keywords command line switch (default 1, on) Change version to 0.21 Signed-off-by: Joe Perches <joe@perches.com> Cc: Wolfram Sang <w.sang@pengutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-29MAINTAINERS: update WOLFSON MICROELECTRONICSJoe Perches
Integrate P:/M: lines Remove L: linux-kernel@vger.kernel.org Signed-off-by: Joe Perches <joe@perches.com> Cc: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-29MAINTAINERS: fix up PERIPHERAL spellingJoe Perches
Signed-off-by: Joe Perches <joe@perches.com> Cc: Li Yang <leoli@freescale.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-29MAINTAINERS: WINBOND CIR - Integrate P:/M: lines, fixup David Härdeman's nameJoe Perches
Signed-off-by: Joe Perches <joe@perches.com> Cc: David Härdeman <david@hardeman.nu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-29MAINTAINERS: SIMPLE FIRMWARE INTERFACE: update email styleJoe Perches
Signed-off-by: Joe Perches <joe@perches.com> Cc: Len Brown <lenb@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-29MAINTAINERS: update SCORE architecture name style and add file patternJoe Perches
Signed-off-by: Joe Perches <joe@perches.com> Cc: Chen Liqin <liqin.chen@sunplusct.com> Cc: Lennox Wu <lennox.wu@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-29MAINTAINERS: update Kernel Janitors after mismergeJoe Perches
Fix the mismerge of the W: URL and the S: status fields. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-29MAINTAINERS: use tab not spaces after field typesJoe Perches
Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-29MAINTAINERS: change ATM mailing list to moderatedJoe Perches
Signed-off-by: Joe Perches <joe@perches.com> Cc: Chas Williams <chas@cmf.nrl.navy.mil> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-29MAINTAINERS: update OMAP Tony Lindgren email nameJoe Perches
Which had an embedded and duplicated email address Signed-off-by: Joe Perches <joe@perches.com> Cc: Tony Lindgren <tony@atomide.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-29MAINTAINERS: update TRACING sectionJoe Perches
Move to alphabetic position Use single line F: entries Signed-off-by: Joe Perches <joe@perches.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-29MAINTAINERS: update GENERIC UIO FOR PCI DEVICESJoe Perches
Quote a name with a period remove L: linux-kernel@vger.kernel.org Signed-off-by: Joe Perches <joe@perches.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-29omap_hsmmc: add missing probe handler hookRoger Quadros
Without the probe handler hook, the driver is never probed. Add it back. Fixes broken MMC on OMAP. We use platform_driver_probe() API since omap_hsmmc is not a hot-pluggable device. Signed-off-by: Roger Quadros <ext-roger.quadros@nokia.com> Tested-by: Felipe Contreras <felipe.contreras@gmail.com> Tested-by: Tony Lindgren <tony@atomide.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Felipe Contreras <felipe.contreras@gmail.com> Cc: Denis Karpov <ext-denis.2.karpov@nokia.com> Cc: Madhusudhan Chikkature <madhu.cr@ti.com> Cc: Greg KH <gregkh@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-29strstrip(): mark as must_checkKOSAKI Motohiro
strstrip() can return a modified value of its input argument when removing leading whitespace. So it is surely a bug for this function's return value to be ignored. The caller is probably going to use the incorrect original pointer. So mark it __must_check to prevent this from happening (as it has before). Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
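Why ignoring the return value is a bug can be shown with a userspace stand-in. This is a sketch under stated assumptions: strip_ws() is a hypothetical re-implementation written for illustration, not the kernel's strstrip(), and the GCC/Clang warn_unused_result attribute is used here to play the role of __must_check.

```c
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical userspace stand-in for a strstrip()-like helper.
 * Trailing whitespace is removed in place, but leading whitespace
 * is skipped by returning a pointer further into the buffer.
 */
__attribute__((warn_unused_result))
char *strip_ws(char *s)
{
	size_t len = strlen(s);

	/* Cut trailing whitespace by moving the terminator. */
	while (len > 0 && isspace((unsigned char)s[len - 1]))
		len--;
	s[len] = '\0';

	/* Skip leading whitespace: only the return value reflects this. */
	while (isspace((unsigned char)*s))
		s++;
	return s;
}
```

A caller that writes strip_ws(buf); and then keeps parsing buf still sees the leading blank, which matches the " 58" failure mode shown in the cgroup fix that follows. With the attribute in place the compiler warns about that call.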
2009-10-29cgroup: fix strstrip() misuseKOSAKI Motohiro
cgroup_write_X64() and cgroup_write_string() ignore the return value of strstrip(). This makes the behavior slightly inconsistent: trailing whitespace is accepted but leading whitespace is rejected.

example:
=========================
 # cd /mnt/cgroup/hoge
 # cat memory.swappiness
 60
 # echo "59 " > memory.swappiness
 # cat memory.swappiness
 59
 # echo " 58" > memory.swappiness
 bash: echo: write error: Invalid argument

This patch fixes it.

Cc: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-29congestion_wait(): don't use WRITEKOSAKI Motohiro
Commit 8aa7e847d (Fix congestion_wait() sync/async vs read/write confusion) replaced WRITE with BLK_RW_ASYNC. Unfortunately, concurrent mm development accidentally left one call site unchanged. This patch fixes it too. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Jens Axboe <jens.axboe@oracle.com> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-29connector: fix regression introduced by sid connectorChristian Borntraeger
Since commit 02b51df1b07b4e9ca823c89284e704cadb323cd1 (proc connector: add event for process becoming session leader) we have the following warning:

Badness at kernel/softirq.c:143
[...]
Krnl PSW : 0404c00180000000 00000000001481d4 (local_bh_enable+0xb0/0xe0)
[...]
Call Trace:
([<000000013fe04100>] 0x13fe04100)
 [<000000000048a946>] sk_filter+0x9a/0xd0
 [<000000000049d938>] netlink_broadcast+0x2c0/0x53c
 [<00000000003ba9ae>] cn_netlink_send+0x272/0x2b0
 [<00000000003baef0>] proc_sid_connector+0xc4/0xd4
 [<0000000000142604>] __set_special_pids+0x58/0x90
 [<0000000000159938>] sys_setsid+0xb4/0xd8
 [<00000000001187fe>] sysc_noemu+0x10/0x16
 [<00000041616cb266>] 0x41616cb266

The warning is
---> WARN_ON_ONCE(in_irq() || irqs_disabled());

The network code must not be called with disabled interrupts but sys_setsid holds the tasklist_lock with spinlock_irq while calling the connector.

After a discussion we agreed that we can move proc_sid_connector from __set_special_pids to sys_setsid.

We also agreed that it is sufficient to change the check from task_session(curr) != pid into err > 0, since if we don't change the session, this means we were already the leader and return -EPERM.

One last thing: There is also daemonize(), and some people might want to get a notification in that case. Since daemonize() is only needed if a user space does kernel_thread this does not look important (and there seems to be no consensus if this connector should be called in daemonize). If we really want this, we can add proc_sid_connector to daemonize() in an additional patch (Scott?)

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Scott James Remnant <scott@ubuntu.com>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: David S. Miller <davem@davemloft.net>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-29hwpoison: fix /proc/meminfo alignmentHugh Dickins
Given such a long name, the kB count in /proc/meminfo's HardwareCorrupted line is being shown too far right (it does align with x86_64's VmallocChunk above, but I hope nobody will ever have that much corrupted!). Align it. Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-29hwpoison: fix oops on ksm pagesHugh Dickins
Memory failure on a KSM page currently oopses on its NULL anon_vma in page_lock_anon_vma(): that may not be much worse than the consequence of ignoring it, but it is better to be consistent with how ZERO_PAGE and hugetlb pages and other awkward cases are treated. Just skip it. We could fix it for 2.6.32 at the KSM end, by putting a dummy anon_vma pointer in there; but that would get harder next time, when KSM will put a pointer to something else there (and I'm not currently planning to do any work to open that up to memory_failure). So I would prefer this simple PageKsm test, until the other exceptions are handled. Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-29cpufreq: add cpufreq_get() stub for CONFIG_CPU_FREQ=nRandy Dunlap
When CONFIG_CPU_FREQ is disabled, cpufreq_get() needs a stub. Used by kvm (although it looks like a bit of the kvm code could be omitted when CONFIG_CPU_FREQ is disabled). arch/x86/built-in.o: In function `kvm_arch_init': (.text+0x10de7): undefined reference to `cpufreq_get' (Needed in linux-next's KVM tree, but it's correct in 2.6.32). Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Tested-by: Eric Paris <eparis@redhat.com> Cc: Jiri Slaby <jirislaby@gmail.com> Cc: Avi Kivity <avi@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-29[S390] smp: fix sigp sense handlingHeiko Carstens
sigp sense only returns the status of a cpu if it is non zero. If the status of the sensed cpu is all zeros, condition code 0 (accepted) is set and no status bits are returned. The current code however assumes that a status was returned and tests bits in it. This means uninitialized data is accessed with random results. Worst case is that the code that checks if cpu is offline on cpu hotplug assumes that the target cpu is offline while it is still running. This leads potentially to memory corruption since resources that are still needed by the target cpu will be freed and could be reused while still in use. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-10-29[S390] smp: fix sigp stop handlingHeiko Carstens
According to the architecture, a cpu does not necessarily enter the stopped state after completion of a sigp instruction with "stop" order code. So remove the BUG() statement after self sending sigp stop to avoid that it ever gets reached. Also add a sigp busy check to make sure that the order gets delivered. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-10-29[S390] cputime: fix overflow on 31 bit systemsMartin Schwidefsky
The cputime_to_msecs / cputime_to_clock_t and cputime64_to_clock_t cause fixpoint divide exceptions if the cputime is too large. On a machine that collected 49.7 days worth of idle time reading from /proc/stat will generate oopses like this: Kernel BUG at 001b0c92 [verbose debug info unavailable] fixpoint divide exception: 0009 [#13] SMP Modules linked in: ipv6 CPU: 1 Tainted: G D 2.6.27.10 #5 Process cat (pid: 21352, task: 1fb34138, ksp: 1d2a3d98) Krnl PSW : 070c2000 801b0c92 (show_stat+0x2ca/0x68c) R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:2 PM:0 Krnl GPRS: 00000001 00001388 00000bb8 0015d2a1 00000000 00000000 000003e8 0001fd91 00000000 00000000 0000129d eecd2ff0 1cc533b9 0036f780 801b0bce 1d2a3cc0 Krnl Code: 801b0c86: f18890abf198 mvo 171(9,%r9),408(9,%r15) 801b0c8c: 98abf170 lm %r10,%r11,368(%r15) 801b0c90: 1da1 dr %r10,%r1 >801b0c92: 90abf170 stm %r10,%r11,368(%r15) 801b0c96: 98abf190 lm %r10,%r11,400(%r15) 801b0c9a: 1da1 dr %r10,%r1 801b0c9c: 90abf190 stm %r10,%r11,400(%r15) 801b0ca0: 18a3 lr %r10,%r3 Call Trace: ([<00000000001b09f4>] show_stat+0x2c/0x68c) [<000000000018dcee>] seq_read+0xb2/0x364 [<00000000001a9980>] proc_reg_read+0x68/0x98 [<00000000001705ee>] vfs_read+0x6e/0xe8 [<0000000000170732>] sys_read+0x36/0x78 [<000000000010f750>] sysc_do_restart+0x12/0x16 [<0000000077f3ad6a>] 0x77f3ad6a <4>---[ end trace 1436ea9559d3de9e ]--- Reported-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
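The 49.7-day figure in the cputime message is consistent with a millisecond count that no longer fits in 32 bits. That reading is an assumption, since the message does not state the unit or width of the failing quotient. The arithmetic can be checked with a small sketch; all names below are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSEC_PER_DAY 86400000ULL

/* True if a millisecond count cannot be held in an unsigned 32-bit value. */
bool msecs_overflow_u32(uint64_t msecs)
{
	return msecs > UINT32_MAX;
}

/*
 * Point at which a 32-bit millisecond count wraps, in tenths of a
 * day (integer arithmetic, rounded down).
 */
uint64_t overflow_tenths_of_days(void)
{
	return ((uint64_t)UINT32_MAX + 1) * 10 / MSEC_PER_DAY;
}
```

2^32 ms divided by 86,400,000 ms per day gives roughly 49.71 days, so overflow_tenths_of_days() evaluates to 497.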
2009-10-29[S390] call home: fix string length handlingHeiko Carstens
After copying uts->nodename to the static nodename array the static version isn't necessarily zero terminated, since the size of the array is one byte too short. Afterwards doing strncat(data, nodename, strlen(nodename)); may copy an arbitrarily large amount of bytes. Fix this by getting rid of the static array and using strncat with proper length limit. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
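A "proper length limit" for strncat() is derived from the space left in the destination, not from the length of the source. The sketch below illustrates that idea only; append_bounded() is a hypothetical helper and not the call home code, and it assumes dst already holds a NUL-terminated string.

```c
#include <string.h>

/*
 * Append src to dst without writing past dst_size bytes in total.
 * strncat() copies at most n characters and then adds the NUL, so
 * n must leave room for that terminator.
 * Assumes dst is already NUL-terminated within dst_size.
 */
void append_bounded(char *dst, size_t dst_size, const char *src)
{
	size_t used = strlen(dst);

	if (used + 1 >= dst_size)
		return;	/* no room for even one more character */

	strncat(dst, src, dst_size - used - 1);
}
```

By contrast, strncat(data, nodename, strlen(nodename)) bounds the copy by the source alone, which offers no protection when the source is not terminated or the destination is nearly full.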
2009-10-29[S390] call home: fix error handling in init functionHeiko Carstens
Fix missing unregister_sysctl_table in case the SCLP doesn't provide the requested feature. Also simplify the whole error handling while at it. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-10-29[S390] smp: fix prefix handling of offlined cpusHeiko Carstens
Offlined cpus still have valid prefix register contents. Dumpers will store the register contents of a cpu to the location where its prefix register points to. For offlined cpus the area (lowcore) has been freed and the dumper would write the uninteresting contents of the offline cpu to a memory location which might be in use by some other component and destroy valuable information. To fix this set the prefix register of offline cpus to absolute address zero again. This prevents the current dumpers from writing to random memory locations. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-10-29[S390] s/r: cmm resume fixMartin Schwidefsky
If a suspended z/VM guest has been logged off before the resume, the 'SET SMSG IUCV' CP command needs to be repeated to re-enable sending messages via SMSG. This fixes the following error: HCPMFS057I H4214002 not receiving; SMSG off Error: non-zero CP response for command 'SMSG H4214002 CMM SHRINK 5010': #57 Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-10-29[S390] call home: fix local buffer usage in proc handlerSebastian Ott
Fix the size of the local buffer and use snprintf to prevent further miscalculations. Also fix the usage of bitwise vs logic operations. Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-10-29backing-dev: ensure that a removed bdi no longer has super_block referencing itJens Axboe
When the bdi is being removed, we have to ensure that no super_blocks currently have that cached in sb->s_bdi. Normally this is ensured by the sb having a longer life span than the bdi, but if the device is suddenly yanked, we have to kill this reference. Otherwise sb->s_bdi points to freed memory at that point. This fixes a problem with sync(1) hanging when a USB stick is pulled without cleanly umounting it first. Reported-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-29net: Fix 'Re: PACKET_TX_RING: packet size is too long'Gabor Gombas
Currently PACKET_TX_RING forces a certain amount of every frame to remain unused. This probably originates from an early version of the PACKET_TX_RING patch that in fact used the extra space when the (since removed) CONFIG_PACKET_MMAP_ZERO_COPY option was enabled. The current code does not make any use of this extra space. This patch removes the extra space reservation and lets userspace make use of the full frame size. Signed-off-by: Gabor Gombas <gombasg@sztaki.hu> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-29ide: Serialize CMD643 and CMD646 to fix a hardware bug with SSDMikulas Patocka
CMD646 corrupts data on concurrent transfers on both channels when IDE SSD is connected to one of the channels. Setup that demonstrates this hardware bug: Ultra 5, onboard CMD646, rev 3. /dev/hda is 8GB Seagate ST38410A in MWDMA2 /dev/hdd is 32GB SSD SiliconHardDisk in MWDMA2 - When reading /dev/hdd (for example with dd or fsck), reads from /dev/hda are corrupted, there are twiddled single bits 1->0 and some full 32-bit words corrupted, sometimes commands fail (which switches /dev/hda to PIO mode but the corruptions happen even in PIO). - Reads from /dev/hdd don't seem to be corrupted (i.e. fsck passes fine). - When I connected normal rotating harddisk to /dev/hdd, there was no corruption, so the corruption is something specific to SSD. - I tried the same setup on a PCI card with CMD649 and saw no corruption. This patch serializes the operation for CMD646 and 643 (I didn't test CMD643 but it may have the same hw bug too because it's earlier design). CMD649 is good. I don't know anything about CMD 648. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Tested-by: Frans Pop <elendil@planet.nl> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-29netdev: usb: dm9601.c can drive a device not supported yet, add support for itJanusz Krzysztofik
I found that the current version of drivers/net/usb/dm9601.c can be used to successfully drive a low-power, low-cost network adapter with USB ID 0a46:9000, based on a DM9000E chipset. As no device with this ID is yet present in the kernel, I have created a patch that adds support for the device to the dm9601 driver. Created and tested against linux-2.6.32-rc5. Signed-off-by: Janusz Krzysztofik <jkrzyszt@tis.icnet.pl> Acked-by: Peter Korsgaard <jacmet@sunsite.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-29qlge: Fix firmware mailbox command timeout.Ron Mercer
The mailbox command process would only process a maximum of 5 unrelated firmware events while waiting for its command completion status. It should process an unlimited number of events while waiting for a maximum of 5 seconds. Signed-off-by: Ron Mercer <ron.mercer@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-29qlge: Fix EEH handling.Ron Mercer
Clean up driver resources without touching the hardware. Add pci save/restore state. Signed-off-by: Ron Mercer <ron.mercer@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-29AF_RAW: Augment raw_send_hdrinc to expand skb to fit iphdr->ihl (v2)Neil Horman
Augment raw_send_hdrinc to correct for incorrect ip header length values. A series of oopses was reported to me recently. Apparently when using AF_RAW sockets to send data to peers that were reachable via ipsec encapsulation, people could panic or BUG halt their systems. I've tracked the problem down to user space sending an invalid ip header over an AF_RAW socket with IP_HDRINCL set to 1. Basically what happens is that userspace sends down an ip frame that includes only the header (no data), but sets the ip header ihl value to a large number, one that is larger than the total amount of data passed to the sendmsg call. In raw_send_hdrinc, we allocate an skb based on the size of the data in the msghdr that was passed in, but assume the data is all valid. Later during ipsec encapsulation, xfrm4_transport_output moves the entire frame back in the skbuff to provide headroom for the ipsec headers. During this operation, the skb->transport_header is repointed to a spot computed by skb->network_header + the ip header length (ihl). Since so little data was passed in relative to the value of ihl provided by the raw socket, we point transport header to an unknown location, resulting in various crashes. The fix for this is pretty straightforward: simply validate the value of iph->ihl when sending over a raw socket. If (iph->ihl*4U) > user data buffer size, drop the frame and return -EINVAL. I just confirmed this fixes the reported crashes. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
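The validation rule stated in the AF_RAW message (reject when ihl*4 exceeds the user buffer size) can be sketched outside the kernel. This is an illustration under stated assumptions, not the patch itself: check_ihl() is a hypothetical helper that reads the IHL field directly from the first byte of a raw IPv4 header, where it occupies the low four bits and counts 32-bit words.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical check modelled on the rule in the commit message:
 * the header length claimed by the packet must not exceed the
 * number of bytes the caller actually supplied.
 * Returns 0 if the claim is plausible, -EINVAL otherwise.
 */
int check_ihl(const uint8_t *buf, size_t len)
{
	unsigned int ihl;

	if (len < 1)
		return -EINVAL;

	/* Low nibble of the first byte: header length in 32-bit words. */
	ihl = buf[0] & 0x0f;

	if (ihl * 4U > len)
		return -EINVAL;

	return 0;
}
```

A 20-byte buffer starting with 0x45 passes (5 words, 20 bytes), while the same buffer starting with 0x4f claims a 60-byte header and is rejected.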
2009-10-29Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6David S. Miller
2009-10-28bonding: fix a race condition in calls to slave MII ioctlsJiri Bohac
In mii monitor mode, bond_check_dev_link() calls the ioctl handler of slave devices. It stores the ndo_do_ioctl function pointer to a static (!) ioctl variable and later uses it to call the handler with the IOCTL macro. If another thread executes bond_check_dev_link() at the same time (even with a different bond, which none of the locks prevent), a race condition occurs. If the two racing slaves have different drivers, this may result in one driver's ioctl handler being called with a pointer to a net_device controlled by a different driver, resulting in unpredictable breakage. Unless I am overlooking something, the "static" must be a copy'n'paste error (?). Signed-off-by: Jiri Bohac <jbohac@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
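The hazard of caching a per-call function pointer in a static variable can be shown without threads by interleaving the steps by hand. This is a simplified model, not the bonding driver: struct slave, the two fake driver handlers, and the racy_store()/racy_call() split are all hypothetical and exist only to make the interleaving deterministic.

```c
typedef int (*ioctl_fn)(int dev_id);

/* Hypothetical slave device: an id plus its driver's handler. */
struct slave {
	int dev_id;
	ioctl_fn do_ioctl;
};

/* Two fake drivers whose results reveal which handler ran. */
int driver_a_ioctl(int dev_id) { return 100 + dev_id; }
int driver_b_ioctl(int dev_id) { return 200 + dev_id; }

/* Racy pattern: one static slot shared by every caller. */
static ioctl_fn shared_ioctl;

/* Step 1 of the racy path: remember the handler in the static. */
void racy_store(const struct slave *s)
{
	shared_ioctl = s->do_ioctl;
}

/* Step 2 of the racy path: call whatever the static holds now. */
int racy_call(const struct slave *s)
{
	return shared_ioctl(s->dev_id);
}

/* Fixed pattern: the handler lives in a local, private to this call. */
int safe_check(const struct slave *s)
{
	ioctl_fn ioctl = s->do_ioctl;

	return ioctl(s->dev_id);
}
```

If a second caller runs racy_store() for slave B between slave A's store and call, slave A ends up invoking driver B's handler with A's device, which is the cross-driver breakage described above. safe_check() cannot be disturbed that way.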
2009-10-29param: fix setting arrays of boolRusty Russell
We create a dummy struct kernel_param on the stack for parsing each array element, but we didn't initialize the flags word. This matters for arrays of type "bool", where the flag indicates if it really is an array of bools or unsigned int (old-style). Reported-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: stable@kernel.org
2009-10-29param: fix NULL comparison on oomRusty Russell
kp->arg is always true: it's the contents of that pointer we care about. Reported-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: stable@kernel.org
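The distinction between testing a pointer and testing what it points at can be sketched as follows. This is one reading of the message above, not the actual param code: struct param, set_string() and the two injected duplicators are hypothetical, with the allocator passed in only so that the failure path can be exercised.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parameter descriptor: arg points at a char * slot. */
struct param {
	void *arg;
};

/* Duplicator that always fails, to simulate running out of memory. */
char *failing_dup(const char *s)
{
	(void)s;
	return NULL;
}

/* Plain malloc-based duplicator. */
char *working_dup(const char *s)
{
	char *copy = malloc(strlen(s) + 1);

	if (copy)
		strcpy(copy, s);
	return copy;
}

/*
 * Store a copy of val in the slot that kp->arg points at.
 * kp->arg itself is never NULL here, so testing it would never
 * detect a failed allocation; the slot's contents must be tested.
 */
int set_string(struct param *kp, const char *val,
	       char *(*dup)(const char *))
{
	*(char **)kp->arg = dup(val);
	if (!*(char **)kp->arg)
		return -ENOMEM;
	return 0;
}
```

Writing if (!kp->arg) in place of the dereferenced test would compile cleanly and always report success, which is the kind of mistake the commit title points at.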