summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2010-04-26r8169: more broken register writes workaroundfrançois romieu
78f1cd02457252e1ffbc6caa44a17424a45286b8 ("fix broken register writes") does not work for Al Viro's r8169 (XID 18000000). Signed-off-by: Francois Romieu <romieu@fr.zoreil.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-26r8169: failure to enable mwi should not be fatalfrançois romieu
Few (6) network drivers enable mwi explicitly. Fewer worry about a failure. It is not a fix but it should avoid some annoyance like http://bugzilla.kernel.org/show_bug.cgi?id=15454 Signed-off-by: Francois Romieu <romieu@fr.zoreil.com> Cc: Conrad Kostecki <conikost@gmx.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-26ptrace: Cleanup useless headerAlessio Igor Bogani
BKL isn't present anymore into this file thus we can safely remove smp_lock.h inclusion. Signed-off-by: Alessio Igor Bogani <abogani@texware.it> Cc: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: James Morris <jmorris@namei.org> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2010-04-26tracing: Have graph flags passed in to ouput functionsJiri Olsa
Let the function graph tracer have custom flags passed to its output functions. Signed-off-by: Jiri Olsa <jolsa@redhat.com> LKML-Reference: <1270227683-14631-3-git-send-email-jolsa@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-04-26tracing: Add ftrace events for graph tracerJiri Olsa
Add ftrace events for graph tracer, so the graph output could be shared with other tracers. Signed-off-by: Jiri Olsa <jolsa@redhat.com> LKML-Reference: <1270227683-14631-2-git-send-email-jolsa@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-04-26nfsd4: bug in read_bufNeil Brown
When read_buf is called to move over to the next page in the pagelist of an NFSv4 request, it sets argp->end to essentially a random number, certainly not an address within the page which argp->p now points to. So subsequent calls to READ_BUF will think there is much more than a page of spare space (the cast to u32 ensures an unsigned comparison) so we can expect to fall off the end of the second page. We never encountered thsi in testing because typically the only operations which use more than two pages are write-like operations, which have their own decoding logic. Something like a getattr after a write may cross a page boundary, but it would be very unusual for it to cross another boundary after that. Cc: stable@kernel.org Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-04-26x86/PCI: never allocate PCI MMIO resources below BIOS_ENDBjorn Helgaas
When we move a PCI device or assign resources to a device not configured by the BIOS, we want to avoid the BIOS region below 1MB. Note that if the BIOS places devices below 1MB, we leave them there. See https://bugzilla.kernel.org/show_bug.cgi?id=15744 and https://bugzilla.kernel.org/show_bug.cgi?id=15841 Tested-by: Andy Isaacson <adi@hexapodia.org> Tested-by: Andy Bailey <bailey@akamai.com> Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-04-26cifs: change && to ||Dan Carpenter
This is a typo, if pvolume_info were NULL it would oops. This function is used in clean up and error handling. The current code never passes a NULL pvolume_info, but it could pass a NULL *pvolume_info if the kmalloc() failed. Signed-off-by: Dan Carpenter <error27@gmail.com> Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-04-26cifs: rename "extended_security" to "global_secflags"Jeff Layton
...since that more accurately describes what that variable holds. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-04-26cifs: move tcon find/create into separate functionJeff Layton
...and out of cifs_mount. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-04-26cifs: move SMB session creation code into separate functionJeff Layton
...it's mostly part of cifs_mount. Break it out into a separate function. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-04-26cifs: track local_nls in volume infoJeff Layton
Add a local_nls field to the smb_vol struct and keep a pointer to the local_nls in it. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-04-26perf tools: Fix libdw-dev package name in error messageStefan Hajnoczi
The headers required for DWARF support are provided by the libdw-dev package in Debian-based distros. This patch corrects the elfutils-dev package name to libdw-dev in the Makefile error message when libdw.h is not found. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Masami Hiramatsu <mhiramat@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Tom Zanussi <tzanussi@gmail.com> LKML-Reference: <1272292023-9869-1-git-send-email-stefanha@linux.vnet.ibm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-04-26perf probe: Add --max-probes optionMasami Hiramatsu
Add --max-probes option to change the maximum limit of findable probe points per event, since inlined function can be expanded into thousands of probe points. Default value is 128. Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Suggested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> LKML-Reference: <20100421195640.24664.62984.stgit@localhost6.localdomain6> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-04-26perf probe: Fix to exit callback soon after finding too many probe pointsMasami Hiramatsu
Fix to exit callback soon after finding too many probe points. Don't try to continue searching because it already failed. Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> LKML-Reference: <20100421195632.24664.42598.stgit@localhost6.localdomain6> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-04-26perf probe: Fix to use symtab only if no debuginfoMasami Hiramatsu
Fix perf probe to use symtab only if there is no debuginfo, because debuginfo has more information than symtab. If we can't find a function in debuginfo, we never find it in symtab. Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> LKML-Reference: <20100421195624.24664.46214.stgit@localhost6.localdomain6> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-04-26perf tools: Initialize dso->node member in dso__newMasami Hiramatsu
If dso->node member is not initialized, it causes a segmentation fault when adding to other lists. It should be initilized in dso__new(). Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> LKML-Reference: : <20100421195616.24664.89980.stgit@localhost6.localdomain6> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-04-26bridge br_multicast: Ensure to initialize BR_INPUT_SKB_CB(skb)->mrouters_only.YOSHIFUJI Hideaki / 吉藤英明
Even with commit 32dec5dd0233ebffa9cae25ce7ba6daeb7df4467 ("bridge br_multicast: Don't refer to BR_INPUT_SKB_CB(skb)->mrouters_only without IGMP snooping."), BR_INPUT_SKB_CB(skb)->mrouters_only is not appropriately initialized if IGMP snooping support is compiled and disabled, so we can see garbage. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-26watchdog: sbc_fitpc2_wdt: fixed "scheduling while atomic" bug.Denis Turischev
spinlock need to be replaced by mutex because of sleep functions inside wdt_send_data. Signed-off-by: Denis Turischev <denis@compulab.co.il> Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
2010-04-26ieee802154: Fix oops during ieee802154_sock_ioctlStefan Schmidt
Trying to run izlisten (from lowpan-tools tests) on a device that does not exists I got the oops below. The problem is that we are using get_dev_by_name without checking if we really get a device back. We don't in this case and writing to dev->type generates this oops. [Oops code removed by Dmitry Eremin-Solenikov] If possible this patch should be applied to the current -rc fixes branch. Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org> Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-26p54pci: fix bugs in p54p_check_tx_ringHans de Goede
Hans de Goede identified a bug in p54p_check_tx_ring: there are two ring indices. 1 => tx data and 3 => tx management. But the old code had a constant "1" and this resulted in spurious dma unmapping failures. Cc: stable@kernel.org Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=583623 Bug-Identified-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Christian Lamparter <chunkeey@googlemail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-04-26watchdog: sbc_fitpc2_wdt: fixed I/O operations orderDenis Turischev
There are fitpc2 compatible boards that hang with existent i/o operations order. Solution is to switch between writing to data and command ports. Signed-off-by: Denis Turischev <denis@compulab.co.il> Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
2010-04-26tg3: Fix INTx fallback when MSI failsAndre Detsch
tg3: Fix INTx fallback when MSI fails MSI setup changes the value of irq_vec in struct tg3 *tp. This attribute must be taken into account and restored before we try to do a new request_irq for INTx fallback. In powerpc, the original code was leading to an EINVAL return within request_irq, because the driver was trying to use the disabled MSI virtual irq number instead of tp->pdev->irq. Signed-off-by: Andre Detsch <adetsch@br.ibm.com> Acked-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-26Watchdog: sb_wdog.c: Fix sibyte watchdog initializationGuenter Roeck
Watchdog configuration register and timer count register were interchanged, causing wrong values to be written into both registers. This caused watchdog triggered resets even if the watchdog was reset in time. Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
2010-04-26pcmcia: fix matching rules for pseudo-multi-function cardsAlexander Kurz
Prevent PCMCIA_DEV_ID_MATCH_FUNC_ID from grabbing PFC-cards: I changed the code, so that the first matching struct pcmcia_device_id _PFC_ entry will mark the card has_pfc, preventing PCMCIA_DEV_ID_MATCH_FUNC_ID to match. [linux-pcmcia@lists.infradead.org: re-order commit message] Signed-off-by: Alexander Kurz <linux@kbdbabel.org> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2010-04-26xfs: more swap extent fixes for dynamic fork offsetsDave Chinner
A new xfsqa test (226) with a prototype xfs_fsr change to try to handle dynamic fork offsets better triggers an assertion failure where the inode data fork is in btree format, yet there is room in the inode for it to be in extent format. The two inodes look like: before: ino 0x101 (target), num_extents 11, Max in-fork extents 6, broot size 40, fork offset 96 before: ino 0x115 (temp), num_extents 5, Max in-fork extents 3, broot size 40, fork offset 56 after: ino 0x101 (target), num_extents 5, Max in-fork extents 6, broot size 40, fork offset 96 after: ino 0x115 (temp), num_extents 11, Max in-fork extents 3, broot size 40, fork offset 56 Basically the target inode ends up with 5 extents in btree format, but it had space for 6 extents in extent format, so ends up incorrect. Notably here the broot size is the same, and that is where the kernel code is going wrong - the btree root will fit, so it lets the swap go ahead. The check should not allow the swap to take place if the number of extents while in btree format is less than the number of extents that can fit in the inode in extent format. Adding that check will prevent this swap and corruption from occurring. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2010-04-26btrfs: convert to using bdi_setup_and_register()Jens Axboe
It's now a provided helper, so get rid of the internal setup and btrfs atomic_t bdi enumerator. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-04-26ALSA: es968: fix wrong PnP dma indexKrzysztof Helt
There is only one dma for the ESS ES968 based board. Its index is 0 and not 1. This make the es968 card working. Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2010-04-26SH: fix error paths in DMA driverGuennadi Liakhovetski
If channel allocation is failing, mark the channel unused and give PM a chance to power down the hardware. Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de> Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2010-04-26sh: sh7751 pci controller io port fixMagnus Damm
This patch updates the sh7751 pci code to handle io ports correctly. The code is based on the sh7788x implementation. Tested on a R2D-1 board with CONFIG_8139TOO_PIO=y. Signed-off-by: Magnus Damm <damm@opensource.se> Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2010-04-26sh: Fix maximum number of SCIF ports in R2D defconfigsMagnus Damm
Update the R2D defconfigs to bump up the maximum number of SCIF ports on the system. Fixes a broken serial console regression added by cd5f107628ab89c5dec5ad923f1c27f4cba41972. Reported-by: Shin-ichiro KAWASAKI <kawasaki@juno.dti.ne.jp> Signed-off-by: Magnus Damm <damm@opensource.se> Tested-by: Alexandre Courbot <alex@dcl.info.waseda.ac.jp> Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2010-04-26SH: fix TS field shift calculation for DMA driversGuennadi Liakhovetski
CHCR_TS_HIGH_SHIFT is defined as a shift of TS high bits in CHCR register, relative to low bits. The TS_INDEX2VAL() macro has to take this into account. Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de> Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2010-04-26crypto: authenc - Add EINPROGRESS checkHerbert Xu
When Steffen originally wrote the authenc async hash patch, he correctly had EINPROGRESS checks in place so that we did not invoke the original completion handler with it. Unfortuantely I told him to remove it before the patch was applied. As only MAY_BACKLOG request completion handlers are required to handle EINPROGRESS completions, those checks are really needed. This patch restores them. Reported-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2010-04-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: ipv6: Fix inet6_csk_bind_conflict() e100: Fix the TX workqueue race
2010-04-25ipv6: Fix inet6_csk_bind_conflict()Eric Dumazet
Commit fda48a0d7a84 (tcp: bind() fix when many ports are bound) introduced a bug on IPV6 part. We should not call ipv6_addr_any(inet6_rcv_saddr(sk2)) but ipv6_addr_any(inet6_rcv_saddr(sk)) because sk2 can be IPV4, while sk is IPV6. Reported-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Tested-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-25Merge branch 'for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: Issue the discard operation *before* releasing the blocks to be reused ext4: Fix buffer head leaks after calls to ext4_get_inode_loc() ext4: Fix possible lost inode write in no journal mode
2010-04-25Catch filesystems lacking s_bdiJörn Engel
noop_backing_dev_info is used only as a flag to mark filesystems that don't have any backing store, like tmpfs, procfs, spufs, etc. Signed-off-by: Joern Engel <joern@logfs.org> Changed the BUG_ON() to a WARN_ON(). Note that adding dirty inodes to the noop_backing_dev_info is not legal and will not result in them being flushed, but we already catch this condition in __mark_inode_dirty() when checking for a registered bdi. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-04-24e100: Fix the TX workqueue raceAlan Cox
Nothing stops the workqueue being left to run in parallel with close or a few other operations. This causes double unmaps and the like. See kerneloops.org #1041230 for an example Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-25squashfs: fix potential buffer over-run on 4K block file systemsPhillip Lougher
Sizing the buffer based on block size is incorrect, leading to a potential buffer over-run on 4K block size file systems (because the metadata block size is always 8K). This bug doesn't seem have triggered because 4K block size file systems are not default, and also because metadata blocks after compression tend to be less than 4K. Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
2010-04-25squashfs: add missing buffer freePhillip Lougher
Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
2010-04-25squashfs: fix warn_on when root inode is corruptedPhillip Lougher
Fix warn_on triggered by mounting a fsfuzzer corrupted file system, where the root inode has been corrupted. Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk> Reported-by: Steve Grubb <sgrubb@redhat.com>
2010-04-24Merge branch 'fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq: [CPUFREQ] use max load in conservative governor [CPUFREQ] fix a lockdep warning
2010-04-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (22 commits) gianfar: Fix potential oops during OF address translation fsl_pq_mdio: Fix kernel oops during OF address translation tcp: bind() fix when many ports are bound rdma: potential ERR_PTR dereference rtnetlink: potential ERR_PTR dereference net: ipv6 bind to device issue ipv6: allow to send packet after receiving ICMPv6 Too Big message with MTU field less than IPV6_MIN_MTU drivers/net/usb: Add new driver ipheth cxgb3: fix linkup issue X25 fix dead unaccepted sockets KS8851: NULL pointer dereference if list is empty net: 3c574_cs fix stats.tx_bytes counter xfrm6: ensure to use the same dev when building a bundle can: Fix possible NULL pointer dereference in ems_usb.c net: Fix an RCU warning in dev_pick_tx() ipv6: Fix tcp_v6_send_response transport header setting. bridge: add a missing ntohs() 8139too: Fix a typo in the function name. mac80211: pass HT changes to driver when off channel mac80211: remove bogus TX agg state assignment ...
2010-04-24Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: PCI: Ensure we re-enable devices on resume x86/PCI: parse additional host bridge window resource types PCI: revert broken device warning PCI aerdrv: use correct bit defines and add 2ms delay to aer_root_reset x86/PCI: ignore Consumer/Producer bit in ACPI window descriptions
2010-04-24Merge branch 'for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mjg59/platform-drivers-x86 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mjg59/platform-drivers-x86: eeepc-laptop: add missing sparse_keymap_free eeepc-wmi: Build fix asus: don't modify bluetooth/wlan on boot dell-wmi: Fix memory leak eeepc-wmi: add backlight support eeepc-wmi: use a platform device as parent device of all sub-devices eeepc-wmi: add an eeepc_wmi context structure
2010-04-24initramfs: handle unrecognised decompressor when unpackingPhillip Lougher
The unpack routine fails to handle the decompress_method() returning unrecognised decompressor (compress_name == NULL). This results in the routine looping eventually oopsing on an out of bounds memory access. Note this bug is usually hidden, only triggering on trailing junk after one or more correct compressed blocks. The case of the compressed archive being complete junk is (by accident?) caught by the if (state != Reset) check because state is initialised to Start, but not updated due to the decompressor not having been called. Obviously if the junk is trailing a correctly decompressed buffer, state == Reset from the previous call to the decompressor. Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk> Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-04-24ksm: check for ERR_PTR from follow_page()Dan Carpenter
The follow_page() function can potentially return -EFAULT so I added checks for this. Also I silenced an uninitialized variable warning on my version of gcc (version 4.3.2). Signed-off-by: Dan Carpenter <error27@gmail.com> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Izik Eidus <ieidus@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-04-24VMware Balloon driverDmitry Torokhov
This is a standalone version of VMware Balloon driver. Ballooning is a technique that allows hypervisor dynamically limit the amount of memory available to the guest (with guest cooperation). In the overcommit scenario, when hypervisor set detects that it needs to shuffle some memory, it instructs the driver to allocate certain number of pages, and the underlying memory gets returned to the hypervisor. Later hypervisor may return memory to the guest by reattaching memory to the pageframes and instructing the driver to "deflate" balloon. We are submitting a standalone driver because KVM maintainer (Avi Kivity) expressed opinion (rightly) that our transport does not fit well into virtqueue paradigm and thus it does not make much sense to integrate with virtio. There were also some concerns whether current ballooning technique is the right thing. If there appears a better framework to achieve this we are prepared to evaluate and switch to using it, but in the meantime we'd like to get this driver upstream. We want to get the driver accepted in distributions so that users do not have to deal with an out-of-tree module and many distributions have "upstream first" requirement. The driver has been shipping for a number of years and users running on VMware platform will have it installed as part of VMware Tools even if it will not come from a distribution, thus there should not be additional risk in pulling the driver into mainline. The driver will only activate if host is VMware so everyone else should not be affected at all. Signed-off-by: Dmitry Torokhov <dtor@vmware.com> Cc: Avi Kivity <avi@redhat.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-04-24fs/block_dev.c: fix performance regression in O_DIRECT|O_SYNC writes to ↵Anton Blanchard
block devices We are seeing a large regression in database performance on recent kernels. The database opens a block device with O_DIRECT|O_SYNC and a number of threads write to different regions of the file at the same time. A simple test case is below. I haven't defined DEVICE since getting it wrong will destroy your data :) On an 3 disk LVM with a 64k chunk size we see about 17MB/sec and only a few threads in IO wait: procs -----io---- -system-- -----cpu------ r b bi bo in cs us sy id wa st 0 3 0 16170 656 2259 0 0 86 14 0 0 2 0 16704 695 2408 0 0 92 8 0 0 2 0 17308 744 2653 0 0 86 14 0 0 2 0 17933 759 2777 0 0 89 10 0 Most threads are blocking in vfs_fsync_range, which has: mutex_lock(&mapping->host->i_mutex); err = fop->fsync(file, dentry, datasync); if (!ret) ret = err; mutex_unlock(&mapping->host->i_mutex); commit 148f948ba877f4d3cdef036b1ff6d9f68986706a (vfs: Introduce new helpers for syncing after writing to O_SYNC file or IS_SYNC inode) offers some explanation of what is going on: Use these new helpers for syncing from generic VFS functions. This makes O_SYNC writes to block devices acquire i_mutex for syncing. If we really care about this, we can make block_fsync() drop the i_mutex and reacquire it before it returns. Thanks Jan for such a good commit message! As well as dropping i_mutex, Christoph suggests we should remove the call to sync_blockdev(): > sync_blockdev is an overcomplicated alias for filemap_write_and_wait on > the block device inode, which is exactly what we did just before calling > into ->fsync The patch below incorporates both suggestions. With it the testcase improves from 17MB/s to 68M/sec: procs -----io---- -system-- -----cpu------ r b bi bo in cs us sy id wa st 0 7 0 65536 1000 3878 0 0 70 30 0 0 34 0 69632 1016 3921 0 1 46 53 0 0 57 0 69632 1000 3921 0 0 55 45 0 0 53 0 69640 754 4111 0 0 81 19 0 Testcase: #define _GNU_SOURCE #include <stdio.h> #include <pthread.h> #include <unistd.h> #include <stdlib.h> #include <string.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #define NR_THREADS 64 #define BUFSIZE (64 * 1024) #define DEVICE "/dev/mapper/XXXXXX" #define ALIGN(VAL, SIZE) (((VAL)+(SIZE)-1) & ~((SIZE)-1)) static int fd; static void *doit(void *arg) { unsigned long offset = (long)arg; char *b, *buf; b = malloc(BUFSIZE + 1024); buf = (char *)ALIGN((unsigned long)b, 1024); memset(buf, 0, BUFSIZE); while (1) pwrite(fd, buf, BUFSIZE, offset); } int main(int argc, char *argv[]) { int flags = O_RDWR|O_DIRECT; int i; unsigned long offset = 0; if (argc > 1 && !strcmp(argv[1], "O_SYNC")) flags |= O_SYNC; fd = open(DEVICE, flags); if (fd == -1) { perror("open"); exit(1); } for (i = 0; i < NR_THREADS-1; i++) { pthread_t tid; pthread_create(&tid, NULL, doit, (void *)offset); offset += BUFSIZE; } doit((void *)offset); return 0; } Signed-off-by: Anton Blanchard <anton@samba.org> Acked-by: Jan Kara <jack@suse.cz> Cc: Christoph Hellwig <hch@lst.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-04-24lib/vsprintf.c: add missing EXPORT_SYMBOL(simple_strtoll)Hans Verkuil
Add a missing EXPORT_SYMBOL. I must be the first person that wants to use this function :-) Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>