Age | Commit message | Author
2009-12-10 | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp | Linus Torvalds
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: (21 commits) amd64_edac: bump driver version amd64_edac: fix use-uninitialised bug amd64_edac: correct sys address to chip select mapping amd64_edac: add a leaner syndrome decoding algorithm amd64_edac: remove early hw support check amd64_edac: detect DDR3 memory type edac: add memory types strings for debugging edac, mce: update AMD F10h revD check amd64_edac: remove unneeded extract_error_address wrapper amd64_edac: rename StinkyIdentifier amd64_edac: remove superfluous dbg printk amd64_edac: enhance address to DRAM bank mapping amd64_edac: cleanup f10_early_channel_count amd64_edac: dump DIMM sizes on K8 too amd64_edac: cleanup rest of amd64_dump_misc_regs amd64_edac: cleanup DRAM cfg low debug output amd64_edac: wrap-up pci config read error handling amd64_edac: unify MCGCTL ECC switching cpumask: use modern cpumask style in drivers/edac/amd64_edac.c amd64_edac: make DRAM regions output more human-readable ...
2009-12-10 | Merge branch 'for-linus' of git://gitorious.org/linux-omap-dss2/linux | Linus Torvalds
* 'for-linus' of git://gitorious.org/linux-omap-dss2/linux: MAINTAINERS: Add OMAP2/3 DSS and OMAPFB maintainer OMAP: SDP: Enable DSS2 for OMAP3 SDP board OMAP: DSS2: Taal DSI command mode panel driver OMAP: DSS2: Add generic and Sharp panel drivers OMAP: DSS2: omapfb driver OMAP: DSS2: DSI driver OMAP: DSS2: SDI driver OMAP: DSS2: RFBI driver OMAP: DSS2: Video encoder driver OMAP: DSS2: DPI driver OMAP: DSS2: DISPC OMAP: DSS2: Add more core files OMAP: DSS2: Display Subsystem Driver core OMAP: DSS2: Documentation for DSS2 OMAP: Add support for VRFB rotation engine OMAP: Add VRAM manager OMAP: OMAPFB: add omapdss device OMAP: OMAPFB: split omapfb.h OMAP2: Add funcs for writing SMS_ROT_* registers
2009-12-11 | drm/ttm: export some functions useful to drivers using ttm | Ben Skeggs
These are functions required by nouveau which will be merged later. Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
2009-12-11 | drm/radeon/kms/avivo: fix typo in new_pll module description | Alex Deucher
Signed-off-by: Alex Deucher <alexdeucher@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
2009-12-11 | drm/radeon/kms: Convert radeon to new ttm_bo_init | Jerome Glisse
Now bo init uses a placement structure, as bo validation does. Signed-off-by: Jerome Glisse <jglisse@redhat.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
2009-12-11 | drm/ttm: Convert ttm_buffer_object_init to use ttm_placement | Jerome Glisse
Convert ttm_buffer_object_init to use struct ttm_placement and rename it to ttm_bo_init for consistency with function naming. This allows more complex placements to be given at buffer creation. For instance, you can ask to allocate a bo in vram first, but if there is not enough vram you can give system as a second possible placement. It also allows creating a buffer in a specific range. Also rename ttm_buffer_object_validate to ttm_bo_validate. Signed-off-by: Jerome Glisse <jglisse@redhat.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
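The ordered-fallback idea described above (try vram first, fall back to system) can be sketched in plain C. This is only an illustration under stated assumptions: `demo_domain`, `demo_placement` and `demo_pick_domain` are hypothetical names invented here and are not the TTM structures or functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical memory domains; these are not the real TTM placement flags. */
enum demo_domain { DEMO_VRAM, DEMO_SYSTEM, DEMO_NUM_DOMAINS };

/* Hypothetical analogue of a placement list: domains in preference order. */
struct demo_placement {
    const enum demo_domain *order;
    size_t count;
};

/*
 * Return the first domain in the list that has at least `size` bytes free,
 * or -1 if no listed domain fits. free_bytes is indexed by demo_domain.
 */
static int demo_pick_domain(const struct demo_placement *p,
                            const size_t free_bytes[DEMO_NUM_DOMAINS],
                            size_t size)
{
    size_t i;

    assert(p != NULL && p->order != NULL);
    for (i = 0; i < p->count; i++) {
        if (free_bytes[p->order[i]] >= size)
            return (int)p->order[i];
    }
    return -1;
}
```

With a list of `{ DEMO_VRAM, DEMO_SYSTEM }`, a request that does not fit in the first domain falls through to the second, which is the behaviour the commit message describes for buffer creation.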
2009-12-10 | net: niu uses crc32, so select CRC32 | Randy Dunlap
From: Randy Dunlap <randy.dunlap@oracle.com> The niu driver uses crc32 functions, so it needs to select CRC32. niu.c:(.text+0x18a7f8): undefined reference to `crc32_le' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-10 | wireless: update old static regulatory domain rules | John W. Linville
Update "US" and "JP" for current rules, and replace "EU" rules with the world roaming domain (since it was only a pseudo-domain anyway). Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-12-10 | mac80211: Revert 'Use correct sign for mesh active path refresh' | Javier Cardona
The patch ("mac80211: Use correct sign for mesh active path refresh.") was actually a bug. Reverted it and improved the explanation of how mesh path refresh works. Signed-off-by: Javier Cardona <javier@cozybit.com> Signed-off-by: Andrey Yurovsky <andrey@cozybit.com> Cc: stable@kernel.org Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-12-10 | mac80211: Fixed bug in mesh portal paths | Javier Cardona
Paths to mesh portals were being timed out immediately after each use in intermediate forwarding nodes. mppath->exp_time is set to the expiration time so assigning it to jiffies was marking the path as expired. Signed-off-by: Javier Cardona <javier@cozybit.com> Signed-off-by: Andrey Yurovsky <andrey@cozybit.com> Cc: stable@kernel.org Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-12-10 | net/mac80211: Correct size given to memset | Julia Lawall
Memset should be given the size of the structure, not the size of the pointer. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ type T; T *x; expression E; @@ memset(x, E, sizeof( + * x)) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: John W. Linville <linville@tuxdriver.com>
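The difference between `sizeof(x)` and `sizeof(*x)` that the semantic patch corrects can be shown with a small standalone sketch. The structure and function names below are hypothetical and chosen only for illustration; they are not taken from mac80211.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical structure; any struct larger than a pointer shows the issue. */
struct demo_stats {
    unsigned int frames;
    unsigned int drops;
    unsigned char addr[32];
};

/* What the buggy call passed as the length: the size of the pointer itself. */
static size_t demo_pointer_size(const struct demo_stats *x)
{
    return sizeof(x);
}

/* The corrected call: sizeof(*x) covers the whole pointed-to structure. */
static void demo_clear(struct demo_stats *x)
{
    assert(x != NULL);
    memset(x, 0, sizeof(*x));
}
```

With the pointer size as the length, only the first few bytes of the structure would be cleared and the rest would keep stale data, which is why the fix dereferences the pointer inside `sizeof`.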
2009-12-10 | b43: Remove reset after fatal DMA error | Larry Finger
As shown in Kernel Bugzilla #14761, doing a controller restart after a fatal DMA error does not accomplish anything other than consume the CPU on an affected system. Accordingly, substitute a meaningful message for the restart. Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net> Cc: Stable <stable@vger.kernel.org> [2.6.32] Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-12-10 | rtl8187: add radio led and fix warnings on suspend | Herton Ronaldo Krzesinski
Michael Buesch reports that his rtl8187 gives warnings on suspend ("queueing ieee80211 work while going to suspend" warnings), as rtl8187 can call ieee80211_queue_delayed_work after mac80211 is suspended. This change enhances rtl8187 led code so we can avoid queuing work after mac80211 is suspended: now we register a radio led and make additional checks to ensure led is off/on properly as mac80211 wants. Signed-off-by: Herton Ronaldo Krzesinski <herton@mandriva.com.br> Tested-by: Larry Finger <Larry.Finger@lwfinger.net> Cc: Stable <stable@vger.kernel.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-12-10 | ath5k: enable EEPROM checksum check | Luis R. Rodriguez
Without this we have no guarantee of the integrity of the EEPROM and are likely to encounter a lot of bogus bug reports due to actual issues on the EEPROM. With the EEPROM checksum check in place we can easily rule those issues out. If you run patch during a revert *you* have a card with a busted EEPROM and only older kernels will support that concoction. This patch is a trade-off between not accepting bogus EEPROMs and avoiding bogus bug reports, allowing developers to focus instead on real concrete issues. If stable keeps bogus bug reports because of a possibly busted EEPROM feel free to apply this there too. Tested on an AR5414 Cc: stable@kernel.org Cc: jirislaby@gmail.com Cc: akpm@linux-foundation.org Cc: rjw@sisk.pl Cc: me@bobcopeland.com Cc: david.quan@atheros.com Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-12-10 | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq | Linus Torvalds
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: Add debugobjects support
2009-12-10 | Merge branch 'bugfix' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen | Linus Torvalds
* 'bugfix' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen: xen: try harder to balloon up under memory pressure. Xen balloon: fix totalram_pages counting. xen: explicitly create/destroy stop_machine workqueues outside suspend/resume region. xen: improve error handling in do_suspend. xen: don't leak IRQs over suspend/resume. xen: call clock resume notifier on all CPUs xen: use iret for return from 64b kernel to 32b usermode xen: don't call dpm_resume_noirq() with interrupts disabled. xen: register runstate info for boot CPU early xen: register runstate on secondary CPUs xen: register timer interrupt with IRQF_TIMER xen: correctly restore pfn_to_mfn_list_list after resume xen: restore runstate_info even if !have_vcpu_info_placement xen: re-register runstate area earlier on resume. xen: wait up to 5 minutes for device connetion xen: improvement to wait_for_devices() xen: fix is_disconnected_device/exists_disconnected_device xen/xenbus: make DEVICE_ATTR()s static
2009-12-10 | Merge branch 'xen/fbdev' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen | Linus Torvalds
* 'xen/fbdev' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen: xen pvfb: Inhibit VM_IO flag to be set on vmalloc-ed framebuffers. fb-defio: Inhibit VM_IO flag to be set on vmalloc-ed framebuffers. fb-defio: If FBINFO_VIRTFB is defined, do not set VM_IO flag. Fix toogle whether xenbus driver should be built as module or part of kernel.
2009-12-10 | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm | Linus Torvalds
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm: dlm: always use GFP_NOFS
2009-12-10 | Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 | Linus Torvalds
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (47 commits) ext4: Fix potential fiemap deadlock (mmap_sem vs. i_data_sem) ext4: Do not override ext2 or ext3 if built they are built as modules jbd2: Export jbd2_log_start_commit to fix ext4 build ext4: Fix insufficient checks in EXT4_IOC_MOVE_EXT ext4: Wait for proper transaction commit on fsync ext4: fix incorrect block reservation on quota transfer. ext4: quota macros cleanup ext4: ext4_get_reserved_space() must return bytes instead of blocks ext4: remove blocks from inode prealloc list on failure ext4: wait for log to commit when umounting ext4: Avoid data / filesystem corruption when write fails to copy data ext4: Use ext4 file system driver for ext2/ext3 file system mounts ext4: Return the PTR_ERR of the correct pointer in setup_new_group_blocks() jbd2: Add ENOMEM checking in and for jbd2_journal_write_metadata_buffer() ext4: remove unused parameter wbc from __ext4_journalled_writepage() ext4: remove encountered_congestion trace ext4: move_extent_per_page() cleanup ext4: initialize moved_len before calling ext4_move_extents() ext4: Fix double-free of blocks with EXT4_IOC_MOVE_EXT ext4: use ext4_data_block_valid() in ext4_free_blocks() ...
2009-12-10 | Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd | Linus Torvalds
* 'for-linus' of git://git.open-osd.org/linux-open-osd: exofs: Multi-device mirror support exofs: Move all operations to an io_engine exofs: move osd.c to ios.c exofs: statfs blocks is sectors not FS blocks exofs: Prints on mount and unmout exofs: refactor exofs_i_info initialization into common helper exofs: dbg-print less exofs: More sane debug print trivial: some small fixes in exofs documentation
2009-12-10 | Merge git://git.infradead.org/ubifs-2.6 | Linus Torvalds
* git://git.infradead.org/ubifs-2.6: UBIFS: fix return code in check_leaf UBI: flush wl before clearing update marker MAINTAINERS: change e-mail of Artem Bityutskiy UBIFS: remove manual O_SYNC handling UBIFS: support mounting of UBI volume character devices UBI: Add ubi_open_volume_path
2009-12-10 | V4L/DVB (13592): max2165: 32bit build patch | David Wong
This patch drops usage of floating point variable for 32bit build Signed-off-by: David T. L. Wong <davidtlwong@gmail.com> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2009-12-10 | ext3: PTR_ERR return of wrong pointer in setup_new_group_blocks() | Roel Kluin
Return the PTR_ERR of the correct pointer. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
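The class of bug fixed here, testing one pointer for an error and then extracting the error code from a different one, can be sketched in user space. The `demo_*` helpers below are hypothetical stand-ins that imitate the kernel's error-pointer convention (small negative errno values encoded in the pointer); they are not the kernel macros themselves, and `demo_setup` is not the ext3 function.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define DEMO_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static void *demo_err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

/* A pointer in the top DEMO_MAX_ERRNO addresses is treated as an error. */
static int demo_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-DEMO_MAX_ERRNO;
}

/* Decode the errno value from an error pointer. */
static long demo_ptr_err(const void *p)
{
    return (long)(intptr_t)p;
}

/*
 * `first` is a valid buffer obtained earlier; `second` is the result of a
 * later lookup that may have failed. The error must be decoded from the
 * pointer that was actually tested.
 */
static long demo_setup(const void *first, const void *second)
{
    assert(!demo_is_err(first));
    if (demo_is_err(second))
        return demo_ptr_err(second); /* decoding `first` here would return a meaningless value */
    return 0;
}
```

Decoding the wrong pointer would turn a valid address into a nonsensical return value instead of the errno the caller expects.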
2009-12-10 | ext3: Fix data / filesystem corruption when write fails to copy data | Jan Kara
When ext3_write_begin fails after allocating some blocks or generic_perform_write fails to copy data to write, we truncate blocks already instantiated beyond i_size. Although these blocks were never inside i_size, we have to truncate pagecache of these blocks so that corresponding buffers get unmapped. Otherwise subsequent __block_prepare_write (called because we are retrying the write) will find the buffers mapped, not call ->get_block, and thus the page will be backed by already freed blocks leading to filesystem and data corruption. Reported-by: James Y Knight <foom@fuhm.net> Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-10 | ext4: Support for 64-bit quota format | Jan Kara
Add support for new 64-bit quota format. It is enough to add proper mount options handling. The rest is done by the generic code. Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-10 | ext3: Support for vfsv1 quota format | Jan Kara
We just have to add proper mount options handling. The rest is handled by the generic quota code. CC: linux-ext4@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-10 | quota: Implement quota format with 64-bit space and inode limits | Jan Kara
So far the maximum quota space limit was 4TB. Apparently this isn't enough for Lustre guys anymore. So implement new quota format which raises block limits to 2^64 bytes. Also store number of inodes and inode limits in 64-bit variables as 2^32 files isn't that insanely high anymore. The first version of the patch has been developed by Andrew Perepechko <Andrew.Perepechko@Sun.COM>. CC: Andrew.Perepechko@Sun.COM Signed-off-by: Jan Kara <jack@suse.cz>
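The 4TB figure can be reproduced with a short calculation. The sketch below assumes that the old format stored space limits as 32-bit counts of 1 KiB quota blocks; that assumption is not stated in the commit message, and the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: limits in the old format count quota blocks of 1 KiB each. */
#define DEMO_QUOTA_BLOCK_SIZE 1024ULL

/*
 * Number of bytes at which a limit field of `bits` bits, counting blocks of
 * `block_size` bytes, runs out of range: 2^bits * block_size.
 */
static uint64_t demo_limit_ceiling(unsigned int bits, uint64_t block_size)
{
    assert(bits > 0 && bits < 64);
    return ((uint64_t)1 << bits) * block_size;
}
```

Under that assumption, `demo_limit_ceiling(32, DEMO_QUOTA_BLOCK_SIZE)` is 2^32 * 2^10 = 2^42 bytes, i.e. 4 TiB, which matches the limit quoted above. The same 32-bit width caps the inode count at 2^32 files.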
2009-12-10 | quota: Move definition of QFMT_OCFS2 to linux/quota.h | Jan Kara
Move definition of this constant to linux/quota.h so that it cannot clash with other format IDs. CC: Joel Becker <joel.becker@oracle.com> Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-10 | ext2: fix comment in ext2_find_entry about return values | Jérémy Cochoy
Signed-off-by: Jérémy Cochoy <jeremy.cochoy@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-10 | ext3: Unify log messages in ext3 | Alexey Fisher
Make messages produced by ext3 more unified. It should be easy to parse. dmesg before patch: [ 4893.684892] reservations ON [ 4893.684896] xip option not supported [ 4893.684964] EXT3-fs warning: maximal mount count reached, running e2fsck is recommended dmesg after patch: [ 873.300792] EXT3-fs (loop0): using internal journaln [ 873.300796] EXT3-fs (loop0): mounted filesystem with writeback data mode [ 924.163657] EXT3-fs (loop0): error: can't find ext3 filesystem on dev loop0. [ 723.755642] EXT3-fs (loop0): error: bad blocksize 8192 [ 357.874687] EXT3-fs (loop0): error: no journal found. mounting ext3 over ext2? [ 873.300764] EXT3-fs (loop0): warning: maximal mount count reached, running e2fsck is recommended [ 924.163657] EXT3-fs (loop0): error: can't find ext3 filesystem on dev loop0. Signed-off-by: Alexey Fisher <bug-track@fisher-privat.net> Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-10 | ext2: clear uptodate flag on super block I/O error | Stephen Hemminger
This fixes a WARN backtrace in mark_buffer_dirty() that occurs during unmount when a USB or floppy device is removed. I reported this as a kernel regression, but it looks like it might have been there for longer than that. The super block update from a previous operation has marked the buffer as in error, and the flag has to be cleared before doing the update. (Similar code already exists in ext4). Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-10 | ext2: Unify log messages in ext2 | Alexey Fisher
make messages produced by ext2 more unified. It should be easy to parse. dmesg before patch: [ 4893.684892] reservations ON [ 4893.684896] xip option not supported [ 4893.684961] EXT2-fs warning: mounting ext3 filesystem as ext2 [ 4893.684964] EXT2-fs warning: maximal mount count reached, running e2fsck is recommended [ 4893.684990] EXT II FS: 0.5b, 95/08/09, bs=1024, fs=1024, gc=2, bpg=8192, ipg=1280, mo=80010] dmesg after patch: [ 4893.684892] EXT2-fs (loop0): reservations ON [ 4893.684896] EXT2-fs (loop0): xip option not supported [ 4893.684961] EXT2-fs (loop0): warning: mounting ext3 filesystem as ext2 [ 4893.684964] EXT2-fs (loop0): warning: maximal mount count reached, running e2fsck is recommended [ 4893.684990] EXT2-fs (loop0): 0.5b, 95/08/09, bs=1024, fs=1024, gc=2, bpg=8192, ipg=1280, mo=80010] Signed-off-by: Alexey Fisher <bug-track@fisher-privat.net> Reviewed-by: Andreas Dilger <adilger@sun.com> Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-10 | ext3: make "norecovery" an alias for "noload" | Eric Sandeen
Users on the list recently complained about differences across filesystems w.r.t. how to mount without a journal replay. In the discussion it was noted that xfs's "norecovery" option is perhaps more descriptively accurate than "noload," so let's make that an alias for ext3. Also show this status in /proc/mounts Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz>
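One common way to make one mount option an alias for another is to map both strings to the same token in the option table. The sketch below illustrates that idea only; the table, enum and `demo_match` function are hypothetical and are not the ext3 option parser.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical option tokens. */
enum demo_opt { DEMO_OPT_ERR = 0, DEMO_OPT_NOLOAD, DEMO_OPT_RO };

struct demo_token {
    const char *name;
    enum demo_opt opt;
};

/* Two spellings, one token: "norecovery" behaves exactly like "noload". */
static const struct demo_token demo_tokens[] = {
    { "noload",     DEMO_OPT_NOLOAD },
    { "norecovery", DEMO_OPT_NOLOAD },
    { "ro",         DEMO_OPT_RO },
};

/* Look up a single option string; unknown options map to DEMO_OPT_ERR. */
static enum demo_opt demo_match(const char *s)
{
    size_t i;

    assert(s != NULL);
    for (i = 0; i < sizeof(demo_tokens) / sizeof(demo_tokens[0]); i++) {
        if (strcmp(s, demo_tokens[i].name) == 0)
            return demo_tokens[i].opt;
    }
    return DEMO_OPT_ERR;
}
```

Because both names resolve to the same token, the rest of the mount code needs no change to honour the alias.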
2009-12-10 | ext3: Don't update the superblock in ext3_statfs() | Eric Sandeen
commit a71ce8c6c9bf269b192f352ea555217815cf027e updated ext3_statfs() to update the on-disk superblock counters, but modified this buffer directly without any journaling of the change. This is one of the accesses that was causing the crc errors in journal replay as seen in kernel.org bugzilla #14354. The modifications were originally to keep the sb "more" in sync, so that a readonly fsck of the device didn't flag this as an error (as often), but apparently e2fsprogs deals with this differently now, anyway. Based on Ted's patch for ext4, which was in turn based on my work on that bug and another preliminary patch... Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-10 | ext3: journal all modifications in ext3_xattr_set_handle | Eric Sandeen
ext3_xattr_set_handle() was zeroing out an inode outside of journaling constraints; this is one of the accesses that was causing the crc errors in journal replay as seen in kernel.org bugzilla #14354. Although ext3 doesn't have the crc issue, modifications out of journal control are a Bad Thing. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-10 | ext2: Explicitly assign values to on-disk enum of filetypes | Jan Blunck
It is somewhat dangerous to use a straight enum here, because this will reassign values of later variables if one of the earlier ones is removed. Signed-off-by: Jan Blunck <jblunck@suse.de> Cc: Andreas Dilger <adilger@sun.com> Signed-off-by: Jan Kara <jack@suse.cz>
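The renumbering hazard is easy to see in a small sketch. The enumerator names below are hypothetical and only mimic the shape of an on-disk file type list.

```c
#include <assert.h>

/* Implicit numbering: each value depends on its position in the list. */
enum demo_before { DEMO_B_UNKNOWN, DEMO_B_REG_FILE, DEMO_B_DIR };

/* Same list with one entry removed: DIR silently shifts from 2 to 1. */
enum demo_after { DEMO_A_UNKNOWN, DEMO_A_DIR };

/* Explicit numbering: removing an entry leaves the others untouched. */
enum demo_explicit { DEMO_E_UNKNOWN = 0, DEMO_E_DIR = 2 };

/* Decode a type byte as it would be read back from disk. */
static int demo_is_dir(unsigned char on_disk_type)
{
    return on_disk_type == DEMO_E_DIR;
}

/* Sanity check documenting the values each variant ends up with. */
static void demo_check_values(void)
{
    assert(DEMO_B_DIR == 2);
    assert(DEMO_A_DIR == 1);
    assert(DEMO_E_DIR == 2);
}
```

Since these values are written to disk, a silent shift like the one in `demo_after` would make existing directory entries decode as the wrong type; explicit assignment rules that out.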
2009-12-10 | quota: Fix WARN_ON in lookup_one_len | Jan Kara
We should hold i_mutex when looking up quota files for journaled quotas, otherwise a WARN_ON in lookup_one_len triggers. The fact that we didn't hold i_mutex previously probably could not lead to a real bug since the filesystem is just being mounted / remounted read-write and thus the root directory cannot change anyway but it's definitely cleaner with i_mutex. Reported-by: Bastien ROUCARIES <roucaries.bastien@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-10 | const: struct quota_format_ops | Alexey Dobriyan
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-10 | ubifs: remove manual O_SYNC handling | Christoph Hellwig
generic_file_aio_write already calls into ->fsync to handle O_SYNC/O_DSYNC. Remove the duplicate call to ubifs_sync_wbufs_by_inode which is already covered by ubifs_fsync. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-10 | afs: remove manual O_SYNC handling | Christoph Hellwig
generic_file_aio_write already calls into ->fsync to handle O_SYNC/O_DSYNC. Remove the duplicate manual invocation. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-10 | kill wait_on_page_writeback_range | Christoph Hellwig
All callers really want the more logical filemap_fdatawait_range interface, so convert them to use it and merge wait_on_page_writeback_range into filemap_fdatawait_range. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-10 | vfs: Implement proper O_SYNC semantics | Christoph Hellwig
While Linux provided an O_SYNC flag basically since day 1, it took until Linux 2.4.0-test12pre2 to actually get it implemented for filesystems, since that day we had generic_osync_around with only minor changes and the great "For now, when the user asks for O_SYNC, we'll actually give O_DSYNC" comment. This patch intends to actually give us real O_SYNC semantics in addition to the O_DSYNC semantics. After Jan's O_SYNC patches which are required before this patch it's actually surprisingly simple, we just need to figure out when to set the datasync flag to vfs_fsync_range and when not. This patch renames the existing O_SYNC flag to O_DSYNC while keeping its numerical value to keep binary compatibility, and adds a new real O_SYNC flag. To guarantee backwards compatibility it is defined as expanding to both the O_DSYNC and the new additional binary flag (__O_SYNC) to make sure we are backwards-compatible when compiled against the new headers. This also means that all places that don't care about the differences can just check O_DSYNC and get the right behaviour for O_SYNC, too - only places that actually care need to check __O_SYNC in addition. Drivers and network filesystems have been updated in a fail safe way to always do the full sync magic if O_DSYNC is set. The few places setting O_SYNC for lower layers are kept that way for now to stay failsafe. We enforce that O_DSYNC is set when __O_SYNC is set early in the open path to make sure we always get these sane options. Note that parisc really screwed up their headers as they already define a O_DSYNC that has always been a no-op. We try to repair it by using it for the new O_DSYNC and redefining O_SYNC to send both the traditional O_SYNC numerical value _and_ the O_DSYNC one. Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Grant Grundler <grundler@parisc-linux.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andreas Dilger <adilger@sun.com> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Kyle McMartin <kyle@mcmartin.ca> Acked-by: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jan Kara <jack@suse.cz>
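The flag layout described above, where the new O_SYNC expands to the old bit plus one additional bit, can be sketched as follows. The `DEMO_*` names and the numeric values are assumptions made for illustration; the real values live in the per-architecture fcntl headers, and `DEMO_O_SYNC_BIT` merely plays the role of `__O_SYNC`.

```c
#include <assert.h>

/* Assumed values for illustration only. */
#define DEMO_O_DSYNC    010000    /* the bit the old O_SYNC occupied */
#define DEMO_O_SYNC_BIT 04000000  /* plays the role of __O_SYNC */
#define DEMO_O_SYNC     (DEMO_O_SYNC_BIT | DEMO_O_DSYNC)

/* Code that does not care about the difference checks only the DSYNC bit. */
static int demo_wants_data_sync(int flags)
{
    return (flags & DEMO_O_DSYNC) != 0;
}

/* Code that does care additionally checks the extra bit. */
static int demo_wants_full_sync(int flags)
{
    return (flags & DEMO_O_SYNC_BIT) != 0;
}

/* Open-path fixup: the extra bit must never appear without the DSYNC bit. */
static int demo_fixup_flags(int flags)
{
    if (flags & DEMO_O_SYNC_BIT)
        flags |= DEMO_O_DSYNC;
    assert(!(flags & DEMO_O_SYNC_BIT) || (flags & DEMO_O_DSYNC));
    return flags;
}
```

A binary built against old headers passes only the old bit and gets data-sync behaviour as before, while a binary built against new headers passes both bits and gets the full semantics. This is the backwards-compatibility property the message describes.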
2009-12-10 | zisofs: Implement reading of compressed files when PAGE_CACHE_SIZE > compress block size | Jan Kara
Also split and clean up zisofs_readpage() when we are changing it anyway. Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-10 | exofs: Multi-device mirror support | Boaz Harrosh
This patch changes the on-disk format; it is accompanied by a parallel patch to mkfs.exofs that enables multi-device capabilities. After this patch, old exofs will refuse to mount a newly formatted FS and new exofs will refuse an old format. This is done by moving the magic field offset inside the FSCB. A new FSCB *version* field was added. In the future, exofs will refuse to mount an unmatched FSCB version. To upgrade or downgrade an exofs one must use the mkfs.exofs --upgrade option before mounting. Introduced a new object that contains a *device-table*. This object contains the default *data-map* and a linear array of device information, which identifies the devices used in the filesystem. This object is only written to offline by mkfs.exofs. This is why it is kept separate from the FSCB, since the latter is written to while mounted. The same partition number and the same object number are used on all devices; only the device varies. * Define the new format, then load the device table at mount time and make sure everything is supported. * Change the I/O engine to support Mirror IO, i.e. write the same data to multiple devices, read from a random device to spread the read-load from multiple clients (TODO: stripe read) Implementation notes: A few points introduced in the previous patch should be mentioned here: * Special care was taken so that absolutely all operations that have any chance of failing are done before any osd-request is executed. This is to minimize the need for data consistency recovery to only real IO errors. * Each IO state has a kref. It starts at 1, any osd-request executed will increment the kref, and finally when all are executed the first ref is dropped. At IO-done, each request completion decrements the kref, and the last one to return executes the internal _last_io() routine. _last_io() will call the registered io_state_done. In sync mode a caller does not supply a done method, indicating a synchronous request; the caller is put to sleep and a special io_state_done is registered that will awaken the caller. Even in sync mode, all operations are executed in parallel. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
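The reference-counting scheme in the implementation notes (start at 1, take one reference per issued request, drop the initial reference after submission, run the completion routine when the count reaches zero) can be sketched in single-threaded C. The names are hypothetical, and a plain `int` stands in for the kernel's atomic `struct kref`, so this shows only the counting logic and not a thread-safe implementation.

```c
#include <assert.h>

struct demo_io_state {
    int refs;        /* stand-in for a kref; plain int, not atomic */
    int done_calls;  /* how many times the done routine has run */
};

/* A fresh IO state holds one reference, owned by the submitter. */
static void demo_ios_init(struct demo_io_state *ios)
{
    ios->refs = 1;
    ios->done_calls = 0;
}

/* Drop one reference; the last drop runs the done routine. */
static void demo_ios_put(struct demo_io_state *ios)
{
    assert(ios->refs > 0);
    if (--ios->refs == 0)
        ios->done_calls++; /* plays the role of _last_io() */
}

/* Issue nr_requests requests, then drop the submitter's initial reference. */
static void demo_ios_submit(struct demo_io_state *ios, int nr_requests)
{
    int i;

    for (i = 0; i < nr_requests; i++)
        ios->refs++;
    demo_ios_put(ios);
}

/* Called once per request when that request completes. */
static void demo_ios_complete_one(struct demo_io_state *ios)
{
    demo_ios_put(ios);
}
```

Holding the initial reference during submission is what keeps the done routine from firing early if a request completes before the remaining ones have been issued.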
2009-12-10 | exofs: Move all operations to an io_engine | Boaz Harrosh
In anticipation of multi-device operations, we separate osd operations into an abstract I/O API. Currently only one device is used but later when adding more devices, we will drive all devices in parallel according to a "data_map" that describes how data is arranged on multiple devices. The file system level operates, like before, as if there is one object (inode-number) and an i_size. The io engine will split this to the same object-number but on multiple devices. At first we introduce Mirror (raid 1) layout. But at the final outcome we intend to fully implement the pNFS-Objects data-map, including raid 0,4,5,6 over mirrored devices, over multiple device-groups. And more. See: http://tools.ietf.org/html/draft-ietf-nfsv4-pnfs-obj-12 * Define an io_state based API for accessing osd storage devices in an abstract way. Usage: First a caller allocates an io state with: exofs_get_io_state(struct exofs_sb_info *sbi, struct exofs_io_state** ios); Then calls one of: exofs_sbi_create(struct exofs_io_state *ios); exofs_sbi_remove(struct exofs_io_state *ios); exofs_sbi_write(struct exofs_io_state *ios); exofs_sbi_read(struct exofs_io_state *ios); exofs_oi_truncate(struct exofs_i_info *oi, u64 new_len); And when done exofs_put_io_state(struct exofs_io_state *ios); * Convert all source files to use this new API * Convert from bio_alloc to bio_kmalloc * In io engine we make use of the now fixed osd_req_decode_sense There are no functional changes or on disk additions after this patch. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2009-12-10 | exofs: move osd.c to ios.c | Boaz Harrosh
If I do a "git mv" together with a massive code change and commit in one patch, git loses the rename and records a delete/new instead. This is bad because I want a rename recorded so later rebased/cherry-picked patches to the old name will work. Also the --follow is lost. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2009-12-10 | exofs: statfs blocks is sectors not FS blocks | Boaz Harrosh
Even though exofs has a 4k block size, statfs blocks is in sectors (512 bytes). Also if target returns 0 for capacity then make it ULLONG_MAX. df does not like zero-size filesystems Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
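The two rules in this message, report blocks in 512-byte sectors and substitute ULLONG_MAX for a zero capacity, can be sketched as a small conversion helper. The function and macro names are hypothetical and this is not the exofs statfs code.

```c
#include <assert.h>
#include <limits.h>

/* 512-byte sectors: converting bytes to sectors is a shift by 9. */
#define DEMO_SECTOR_SHIFT 9

/*
 * Convert a capacity in bytes, as reported by the target, to the block
 * count reported through statfs. A zero capacity is replaced by
 * ULLONG_MAX so that the filesystem does not appear to be zero-sized.
 */
static unsigned long long demo_statfs_blocks(unsigned long long capacity_bytes)
{
    if (capacity_bytes == 0)
        capacity_bytes = ULLONG_MAX;
    assert(capacity_bytes != 0);
    return capacity_bytes >> DEMO_SECTOR_SHIFT;
}
```

For example, one 4k filesystem block corresponds to 8 statfs blocks under this convention.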
2009-12-10 | exofs: Prints on mount and unmout | Boaz Harrosh
It is important to print in the logs when a filesystem was mounted and eventually unmounted. Print the osd-device's osd_name and pid the FS was mounted/unmounted on. TODO: How to also print the namespace path the filesystem was mounted on? Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2009-12-10 | exofs: refactor exofs_i_info initialization into common helper | Boaz Harrosh
There are two places that initialize inodes: exofs_iget() and exofs_new_inode() As more members of exofs_i_info that need initialization are added this code will grow. (soon) Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2009-12-10 | exofs: dbg-print less | Boaz Harrosh
Inner-loop printing is converted to EXOFS_DBG2, which is #defined to nothing. It is now almost bearable to just leave debug on. Every operation is printed once, with the most relevant info (I hope). Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>