summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2009-12-10const: struct quota_format_opsAlexey Dobriyan
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-10ubifs: remove manual O_SYNC handlingChristoph Hellwig
generic_file_aio_write already calls into ->fsync to handle O_SYNC/O_DSYNC. Remove the duplicate call to ubifs_sync_wbufs_by_inode which is already covered by ubifs_fsync. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-10afs: remove manual O_SYNC handlingChristoph Hellwig
generic_file_aio_write already calls into ->fsync to handle O_SYNC/O_DSYNC. Remove the duplicate manual invocation. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-10kill wait_on_page_writeback_rangeChristoph Hellwig
All callers really want the more logical filemap_fdatawait_range interface, so convert them to use it and merge wait_on_page_writeback_range into filemap_fdatawait_range. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-10vfs: Implement proper O_SYNC semanticsChristoph Hellwig
While Linux provided an O_SYNC flag basically since day 1, it took until Linux 2.4.0-test12pre2 to actually get it implemented for filesystems, since that day we had generic_osync_around with only minor changes and the great "For now, when the user asks for O_SYNC, we'll actually give O_DSYNC" comment. This patch intends to actually give us real O_SYNC semantics in addition to the O_DSYNC semantics. After Jan's O_SYNC patches which are required before this patch it's actually surprisingly simple, we just need to figure out when to set the datasync flag to vfs_fsync_range and when not. This patch renames the existing O_SYNC flag to O_DSYNC while keeping it's numerical value to keep binary compatibility, and adds a new real O_SYNC flag. To guarantee backwards compatiblity it is defined as expanding to both the O_DSYNC and the new additional binary flag (__O_SYNC) to make sure we are backwards-compatible when compiled against the new headers. This also means that all places that don't care about the differences can just check O_DSYNC and get the right behaviour for O_SYNC, too - only places that actuall care need to check __O_SYNC in addition. Drivers and network filesystems have been updated in a fail safe way to always do the full sync magic if O_DSYNC is set. The few places setting O_SYNC for lower layers are kept that way for now to stay failsafe. We enforce that O_DSYNC is set when __O_SYNC is set early in the open path to make sure we always get these sane options. Note that parisc really screwed up their headers as they already define a O_DSYNC that has always been a no-op. We try to repair it by using it for the new O_DSYNC and redefinining O_SYNC to send both the traditional O_SYNC numerical value _and_ the O_DSYNC one. Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Grant Grundler <grundler@parisc-linux.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andreas Dilger <adilger@sun.com> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Kyle McMartin <kyle@mcmartin.ca> Acked-by: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-10zisofs: Implement reading of compressed files when PAGE_CACHE_SIZE > ↵Jan Kara
compress block size Also split and cleanup zisofs_readpage() when we are changing it anyway. Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-10exofs: Multi-device mirror supportBoaz Harrosh
This patch changes on-disk format, it is accompanied with a parallel patch to mkfs.exofs that enables multi-device capabilities. After this patch, old exofs will refuse to mount a new formatted FS and new exofs will refuse an old format. This is done by moving the magic field offset inside the FSCB. A new FSCB *version* field was added. In the future, exofs will refuse to mount unmatched FSCB version. To up-grade or down-grade an exofs one must use mkfs.exofs --upgrade option before mounting. Introduced, a new object that contains a *device-table*. This object contains the default *data-map* and a linear array of devices information, which identifies the devices used in the filesystem. This object is only written to offline by mkfs.exofs. This is why it is kept separate from the FSCB, since the later is written to while mounted. Same partition number, same object number is used on all devices only the device varies. * define the new format, then load the device table on mount time make sure every thing is supported. * Change I/O engine to now support Mirror IO, .i.e write same data to multiple devices, read from a random device to spread the read-load from multiple clients (TODO: stripe read) Implementation notes: A few points introduced in previous patch should be mentioned here: * Special care was made so absolutlly all operation that have any chance of failing are done before any osd-request is executed. This is to minimize the need for a data consistency recovery, to only real IO errors. * Each IO state has a kref. It starts at 1, any osd-request executed will increment the kref, finally when all are executed the first ref is dropped. At IO-done, each request completion decrements the kref, the last one to return executes the internal _last_io() routine. _last_io() will call the registered io_state_done. On sync mode a caller does not supply a done method, indicating a synchronous request, the caller is put to sleep and a special io_state_done is registered that will awaken the caller. Though also in sync mode all operations are executed in parallel. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2009-12-10exofs: Move all operations to an io_engineBoaz Harrosh
In anticipation for multi-device operations, we separate osd operations into an abstract I/O API. Currently only one device is used but later when adding more devices, we will drive all devices in parallel according to a "data_map" that describes how data is arranged on multiple devices. The file system level operates, like before, as if there is one object (inode-number) and an i_size. The io engine will split this to the same object-number but on multiple device. At first we introduce Mirror (raid 1) layout. But at the final outcome we intend to fully implement the pNFS-Objects data-map, including raid 0,4,5,6 over mirrored devices, over multiple device-groups. And more. See: http://tools.ietf.org/html/draft-ietf-nfsv4-pnfs-obj-12 * Define an io_state based API for accessing osd storage devices in an abstract way. Usage: First a caller allocates an io state with: exofs_get_io_state(struct exofs_sb_info *sbi, struct exofs_io_state** ios); Then calles one of: exofs_sbi_create(struct exofs_io_state *ios); exofs_sbi_remove(struct exofs_io_state *ios); exofs_sbi_write(struct exofs_io_state *ios); exofs_sbi_read(struct exofs_io_state *ios); exofs_oi_truncate(struct exofs_i_info *oi, u64 new_len); And when done exofs_put_io_state(struct exofs_io_state *ios); * Convert all source files to use this new API * Convert from bio_alloc to bio_kmalloc * In io engine we make use of the now fixed osd_req_decode_sense There are no functional changes or on disk additions after this patch. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2009-12-10exofs: move osd.c to ios.cBoaz Harrosh
If I do a "git mv" together with a massive code change and commit in one patch, git looses the rename and records a delete/new instead. This is bad because I want a rename recorded so later rebased/cherry-picked patches to the old name will work. Also the --follow is lost. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2009-12-10exofs: statfs blocks is sectors not FS blocksBoaz Harrosh
Even though exofs has a 4k block size, statfs blocks is in sectors (512 bytes). Also if target returns 0 for capacity then make it ULLONG_MAX. df does not like zero-size filesystems Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2009-12-10exofs: Prints on mount and unmoutBoaz Harrosh
It is important to print in the logs when a filesystem was mounted and eventually unmounted. Print the osd-device's osd_name and pid the FS was mounted/unmounted on. TODO: How to also print the namespace path the filesystem was mounted on? Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2009-12-10exofs: refactor exofs_i_info initialization into common helperBoaz Harrosh
There are two places that initialize inodes: exofs_iget() and exofs_new_inode() As more members of exofs_i_info that need initialization are added this code will grow. (soon) Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2009-12-10exofs: dbg-print lessBoaz Harrosh
Iner-loops printing is converted to EXOFS_DBG2 which is #defined to nothing. It is now almost bareable to just leave debug-on. Every operation is printed once, with most relevant info (I hope). Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2009-12-10exofs: More sane debug printBoaz Harrosh
debug prints should be somewhat useful without actually reading the source code Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2009-12-09Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (42 commits) tree-wide: fix misspelling of "definition" in comments reiserfs: fix misspelling of "journaled" doc: Fix a typo in slub.txt. inotify: remove superfluous return code check hdlc: spelling fix in find_pvc() comment doc: fix regulator docs cut-and-pasteism mtd: Fix comment in Kconfig doc: Fix IRQ chip docs tree-wide: fix assorted typos all over the place drivers/ata/libata-sff.c: comment spelling fixes fix typos/grammos in Documentation/edac.txt sysctl: add missing comments fs/debugfs/inode.c: fix comment typos sgivwfb: Make use of ARRAY_SIZE. sky2: fix sky2_link_down copy/paste comment error tree-wide: fix typos "couter" -> "counter" tree-wide: fix typos "offest" -> "offset" fix kerneldoc for set_irq_msi() spidev: fix double "of of" in comment comment typo fix: sybsystem -> subsystem ...
2009-12-09ext4: Fix potential fiemap deadlock (mmap_sem vs. i_data_sem)Theodore Ts'o
Fix the following potential circular locking dependency between mm->mmap_sem and ei->i_data_sem: ======================================================= [ INFO: possible circular locking dependency detected ] 2.6.32-04115-gec044c5 #37 ------------------------------------------------------- ureadahead/1855 is trying to acquire lock: (&mm->mmap_sem){++++++}, at: [<ffffffff81107224>] might_fault+0x5c/0xac but task is already holding lock: (&ei->i_data_sem){++++..}, at: [<ffffffff811be1fd>] ext4_fiemap+0x11b/0x159 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&ei->i_data_sem){++++..}: [<ffffffff81099bfa>] __lock_acquire+0xb67/0xd0f [<ffffffff81099e7e>] lock_acquire+0xdc/0x102 [<ffffffff81516633>] down_read+0x51/0x84 [<ffffffff811a2414>] ext4_get_blocks+0x50/0x2a5 [<ffffffff811a3453>] ext4_get_block+0xab/0xef [<ffffffff81154f39>] do_mpage_readpage+0x198/0x48d [<ffffffff81155360>] mpage_readpages+0xd0/0x114 [<ffffffff811a104b>] ext4_readpages+0x1d/0x1f [<ffffffff810f8644>] __do_page_cache_readahead+0x12f/0x1bc [<ffffffff810f86f2>] ra_submit+0x21/0x25 [<ffffffff810f0cfd>] filemap_fault+0x19f/0x32c [<ffffffff81107b97>] __do_fault+0x55/0x3a2 [<ffffffff81109db0>] handle_mm_fault+0x327/0x734 [<ffffffff8151aaa9>] do_page_fault+0x292/0x2aa [<ffffffff81518205>] page_fault+0x25/0x30 [<ffffffff812a34d8>] clear_user+0x38/0x3c [<ffffffff81167e16>] padzero+0x20/0x31 [<ffffffff81168b47>] load_elf_binary+0x8bc/0x17ed [<ffffffff81130e95>] search_binary_handler+0xc2/0x259 [<ffffffff81166d64>] load_script+0x1b8/0x1cc [<ffffffff81130e95>] search_binary_handler+0xc2/0x259 [<ffffffff8113255f>] do_execve+0x1ce/0x2cf [<ffffffff81027494>] sys_execve+0x43/0x5a [<ffffffff8102918a>] stub_execve+0x6a/0xc0 -> #0 (&mm->mmap_sem){++++++}: [<ffffffff81099aa4>] __lock_acquire+0xa11/0xd0f [<ffffffff81099e7e>] lock_acquire+0xdc/0x102 [<ffffffff81107251>] might_fault+0x89/0xac [<ffffffff81139382>] fiemap_fill_next_extent+0x95/0xda [<ffffffff811bcb43>] ext4_ext_fiemap_cb+0x138/0x157 [<ffffffff811be069>] ext4_ext_walk_space+0x178/0x1f1 [<ffffffff811be21e>] ext4_fiemap+0x13c/0x159 [<ffffffff811390e6>] do_vfs_ioctl+0x348/0x4d6 [<ffffffff811392ca>] sys_ioctl+0x56/0x79 [<ffffffff81028cb2>] system_call_fastpath+0x16/0x1b other info that might help us debug this: 1 lock held by ureadahead/1855: #0: (&ei->i_data_sem){++++..}, at: [<ffffffff811be1fd>] ext4_fiemap+0x11b/0x159 stack backtrace: Pid: 1855, comm: ureadahead Not tainted 2.6.32-04115-gec044c5 #37 Call Trace: [<ffffffff81098c70>] print_circular_bug+0xa8/0xb7 [<ffffffff81099aa4>] __lock_acquire+0xa11/0xd0f [<ffffffff8102f229>] ? sched_clock+0x9/0xd [<ffffffff81099e7e>] lock_acquire+0xdc/0x102 [<ffffffff81107224>] ? might_fault+0x5c/0xac [<ffffffff81107251>] might_fault+0x89/0xac [<ffffffff81107224>] ? might_fault+0x5c/0xac [<ffffffff81124b44>] ? __kmalloc+0x13b/0x18c [<ffffffff81139382>] fiemap_fill_next_extent+0x95/0xda [<ffffffff811bcb43>] ext4_ext_fiemap_cb+0x138/0x157 [<ffffffff811bca0b>] ? ext4_ext_fiemap_cb+0x0/0x157 [<ffffffff811be069>] ext4_ext_walk_space+0x178/0x1f1 [<ffffffff811be21e>] ext4_fiemap+0x13c/0x159 [<ffffffff81107224>] ? might_fault+0x5c/0xac [<ffffffff811390e6>] do_vfs_ioctl+0x348/0x4d6 [<ffffffff8129f6d0>] ? __up_read+0x8d/0x95 [<ffffffff81517fb5>] ? retint_swapgs+0x13/0x1b [<ffffffff811392ca>] sys_ioctl+0x56/0x79 [<ffffffff81028cb2>] system_call_fastpath+0x16/0x1b Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-12-09ext4: Do not override ext2 or ext3 if built they are built as modulesTheodore Ts'o
The CONFIG_EXT4_USE_FOR_EXT23 option must not try to take over the ext2 or ext3 file systems if the those file system drivers are configured to be built as mdoules. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-12-09jbd2: Export jbd2_log_start_commit to fix ext4 buildTheodore Ts'o
This fixes: ERROR: "jbd2_log_start_commit" [fs/ext4/ext4.ko] undefined! Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-12-09ceph: do not feed bad device ids to crushSage Weil
Do not feed bad (large) device ids to CRUSH. Signed-off-by: Sage Weil <sage@newdream.net>
2009-12-09Merge branch 'reiserfs/kill-bkl' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing * 'reiserfs/kill-bkl' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing: (31 commits) kill-the-bkl/reiserfs: turn GFP_ATOMIC flag to GFP_NOFS in reiserfs_get_block() kill-the-bkl/reiserfs: drop the fs race watchdog from _get_block_create_0() kill-the-bkl/reiserfs: definitely drop the bkl from reiserfs_ioctl() kill-the-bkl/reiserfs: always lock the ioctl path kill-the-bkl/reiserfs: fix reiserfs lock to cpu_add_remove_lock dependency kill-the-bkl/reiserfs: Fix induced mm->mmap_sem to sysfs_mutex dependency kill-the-bkl/reiserfs: panic in case of lock imbalance kill-the-bkl/reiserfs: fix recursive reiserfs write lock in reiserfs_commit_write() kill-the-bkl/reiserfs: fix recursive reiserfs lock in reiserfs_mkdir() kill-the-bkl/reiserfs: fix "reiserfs lock" / "inode mutex" lock inversion dependency kill-the-bkl/reiserfs: move the concurrent tree accesses checks per superblock kill-the-bkl/reiserfs: acquire the inode mutex safely kill-the-bkl/reiserfs: unlock only when needed in search_by_key kill-the-bkl/reiserfs: use mutex_lock in reiserfs_mutex_lock_safe kill-the-bkl/reiserfs: factorize the locking in reiserfs_write_end() kill-the-bkl/reiserfs: reduce number of contentions in search_by_key() kill-the-bkl/reiserfs: don't hold the write recursively in reiserfs_lookup() kill-the-bkl/reiserfs: lock only once on reiserfs_get_block() kill-the-bkl/reiserfs: conditionaly release the write lock on fs_changed() kill-the-BKL/reiserfs: add reiserfs_cond_resched() ...
2009-12-09Merge commit 'origin/master' into nextBenjamin Herrenschmidt
Conflicts: include/linux/kvm.h
2009-12-08nfs41: Invoke RECLAIM_COMPLETE on all new client idsRicardo Labiaga
The NFSv4.1 spec indicates RECLAIM_COMPLETE is to be issued whenever a client establishes a new client id, not only after detecting the server has rebooted. Set the NFS4CLNT_RECLAIM_REBOOT bit after every new client id has been established. This enables us to issue RECLAIM_COMPLETE during the wrap up of the NFS4CLNT_RECLAIM_REBOOT state. Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-12-08Merge branch 'for-2.6.33' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds
* 'for-2.6.33' of git://git.kernel.dk/linux-2.6-block: (113 commits) cfq-iosched: Do not access cfqq after freeing it block: include linux/err.h to use ERR_PTR cfq-iosched: use call_rcu() instead of doing grace period stall on queue exit blkio: Allow CFQ group IO scheduling even when CFQ is a module blkio: Implement dynamic io controlling policy registration blkio: Export some symbols from blkio as its user CFQ can be a module block: Fix io_context leak after failure of clone with CLONE_IO block: Fix io_context leak after clone with CLONE_IO cfq-iosched: make nonrot check logic consistent io controller: quick fix for blk-cgroup and modular CFQ cfq-iosched: move IO controller declerations to a header file cfq-iosched: fix compile problem with !CONFIG_CGROUP blkio: Documentation blkio: Wait on sync-noidle queue even if rq_noidle = 1 blkio: Implement group_isolation tunable blkio: Determine async workload length based on total number of queues blkio: Wait for cfq queue to get backlogged if group is empty blkio: Propagate cgroup weight updation to cfq groups blkio: Drop the reference to queue once the task changes cgroup blkio: Provide some isolation between groups ...
2009-12-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1815 commits) mac80211: fix reorder buffer release iwmc3200wifi: Enable wimax core through module parameter iwmc3200wifi: Add wifi-wimax coexistence mode as a module parameter iwmc3200wifi: Coex table command does not expect a response iwmc3200wifi: Update wiwi priority table iwlwifi: driver version track kernel version iwlwifi: indicate uCode type when fail dump error/event log iwl3945: remove duplicated event logging code b43: fix two warnings ipw2100: fix rebooting hang with driver loaded cfg80211: indent regulatory messages with spaces iwmc3200wifi: fix NULL pointer dereference in pmkid update mac80211: Fix TX status reporting for injected data frames ath9k: enable 2GHz band only if the device supports it airo: Fix integer overflow warning rt2x00: Fix padding bug on L2PAD devices. WE: Fix set events not propagated b43legacy: avoid PPC fault during resume b43: avoid PPC fault during resume tcp: fix a timewait refcnt race ... Fix up conflicts due to sysctl cleanups (dead sysctl_check code and CTL_UNNUMBERED removed) in kernel/sysctl_check.c net/ipv4/sysctl_net_ipv4.c net/ipv6/addrconf.c net/sctp/sysctl.c
2009-12-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/sysctl-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/sysctl-2.6: (43 commits) security/tomoyo: Remove now unnecessary handling of security_sysctl. security/tomoyo: Add a special case to handle accesses through the internal proc mount. sysctl: Drop & in front of every proc_handler. sysctl: Remove CTL_NONE and CTL_UNNUMBERED sysctl: kill dead ctl_handler definitions. sysctl: Remove the last of the generic binary sysctl support sysctl net: Remove unused binary sysctl code sysctl security/tomoyo: Don't look at ctl_name sysctl arm: Remove binary sysctl support sysctl x86: Remove dead binary sysctl support sysctl sh: Remove dead binary sysctl support sysctl powerpc: Remove dead binary sysctl support sysctl ia64: Remove dead binary sysctl support sysctl s390: Remove dead sysctl binary support sysctl frv: Remove dead binary sysctl support sysctl mips/lasat: Remove dead binary sysctl support sysctl drivers: Remove dead binary sysctl support sysctl crypto: Remove dead binary sysctl support sysctl security/keys: Remove dead binary sysctl support sysctl kernel: Remove binary sysctl logic ...
2009-12-08NFSv41: Fix a potential state leakage when restarting nfs4_close_prepareTrond Myklebust
Currently, if the call to nfs4_setup_sequence() in nfs4_close_prepare fails, any later retries will fail to launch an RPC call, due to the fact that the &state->flags will have been cleared. Ditto if nfs4_close_done() triggers a call to the NFSv4.1 version of nfs_restart_rpc(). We therefore move the actual clearing of the state->flags to nfs4_close_done(), when we know that the RPC call was successful. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-12-08UBIFS: fix return code in check_leafRoel Kluin
Return the PTR_ERR of the correct pointer. This fixes the debugging code. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2009-12-07ceph: use kref for ceph_msgSage Weil
Signed-off-by: Sage Weil <sage@newdream.net>
2009-12-07ceph: use kref for ceph_osd_requestSage Weil
Signed-off-by: Sage Weil <sage@newdream.net>
2009-12-07ceph: use kref for struct ceph_mds_requestSage Weil
Signed-off-by: Sage Weil <sage@newdream.net>
2009-12-07ceph: simplify ceph_buffer interfaceSage Weil
We never allocate the ceph_buffer and buffer separtely, so use a single constructor. Disallow put on NULL buffer; make the caller check. Signed-off-by: Sage Weil <sage@newdream.net>
2009-12-07ceph: use kref for ceph_bufferSage Weil
Signed-off-by: Sage Weil <sage@newdream.net>
2009-12-07Merge branch 'for-next' into for-linusJiri Kosina
Conflicts: kernel/irq/chip.c
2009-12-07ext4: Use slab allocator for sub-page sized allocationsTheodore Ts'o
Now that the SLUB seems to be fixed so that it respects the requested alignment, use kmem_cache_alloc() to allocator if the block size of the buffer heads to be allocated is less than the page size. Previously, we were using 16k page on a Power system for each buffer, even when the file system was using 1k or 4k block size. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-01-01ext4: Add new tracepoints to debug delayed allocation space functionsTheodore Ts'o
Add tracepoints for ext4_da_reserve_space(), ext4_da_update_reserve_space(), and ext4_da_release_space(). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-12-23ext4: Add new tracepoint for jbd2_cleanup_journal_tailTheodore Ts'o
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-01-22ext4: Add block validity check when truncating indirect block mapped inodesTheodore Ts'o
Add checks to ext4_free_branches() to make sure a block number found in an indirect block are valid before trying to free it. If a bad block number is found, stop freeing the indirect block immediately, since the file system is corrupt and we will need to run fsck anyway. This also avoids spamming the logs, and specifically avoids driver-level "attempt to access beyond end of device" errors obscure what is really going on. If you get *really*, *really*, *really* unlucky, without this patch, a supposed indirect block containing garbage might contain a reference to a primary block group descriptor, in which case ext4_free_branches() could end up zero'ing out a block group descriptor block, and if then one of the block bitmaps for a block group described by that bg descriptor block is not in memory, and is read in by ext4_read_block_bitmap(). This function calls ext4_valid_block_bitmap(), which assumes that bg_inode_table() was validated at mount time and hasn't been modified since. Since this assumption is no longer valid, it's possible for the value (ext4_inode_table(sb, desc) - group_first_block) to go negative, which will cause ext4_find_next_zero_bit() to trigger a kernel GPF. Addresses-Google-Bug: #2220436 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-02-15ext4: Fix optional-arg mount optionsEric Sandeen
We have 2 mount options, "barrier" and "auto_da_alloc" which may or may not take a 1/0 argument. This causes the ext4 superblock mount code to subtract uninitialized pointers and pass the result to kmalloc, which results in very noisy failures. Per Ted's suggestion, initialize the args struct so that we know whether match_token() found an argument for the option, and skip match_int() if not. Also, return error (0) from parse_options if we thought we found an argument, but match_int() Fails. Reported-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-02-04ext4: fix async i/o writes beyond 4GB to a sparse fileEric Sandeen
The "offset" member in ext4_io_end holds bytes, not blocks, so ext4_lblk_t is wrong - and too small (u32). This caused the async i/o writes to sparse files beyond 4GB to fail when they wrapped around to 0. Also fix up the type of arguments to ext4_convert_unwritten_extents(), it gets ssize_t from ext4_end_aio_dio_nolock() and ext4_ext_direct_IO(). Reported-by: Giel de Nijs <giel@vectorwise.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
2009-12-07nfs41: Handle NFSv4.1 session errors in the delegation recall codeRicardo Labiaga
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-12-07nfs41: Retry delegation return if it failed with session errorRicardo Labiaga
Update nfs4_delegreturn_done() to retry the operation after setting the NFS4CLNT_SESSION_SETUP bit to indicate the need to reset the session. Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-12-07nfs41: Handle session errors during delegation returnRicardo Labiaga
Add session error handling to nfs4_open_delegation_recall() Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-12-07nfs41: Mark stateids in need of reclaim if state manager gets stale clientidRicardo Labiaga
The state manager was not marking the stateids as needing to be reclaimed after reestablishing the clientid. Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-12-07NFS: Fix up the declaration of nfs4_restart_rpc when NFSv4 not configuredTrond Myklebust
Also rename it: it is used in generic code, and so should not have a 'nfs4' prefix. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-12-07[LogFS] Silence gccJoern Engel
Andrew Morton sayeth: fs/logfs/journal.c: In function 'logfs_init_journal': fs/logfs/journal.c:266: warning: 'last_len' may be used uninitialized in this function Can this be squished please?
2009-12-07Merge commit 'v2.6.32' into reiserfs/kill-bklFrederic Weisbecker
Merge-reason: The tree was based 2.6.31. It's better to be up to date with 2.6.32. Although no conflicting changes were made in between, it gives benchmarking results closer to the lastest kernel behaviour.
2009-12-07[CIFS] Enable mmap on forcedirectio mountsSteve French
openoffice and gedit failed with 'direct' options Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2009-12-06ext4: Fix insufficient checks in EXT4_IOC_MOVE_EXTAkira Fujita
This patch fixes three problems in the handling of the EXT4_IOC_MOVE_EXT ioctl: 1. In current EXT4_IOC_MOVE_EXT, there are read access mode checks for original and donor files, but they allow the illegal write access to donor file, since donor file is overwritten by original file data. To fix this problem, change access mode checks of original (r->r/w) and donor (r->w) files. 2. Disallow the use of donor files that have a setuid or setgid bits. 3. Call mnt_want_write() and mnt_drop_write() before and after ext4_move_extents() calling to get write access to a mount. Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-12-08ext4: Wait for proper transaction commit on fsyncJan Kara
We cannot rely on buffer dirty bits during fsync because pdflush can come before fsync is called and clear dirty bits without forcing a transaction commit. What we do is that we track which transaction has last changed the inode and which transaction last changed allocation and force it to disk on fsync. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-12-08ext4: fix incorrect block reservation on quota transfer.Dmitry Monakhov
Inside ->setattr() call both ATTR_UID and ATTR_GID may be valid This means that we may end-up with transferring all quotas. Add we have to reserve QUOTA_DEL_BLOCKS for all quotas, as we do in case of QUOTA_INIT_BLOCKS. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Reviewed-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>