summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2010-07-27ext4: don't print scary messages for allocation failures post-abortEric Sandeen
I often get emails containing the "This should not happen!!" message, conveniently trimmed to remove things like: sd 0:0:0:0: [sda] Unhandled error code sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT sd 0:0:0:0: [sda] CDB: Write(10): 2a 00 03 13 c9 70 00 00 28 00 end_request: I/O error, dev sda, sector 51628400 Aborting journal on device dm-0-8. EXT4-fs error (device dm-0): ext4_journal_start_sb: Detected aborted journal EXT4-fs (dm-0): Remounting filesystem read-only I don't think there is any value to the verbosity if the reason is due to a filesystem abort; it just obfuscates the root cause. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-07-27ext4: fix EFBIG edge case when writing to large non-extent fileToshiyuki Okajima
By running the following reproducer, we can confirm that the write system call returns with 0 when it should return the error EFBIG. #!/bin/sh /bin/dd if=/dev/zero of=./img bs=1k count=1 seek=1024k > /dev/null 2>&1 /sbin/mkfs.ext3 -Fq ./img /bin/mount -o loop -t ext4 ./img /mnt /bin/touch /mnt/file strace /bin/dd if=/dev/zero of=/mnt/file conv=notrunc bs=1k count=1 seek=$((2194719883264/1024)) 2>&1 | /bin/egrep "write.* 1024\) = " /bin/umount /mnt exit Signed-off-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Eric Sandeen <sandeen@redhat.com>
2010-07-27ext4: fix ext4_get_blocks referencesEric Sandeen
ext4_get_blocks got renamed to ext4_map_blocks, but left stale comments and a prototype littered around. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-07-27ext4: Always journal quota file modificationsJan Kara
When journaled quota options are not specified, we do writes to quota files just in data=ordered mode. This actually causes warnings from JBD2 about dirty journaled buffer because ext4_getblk unconditionally treats a block allocated by it as metadata. Since quota actually is filesystem metadata, the easiest way to get rid of the warning is to always treat quota writes as metadata... Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-07-27ext4: Fix potential memory leak in ext4_fill_superCyrill Gorcunov
Under heavy memory pressure we may hit out of memory situation and as result kstrdup'ed options will not be freed. Fix it. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-07-27ext4: Don't error out the fs if the user tries to make a file too bigTheodore Ts'o
If the user attempts to make a non-extent-mapped file to be too large, return EFBIG, but don't call ext4_std_err() which will end up marking the file system as containing an error. Thanks to Toshiyuki Okajima-san at Fujitsu for pointing this out. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-07-27ext4: allocate stripe-multiple IOs on stripe boundariesEric Sandeen
For some reason, today mballoc only allocates IOs which are exactly stripe-sized on a stripe boundary. If you have a multiple (say, a 128k IO on a 64k stripe) you may end up unaligned. It seems to me that a simple change to align stripe-multiple IOs on stripe boundaries would be a very good idea, unless this breaks some other mballoc heuristic for some reason... Reported-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-07-27ext4: move aio completion after unwritten extent conversionjiayingz@google.com (Jiaying Zhang)
This patch is to be applied upon Christoph's "direct-io: move aio_complete into ->end_io" patch. It adds iocb and result fields to struct ext4_io_end_t, so that we can call aio_complete from ext4_end_io_nolock() after the extent conversion has finished. I have verified with Christoph's aio-dio test that used to fail after a few runs on an original kernel but now succeeds on the patched kernel. See http://thread.gmane.org/gmane.comp.file-systems.ext4/19659 for details. Signed-off-by: Jiaying Zhang <jiayingz@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-07-27direct-io: move aio_complete into ->end_ioChristoph Hellwig
Filesystems with unwritten extent support must not complete an AIO request until the transaction to convert the extent has been commited. That means the aio_complete calls needs to be moved into the ->end_io callback so that the filesystem can control when to call it exactly. This makes a bit of a mess out of dio_complete and the ->end_io callback prototype even more complicated. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-07-27ext4: Support discard requests when running in no-journal modeJiaying Zhang
Issue discard request in ext4_free_blocks() when ext4 has no journal and is mounted with discard option. Signed-off-by: Jiaying Zhang <jiayingz@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-07-27jbd2: Remove __GFP_NOFAIL from jbd2 layerTheodore Ts'o
__GFP_NOFAIL is going away, so add our own retry loop. Also add jbd2__journal_start() and jbd2__journal_restart() which take a gfp mask, so that file systems can optionally (re)start transaction handles using GFP_KERNEL. If they do this, then they need to be prepared to handle receiving an PTR_ERR(-ENOMEM) error, and be ready to reflect that error up to userspace. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-07-27ext4: Fix block bitmap inconsistencies after a crash when deleting filesAmir G
We have experienced bitmap inconsistencies after crash during file delete under heavy load. The crash is not file system related and I the following patch in ext4_free_branches() fixes the recovery problem. If the transaction is restarted and there is a crash before the new transaction is committed, then after recovery, the blocks that this indirect block points to have been freed, but the indirect block itself has not been freed and may still point to some of the free blocks (because of the ext4_forget()). So ext4_forget() should be called inside ext4_free_blocks() to avoid this problem. Signed-off-by: Amir Goldstein <amir73il@users.sf.net> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-07-27ext4: Remove unnecessary casts of private_dataJoe Perches
Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-07-27ext4: fix potential NULL dereference while tracingTheodore Ts'o
The allocation_context pointer can be NULL. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-07-27ext4: Define s_jnl_backup_type in superblockTheodore Ts'o
This has been in use by e2fsprogs for a while; define it to keep the super block fields in sync. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-07-27ext4: Once a day, printk file system error information to dmesgTheodore Ts'o
This allows us to grab any file system error messages by scraping /var/log/messages. This will make it easy for us to do error analysis across the very large number of machines as we deploy ext4 across the fleet. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-07-27ext4: Save error information to the superblock for analysisTheodore Ts'o
Save number of file system errors, and the time function name, line number, block number, and inode number of the first and most recent errors reported on the file system in the superblock. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-07-27ext4: Pass line numbers to ext4_error() and friendsTheodore Ts'o
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-07-27ext4: Cleanup ext4_check_dir_entry so __func__ is now implicitTheodore Ts'o
Also start passing the line number to ext4_check_dir since we're going to need it in upcoming patch. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-07-26xfs simplify and speed up direct I/O completionsChristoph Hellwig
Our current handling of direct I/O completions is rather suboptimal, because we defer it to a workqueue more often than needed, and we perform a much to aggressive flush of the workqueue in case unwritten extent conversions happen. This patch changes the direct I/O reads to not even use a completion handler, as we don't bother to use it at all, and to perform the unwritten extent conversions in caller context for synchronous direct I/O. For a small I/O size direct I/O workload on a consumer grade SSD, such as the untar of a kernel tree inside qemu this patch gives speedups of about 5%. Getting us much closer to the speed of a native block device, or a fully allocated XFS file. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2010-07-26xfs: move aio completion after unwritten extent conversionChristoph Hellwig
If we write into an unwritten extent using AIO we need to complete the AIO request after the extent conversion has finished. Without that a read could race to see see the extent still unwritten and return zeros. For synchronous I/O we already take care of that by flushing the xfsconvertd workqueue (which might be a bit of overkill). To do that add iocb and result fields to struct xfs_ioend, so that we can call aio_complete from xfs_end_io after the extent conversion has happened. Note that we need a new result field as io_error is used for positive errno values, while the AIO code can return negative error values and positive transfer sizes. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2010-07-26direct-io: move aio_complete into ->end_ioChristoph Hellwig
Filesystems with unwritten extent support must not complete an AIO request until the transaction to convert the extent has been commited. That means the aio_complete calls needs to be moved into the ->end_io callback so that the filesystem can control when to call it exactly. This makes a bit of a mess out of dio_complete and the ->end_io callback prototype even more complicated. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Alex Elder <aelder@sgi.com>
2010-07-26xfs: fix big endian buildDave Chinner
Commit 0fd7275cc42ab734eaa1a2c747e65479bd1e42af ("xfs: fix gcc 4.6 set but not read and unused statement warnings") failed to convert some code inside XFS_NATIVE_HOST (big endian host code only) and hence fails to build on such machines. Fix it. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2010-07-26sysfs: allow creating symlinks from untagged to tagged directoriesEric W. Biederman
Supporting symlinks from untagged to tagged directories is reasonable, and needed to support CONFIG_SYSFS_DEPRECATED. So don't fail a prior allowing that case to work. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-07-26sysfs: sysfs_delete_link handle symlinks from untagged to tagged directories.Eric W. Biederman
This happens for network devices when SYSFS_DEPRECATED is enabled. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-07-26sysfs: Don't allow the creation of symlinks we can't removeEric W. Biederman
Recently my tagged sysfs support revealed a flaw in the device core that a few rare drivers are running into such that we don't always put network devices in a class subdirectory named net/. Since we are not creating the class directory the network devices wind up in a non-tagged directory, but the symlinks to the network devices from /sys/class/net are in a tagged directory. All of which works until we go to remove or rename the symlink. When we remove or rename a symlink we look in the namespace of the target of the symlink. Since the target of the symlink is in a non-tagged sysfs directory we don't have a namespace to look in, and we fail to remove the symlink. Detect this problem up front and simply don't create symlinks we won't be able to remove later. This prevents symlink leakage and fails in a much clearer and more understandable way. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Maciej W. Rozycki <macro@linux-mips.org> Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-07-26xfs: clean up xfs_bmap_get_bpChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2010-07-26xfs: simplify xfs_truncate_fileChristoph Hellwig
xfs_truncate_file is only used for truncating quota files. Move it to xfs_qm_syscalls.c so it can be marked static and take advatange of the fact by removing the unused page cache validation and taking the iget into the helper. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2010-07-26xfs: kill the b_strat callback in xfs_bufChristoph Hellwig
The b_strat callback is used by xfs_buf_iostrategy to perform additional checks before submitting a buffer. It is used in xfs_bwrite and when writing out delayed buffers. In xfs_bwrite it we can de-virtualize the call easily as b_strat is set a few lines above the call to xfs_buf_iostrategy. For the delayed buffers the rationale is a bit more complicated: - there are three callers of xfs_buf_delwri_queue, which places buffers on the delwri list: (1) xfs_bdwrite - this sets up b_strat, so it's fine (2) xfs_buf_iorequest. None of the callers can have XBF_DELWRI set: - xlog_bdstrat is only used for log buffers, which are never delwri - _xfs_buf_read explicitly clears the delwri flag - xfs_buf_iodone_work retries log buffers only - xfsbdstrat - only used for reads, superblock writes without the delwri flag, log I/O and file zeroing with explicitly allocated buffers. - xfs_buf_iostrategy - only calls xfs_buf_iorequest if b_strat is not set (3) xfs_buf_unlock - only puts the buffer on the delwri list if the DELWRI flag is already set. The DELWRI flag is only ever set in xfs_bwrite, xfs_buf_iodone_callbacks, or xfs_trans_log_buf. For xfs_buf_iodone_callbacks and xfs_trans_log_buf we require an initialized buf item, which means b_strat was set to xfs_bdstrat_cb in xfs_buf_item_init. Conclusion: we can just get rid of the callback and replace it with explicit calls to xfs_bdstrat_cb. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2010-07-26xfs: remove obsolete osyncisosync mount optionChristoph Hellwig
Since Linux 2.6.33 the kernel has support for real O_SYNC, which made the osyncisosync option a no-op. Warn the users about this and remove the mount flag for it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2010-07-26xfs: clean up filestreams helpersChristoph Hellwig
Move xfs_filestream_peek_ag, xxfs_filestream_get_ag and xfs_filestream_put_ag from xfs_filestream.h to xfs_filestream.c where it's only callers are, and remove the inline marker while we're at it to let the compiler decide on the inlining. Also don't return a value from xfs_filestream_put_ag because we don't need it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2010-07-26xfs: fix gcc 4.6 set but not read and unused statement warningsChristoph Hellwig
[hch: dropped a few hunks that need structural changes instead] Signed-off-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2010-07-26xfs: Fix build when CONFIG_XFS_POSIX_ACL=nTony Luck
When CONFIG_XFS_POSIX_ACL is not set "xfs_check_acl" is #defined to NULL - which breaks the code attempting to add a tracepoint on this function. Only define the tracepoint when the function exists. Signed-off-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2010-07-26xfs: fix unsigned underflow in xfs_free_eofblocksKulikov Vasiliy
map_len is unsigned. Checking map_len <= 0 is buggy when it should be below zero. So, check exact expression instead of map_len. Signed-off-by: Kulikov Vasiliy <segooon@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2010-07-26xfs: use GFP_NOFS for page cache allocationDave Chinner
Avoid a lockdep warning by preventing page cache allocation from recursing back into the filesystem during memory reclaim. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2010-07-26xfs: fix memory reclaim recursion deadlock on locked inode bufferDave Chinner
Calling into memory reclaim with a locked inode buffer can deadlock if memory reclaim tries to lock the inode buffer during inode teardown. Convert the relevant memory allocations to use KM_NOFS to avoid this deadlock condition. Reported-by: Peter Watkins <treestem@gmail.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2010-07-26xfs: fix xfs_trans_add_item() lockdep warningsDave Chinner
xfs_trans_add_item() is called with ip->i_ilock held, which means it is unsafe for memory reclaim to recurse back into the filesystem (ilock is required in writeback). Hence the allocation needs to be KM_NOFS to avoid recursion. Lockdep report indicating memory allocation being called with the ip->i_ilock held is as follows: [ 1749.866796] ================================= [ 1749.867788] [ INFO: inconsistent lock state ] [ 1749.868327] 2.6.35-rc3-dgc+ #25 [ 1749.868741] --------------------------------- [ 1749.868741] inconsistent {IN-RECLAIM_FS-W} -> {RECLAIM_FS-ON-W} usage. [ 1749.868741] dd/2835 [HC0[0]:SC0[0]:HE1:SE1] takes: [ 1749.868741] (&(&ip->i_lock)->mr_lock){++++?.}, at: [<ffffffff813170fb>] xfs_ilock+0x10b/0x190 [ 1749.868741] {IN-RECLAIM_FS-W} state was registered at: [ 1749.868741] [<ffffffff810b3a97>] __lock_acquire+0x437/0x1450 [ 1749.868741] [<ffffffff810b4b56>] lock_acquire+0xa6/0x160 [ 1749.868741] [<ffffffff810a20b5>] down_write_nested+0x65/0xb0 [ 1749.868741] [<ffffffff813170fb>] xfs_ilock+0x10b/0x190 [ 1749.868741] [<ffffffff8134e819>] xfs_reclaim_inode+0x99/0x310 [ 1749.868741] [<ffffffff8134f56b>] xfs_inode_ag_walk+0x8b/0x150 [ 1749.868741] [<ffffffff8134f6bb>] xfs_inode_ag_iterator+0x8b/0xf0 [ 1749.868741] [<ffffffff8134f7a8>] xfs_reclaim_inode_shrink+0x88/0x90 [ 1749.868741] [<ffffffff81119d07>] shrink_slab+0x137/0x1a0 [ 1749.868741] [<ffffffff8111bbe1>] balance_pgdat+0x421/0x6a0 [ 1749.868741] [<ffffffff8111bf7d>] kswapd+0x11d/0x320 [ 1749.868741] [<ffffffff8109ce56>] kthread+0x96/0xa0 [ 1749.868741] [<ffffffff81035de4>] kernel_thread_helper+0x4/0x10 [ 1749.868741] irq event stamp: 4234335 [ 1749.868741] hardirqs last enabled at (4234335): [<ffffffff81147d25>] kmem_cache_free+0x115/0x220 [ 1749.868741] hardirqs last disabled at (4234334): [<ffffffff81147c4d>] kmem_cache_free+0x3d/0x220 [ 1749.868741] softirqs last enabled at (4233112): [<ffffffff81084dd2>] __do_softirq+0x142/0x260 [ 1749.868741] softirqs last disabled at (4233095): [<ffffffff81035edc>] call_softirq+0x1c/0x50 [ 1749.868741] [ 1749.868741] other info that might help us debug this: [ 1749.868741] 2 locks held by dd/2835: [ 1749.868741] #0: (&(&ip->i_iolock)->mr_lock#2){+.+.+.}, at: [<ffffffff81316edd>] xfs_ilock_nowait+0xed/0x200 [ 1749.868741] #1: (&(&ip->i_lock)->mr_lock){++++?.}, at: [<ffffffff813170fb>] xfs_ilock+0x10b/0x190 [ 1749.868741] [ 1749.868741] stack backtrace: [ 1749.868741] Pid: 2835, comm: dd Not tainted 2.6.35-rc3-dgc+ #25 [ 1749.868741] Call Trace: [ 1749.868741] [<ffffffff810b1faa>] print_usage_bug+0x18a/0x190 [ 1749.868741] [<ffffffff8104264f>] ? save_stack_trace+0x2f/0x50 [ 1749.868741] [<ffffffff810b2400>] ? check_usage_backwards+0x0/0xf0 [ 1749.868741] [<ffffffff810b2f11>] mark_lock+0x331/0x400 [ 1749.868741] [<ffffffff810b3047>] mark_held_locks+0x67/0x90 [ 1749.868741] [<ffffffff810b3111>] lockdep_trace_alloc+0xa1/0xe0 [ 1749.868741] [<ffffffff81147419>] kmem_cache_alloc+0x39/0x1e0 [ 1749.868741] [<ffffffff8133f954>] kmem_zone_alloc+0x94/0xe0 [ 1749.868741] [<ffffffff8133f9be>] kmem_zone_zalloc+0x1e/0x50 [ 1749.868741] [<ffffffff81335f02>] xfs_trans_add_item+0x72/0xb0 [ 1749.868741] [<ffffffff81339e41>] xfs_trans_ijoin+0xa1/0xd0 [ 1749.868741] [<ffffffff81319f82>] xfs_itruncate_finish+0x312/0x5d0 [ 1749.868741] [<ffffffff8133cb87>] xfs_free_eofblocks+0x227/0x280 [ 1749.868741] [<ffffffff8133cd18>] xfs_release+0x138/0x190 [ 1749.868741] [<ffffffff813464c5>] xfs_file_release+0x15/0x20 [ 1749.868741] [<ffffffff81150ebf>] fput+0x13f/0x260 [ 1749.868741] [<ffffffff8114d8c2>] filp_close+0x52/0x80 [ 1749.868741] [<ffffffff8114d9a9>] sys_close+0xb9/0x120 [ 1749.868741] [<ffffffff81034ff2>] system_call_fastpath+0x16/0x1b Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2010-07-26xfs: simplify and remove xfs_ireclaimDave Chinner
xfs_ireclaim has to get and put te pag structure because it is only called with the inode to reclaim. The one caller of this function already has a reference on the pag and a pointer to is, so move the radix tree delete to the caller and remove xfs_ireclaim completely. This avoids a xfs_perag_get/put on every inode being reclaimed. The overhead was noticed in a bug report at: https://bugzilla.kernel.org/show_bug.cgi?id=16348 Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2010-07-26xfs: don't block on buffer read errorsDave Chinner
xfs_buf_read() fails to detect dispatch errors before attempting to wait on sychronous IO. If there was an error, it will get stuck forever, waiting for an I/O that was never started. Make sure the error is detected correctly. Further, such a failure can leave locked pages in the page cache which will cause a later operation to hang on the page. Ensure that we correctly process pages in the buffers when we get a dispatch error. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2010-07-26xfs: move inode shrinker unregister even earlierDave Chinner
I missed Dave Chinner's second revision of this change, and pushed his first version out to the repository instead. commit a476c59ebb279d738718edc0e3fb76aab3687114 Author: Dave Chinner <dchinner@redhat.com> This commit compensates for that by moving a block of code up a bit further, with a result that matches the the effect of Dave's second version. Dave's first version was: Reviewed-by: Eric Sandeen <sandeen@redhat.com> Dave's second version was: Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com>
2010-07-26xfs: remove a dmapi leftoverChristoph Hellwig
The open_exec file operation is only added by the external dmapi patch. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2010-07-26xfs: writepage always has buffersChristoph Hellwig
These days we always have buffers thanks to ->page_mkwrite. And we already have an assert a few lines above tripping in case that was not true due to a bug. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2010-07-26xfs: allow writeback from kswapdChristoph Hellwig
We only need disable I/O from direct or memcg reclaim. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2010-07-26xfs: remove incorrect log write optimizationChristoph Hellwig
We do need a barrier for the first buffer of a split log write. Otherwise we might incorrectly stamp the tail LSN into transactions in the first part of the split write, or not flush data I/O before updating the inode size. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2010-07-26xfs: unregister inode shrinker before freeing filesystem structuresDave Chinner
Currently we don't remove the XFS mount from the shrinker list until late in the unmount path. By this time, we have already torn down the internals of the filesystem (e.g. the per-ag structures), and hence if the shrinker is executed between the teardown and the unregistering, the shrinker will get NULL per-ag structure pointers and panic trying to dereference them. Fix this by removing the xfs mount from the shrinker list before tearing down it's internal structures. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2010-07-26xfs: split xfs_itrace_entryChristoph Hellwig
Replace the xfs_itrace_entry catchall with specific trace points. For most simple callers we now use the simple inode class, which used to be the iget class, but add more details tracing for namespace events, which now includes the name of the directory entries manipulated. Remove the xfs_inactive trace point, which is a duplicate of the clear_inode one, and the xfs_change_file_space trace point, which is immediately followed by the more specific alloc/free space trace points. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2010-07-26xfs: remove xfs_iputChristoph Hellwig
xfs_iput is just a small wrapper for xfs_iunlock + IRELE. Having this out of line wrapper means the trace events in those two can't track their caller properly. So just remove the wrapper and opencode the unlock + rele in the few callers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2010-07-26xfs: remove xfs_iput_newChristoph Hellwig
We never get an i_mode of 0 or a locked VFS inode until we pass in the XFS_IGET_CREATE flag to xfs_iget, which makes xfs_iput_new equivalent to xfs_iput for the only caller. In addition to that xfs_nfs_get_inode does not even need to lock the inode given that the generation never changes for a life inode, so just pass a 0 lock_flags to xfs_iget and release the inode using IRELE in the error path. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2010-07-26xfs: some iget tracing cleanups / fixesChristoph Hellwig
The xfs_iget_alloc/found tracepoints are a bit misnamed and misplaced. Rename them to xfs_iget_hit/xfs_iget_miss and move them to the beggining of the xfs_iget_cache_hit/miss functions. Add a new xfs_iget_reclaim_fail tracepoint for the case where we fail to re-initialize a VFS inode, and add a second instance of the xfs_iget_skip tracepoint for the case of a failed igrab() call. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2010-07-26xfs: do not use emums for flags used in tracingChristoph Hellwig
The tracing code can't print flags defined as enums. Most flags that we want to print are defines as macros already, but move the few remaining ones over to make the trace output more useful. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>