summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2010-07-28fsnotify: rename fsnotify_groups to fsnotify_inode_groupsEric Paris
Simple renaming patch. fsnotify is about to support mount point listeners so I am renaming fsnotify_groups and fsnotify_mask to indicate these are lists used only for groups which have watches on inodes. Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28fsnotify: drop mask argument from fsnotify_alloc_groupEric Paris
Nothing uses the mask argument to fsnotify_alloc_group. This patch drops that argument. Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28fsnotify: fsnotify_obtain_group should be fsnotify_alloc_groupEric Paris
fsnotify_obtain_group was intended to be able to find an already existing group. Nothing uses that functionality. This just renames it to fsnotify_alloc_group so it is clear what it is doing. Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28fsnotify: fsnotify_obtain_group kzalloc cleanupEric Paris
fsnotify_obtain_group uses kzalloc but then proceedes to set things to 0. This patch just deletes those useless lines. Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28fsnotify: remove group_num altogetherEric Paris
The original fsnotify interface has a group-num which was intended to be able to find a group after it was added. I no longer think this is a necessary thing to do and so we remove the group_num. Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28fsnotify: lock annotation for event replacementEric Paris
fsnotify_replace_event need to lock both the old and the new event. This causes lockdep to get all pissed off since it dosn't know this is safe. It's safe in this case since the new event is impossible to be reached from other places in the kernel. Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28fsnotify: replace an event on a listEric Paris
fanotify would like to clone events already on its notification list, make changes to the new event, and then replace the old event on the list with the new event. This patch implements the replace functionality of that process. Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28fsnotify: clone existing eventsEric Paris
fsnotify_clone_event will take an event, clone it, and return the cloned event to the caller. Since events may be in use by multiple fsnotify groups simultaneously certain event entries (such as the mask) cannot be changed after the event was created. Since fanotify would like to merge events happening on the same file it needs a new clean event to work with so it can change any fields it wishes. Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28fsnotify: per group notification queue merge typesEric Paris
inotify only wishes to merge a new event with the last event on the notification fifo. fanotify is willing to merge any events including by means of bitwise OR masks of multiple events together. This patch moves the inotify event merging logic out of the generic fsnotify notification.c and into the inotify code. This allows each use of fsnotify to provide their own merge functionality. Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28fsnotify: send struct file when sending events to parents when possibleEric Paris
fanotify needs a path in order to open an fd to the object which changed. Currently notifications to inode's parents are done using only the inode. For some parental notification we have the entire file, send that so fanotify can use it. Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28fsnotify: pass a file instead of an inode to open, read, and writeEric Paris
fanotify, the upcoming notification system actually needs a struct path so it can do opens in the context of listeners, and it needs a file so it can get f_flags from the original process. Close was the only operation that already was passing a struct file to the notification hook. This patch passes a file for access, modify, and open as well as they are easily available to these hooks. Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28fsnotify: include data in should_send callsEric Paris
fanotify is going to need to look at file->private_data to know if an event should be sent or not. This passes the data (which might be a file, dentry, inode, or none) to the should_send function calls so fanotify can get that information when available Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28fsnotify: provide the data type to should_send_eventEric Paris
fanotify is only interested in event types which contain enough information to open the original file in the context of the fanotify listener. Since fanotify may not want to send events if that data isn't present we pass the data type to the should_send_event function call so fanotify can express its lack of interest. Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28inotify: do not spam console without limitEric Paris
inotify was supposed to have a dmesg printk ratelimitor which would cause inotify to only emit one message per boot. The static bool was never set so it kept firing messages. This patch correctly limits warnings in multiple places. Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28inotify: remove inotify in kernel interfaceEric Paris
nothing uses inotify in the kernel, drop it! Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28inotify: do not reuse watch descriptorsEric Paris
Prior to 2.6.31 inotify would not reuse watch descriptors until all of them had been used at least once. After the rewrite inotify would reuse watch descriptors. The selinux utility 'restorecond' was found to have problems when watch descriptors were reused. This patch reverts to the pre inotify rewrite behavior to not reuse watch descriptors. Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28fsnotify: use kmem_cache_zalloc to simplify event initializationEric Paris
fsnotify event initialization is done entry by entry with almost everything set to either 0 or NULL. Use kmem_cache_zalloc and only initialize things that need non-zero initialization. Also means we don't have to change initialization entries based on the config options. Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28fsnotify: kzalloc fsnotify groupsEric Paris
Use kzalloc for fsnotify_groups so that none of the fields can leak any information accidentally. Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28inotify: use container_of instead of castingEric Paris
inotify_free_mark casts directly from an fsnotify_mark_entry to an inotify_inode_mark_entry. This works, but should use container_of instead for future proofing. Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28fsnotify: use fsnotify_create_event to allocate the q_overflow eventEric Paris
Currently fsnotify defines a static fsnotify event which is sent when a group overflows its allotted queue length. This patch just allocates that event from the event cache rather than defining it statically. There is no known reason that the current implementation is wrong, but this makes sure the event is initialized and created like any other. Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28fsnotify: allow addition of duplicate fsnotify marksEric Paris
This patch allows a task to add a second fsnotify mark to an inode for the same group. This mark will be added to the end of the inode's list and this will never be found by the stand fsnotify_find_mark() function. This is useful if a user wants to add a new mark before removing the old one. Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28fsnotify: duplicate fsnotify_mark_entry data between 2 marksEric Paris
Simple copy fsnotify information from one mark to another in preparation for the second mark to replace the first. Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28inotify: simplify the inotify idr handlingEric Paris
This patch moves all of the idr editing operations into their own idr functions. It makes it easier to prove locking correctness and to to understand the code flow. Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-27nfsd: bypass readahead cache when have struct fileJ. Bruce Fields
The readahead cache compensates for the fact that the NFS server currently does an open and close on every IO operation in the NFSv2 and NFSv3 case. In the NFSv4 case we have long-lived struct files associated with client opens, so there's no need for this. In fact, concurrent IO's using trying to modify the same file->f_ra may cause problems. So, don't bother with the readahead cache in that case. Note eventually we'll likely do this in the v2/v3 case as well by keeping a cache of struct files instead of struct file_ra_state's. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-07-27ceph: use complete_all and wake_up_allYehuda Sadeh
This fixes an issue triggered by running concurrent syncs. One of the syncs would go through while the other would just hang indefinitely. In any case, we never actually want to wake a single waiter, so the *_all functions should be used. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
2010-07-279p: Pass the correct end of buffer to p9stat_readLatchesar Ionkov
Pass the correct end of the buffer to p9stat_read. Signed-off-by: Latchesar Ionkov <lucho@ionkov.net> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-07-27ext4: check to make make sure bd_dev is set before dereferencing itTheodore Ts'o
There are some drivers which may not set bdev->bd_dev. So make sure it is non-NULL before dereferencing it. Google-Bug-Id: 1773557 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-07-27jbd2: Make barrier messages less scaryEric Sandeen
Saying things like "sync failed" when a device does not support barriers makes users slightly more worried than they need to be; rather than talking about sync failures, let's just state the barrier-based facts. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-07-27ext4: don't print scary messages for allocation failures post-abortEric Sandeen
I often get emails containing the "This should not happen!!" message, conveniently trimmed to remove things like: sd 0:0:0:0: [sda] Unhandled error code sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT sd 0:0:0:0: [sda] CDB: Write(10): 2a 00 03 13 c9 70 00 00 28 00 end_request: I/O error, dev sda, sector 51628400 Aborting journal on device dm-0-8. EXT4-fs error (device dm-0): ext4_journal_start_sb: Detected aborted journal EXT4-fs (dm-0): Remounting filesystem read-only I don't think there is any value to the verbosity if the reason is due to a filesystem abort; it just obfuscates the root cause. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-07-27ext4: fix EFBIG edge case when writing to large non-extent fileToshiyuki Okajima
By running the following reproducer, we can confirm that the write system call returns with 0 when it should return the error EFBIG. #!/bin/sh /bin/dd if=/dev/zero of=./img bs=1k count=1 seek=1024k > /dev/null 2>&1 /sbin/mkfs.ext3 -Fq ./img /bin/mount -o loop -t ext4 ./img /mnt /bin/touch /mnt/file strace /bin/dd if=/dev/zero of=/mnt/file conv=notrunc bs=1k count=1 seek=$((2194719883264/1024)) 2>&1 | /bin/egrep "write.* 1024\) = " /bin/umount /mnt exit Signed-off-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Eric Sandeen <sandeen@redhat.com>
2010-07-27ext4: fix ext4_get_blocks referencesEric Sandeen
ext4_get_blocks got renamed to ext4_map_blocks, but left stale comments and a prototype littered around. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-07-27ext4: Always journal quota file modificationsJan Kara
When journaled quota options are not specified, we do writes to quota files just in data=ordered mode. This actually causes warnings from JBD2 about dirty journaled buffer because ext4_getblk unconditionally treats a block allocated by it as metadata. Since quota actually is filesystem metadata, the easiest way to get rid of the warning is to always treat quota writes as metadata... Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-07-27ext4: Fix potential memory leak in ext4_fill_superCyrill Gorcunov
Under heavy memory pressure we may hit out of memory situation and as result kstrdup'ed options will not be freed. Fix it. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-07-27ext4: Don't error out the fs if the user tries to make a file too bigTheodore Ts'o
If the user attempts to make a non-extent-mapped file to be too large, return EFBIG, but don't call ext4_std_err() which will end up marking the file system as containing an error. Thanks to Toshiyuki Okajima-san at Fujitsu for pointing this out. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-07-27ext4: allocate stripe-multiple IOs on stripe boundariesEric Sandeen
For some reason, today mballoc only allocates IOs which are exactly stripe-sized on a stripe boundary. If you have a multiple (say, a 128k IO on a 64k stripe) you may end up unaligned. It seems to me that a simple change to align stripe-multiple IOs on stripe boundaries would be a very good idea, unless this breaks some other mballoc heuristic for some reason... Reported-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-07-27ext4: move aio completion after unwritten extent conversionjiayingz@google.com (Jiaying Zhang)
This patch is to be applied upon Christoph's "direct-io: move aio_complete into ->end_io" patch. It adds iocb and result fields to struct ext4_io_end_t, so that we can call aio_complete from ext4_end_io_nolock() after the extent conversion has finished. I have verified with Christoph's aio-dio test that used to fail after a few runs on an original kernel but now succeeds on the patched kernel. See http://thread.gmane.org/gmane.comp.file-systems.ext4/19659 for details. Signed-off-by: Jiaying Zhang <jiayingz@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-07-27direct-io: move aio_complete into ->end_ioChristoph Hellwig
Filesystems with unwritten extent support must not complete an AIO request until the transaction to convert the extent has been commited. That means the aio_complete calls needs to be moved into the ->end_io callback so that the filesystem can control when to call it exactly. This makes a bit of a mess out of dio_complete and the ->end_io callback prototype even more complicated. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-07-27ext4: Support discard requests when running in no-journal modeJiaying Zhang
Issue discard request in ext4_free_blocks() when ext4 has no journal and is mounted with discard option. Signed-off-by: Jiaying Zhang <jiayingz@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-07-27jbd2: Remove __GFP_NOFAIL from jbd2 layerTheodore Ts'o
__GFP_NOFAIL is going away, so add our own retry loop. Also add jbd2__journal_start() and jbd2__journal_restart() which take a gfp mask, so that file systems can optionally (re)start transaction handles using GFP_KERNEL. If they do this, then they need to be prepared to handle receiving an PTR_ERR(-ENOMEM) error, and be ready to reflect that error up to userspace. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-07-27ext4: Fix block bitmap inconsistencies after a crash when deleting filesAmir G
We have experienced bitmap inconsistencies after crash during file delete under heavy load. The crash is not file system related and I the following patch in ext4_free_branches() fixes the recovery problem. If the transaction is restarted and there is a crash before the new transaction is committed, then after recovery, the blocks that this indirect block points to have been freed, but the indirect block itself has not been freed and may still point to some of the free blocks (because of the ext4_forget()). So ext4_forget() should be called inside ext4_free_blocks() to avoid this problem. Signed-off-by: Amir Goldstein <amir73il@users.sf.net> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-07-27ext4: Remove unnecessary casts of private_dataJoe Perches
Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-07-27ext4: fix potential NULL dereference while tracingTheodore Ts'o
The allocation_context pointer can be NULL. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-07-27ext4: Define s_jnl_backup_type in superblockTheodore Ts'o
This has been in use by e2fsprogs for a while; define it to keep the super block fields in sync. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-07-27ext4: Once a day, printk file system error information to dmesgTheodore Ts'o
This allows us to grab any file system error messages by scraping /var/log/messages. This will make it easy for us to do error analysis across the very large number of machines as we deploy ext4 across the fleet. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-07-27ext4: Save error information to the superblock for analysisTheodore Ts'o
Save number of file system errors, and the time function name, line number, block number, and inode number of the first and most recent errors reported on the file system in the superblock. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-07-27ext4: Pass line numbers to ext4_error() and friendsTheodore Ts'o
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-07-27ext4: Cleanup ext4_check_dir_entry so __func__ is now implicitTheodore Ts'o
Also start passing the line number to ext4_check_dir since we're going to need it in upcoming patch. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-07-26xfs simplify and speed up direct I/O completionsChristoph Hellwig
Our current handling of direct I/O completions is rather suboptimal, because we defer it to a workqueue more often than needed, and we perform a much to aggressive flush of the workqueue in case unwritten extent conversions happen. This patch changes the direct I/O reads to not even use a completion handler, as we don't bother to use it at all, and to perform the unwritten extent conversions in caller context for synchronous direct I/O. For a small I/O size direct I/O workload on a consumer grade SSD, such as the untar of a kernel tree inside qemu this patch gives speedups of about 5%. Getting us much closer to the speed of a native block device, or a fully allocated XFS file. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2010-07-26xfs: move aio completion after unwritten extent conversionChristoph Hellwig
If we write into an unwritten extent using AIO we need to complete the AIO request after the extent conversion has finished. Without that a read could race to see see the extent still unwritten and return zeros. For synchronous I/O we already take care of that by flushing the xfsconvertd workqueue (which might be a bit of overkill). To do that add iocb and result fields to struct xfs_ioend, so that we can call aio_complete from xfs_end_io after the extent conversion has happened. Note that we need a new result field as io_error is used for positive errno values, while the AIO code can return negative error values and positive transfer sizes. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2010-07-26direct-io: move aio_complete into ->end_ioChristoph Hellwig
Filesystems with unwritten extent support must not complete an AIO request until the transaction to convert the extent has been commited. That means the aio_complete calls needs to be moved into the ->end_io callback so that the filesystem can control when to call it exactly. This makes a bit of a mess out of dio_complete and the ->end_io callback prototype even more complicated. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Alex Elder <aelder@sgi.com>