summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2013-10-07f2fs: handle remount options correctlyKelly Anderson
The current f2fs code errors if the xattr or acl options are passed when remounting. This is important in a typical scenario where f2fs is mounted as a "ro" root file-system by the boot loader and then the init process wants to remount it "rw" with the "remount,rw" option. Signed-off-by: Kelly Anderson <kelly@xilka.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-10-07f2fs: use rw_sem instead of fs_lock(locks mutex)Gu Zheng
The fs_locks is used to block other ops(ex, recovery) when doing checkpoint. And each other operate routine(besides checkpoint) needs to acquire a fs_lock, there is a terrible problem here, if these are too many concurrency threads acquiring fs_lock, so that they will block each other and may lead to some performance problem, but this is not the phenomenon we want to see. Though there are some optimization patches introduced to enhance the usage of fs_lock, but the thorough solution is using a *rw_sem* to replace the fs_lock. Checkpoint routine takes write_sem, and other ops take read_sem, so that we can block other ops(ex, recovery) when doing checkpoint, and other ops will not disturb each other, this can avoid the problem described above completely. Because of the weakness of rw_sem, the above change may introduce a potential problem that the checkpoint thread might get starved if other threads are intensively locking the read semaphore for I/O.(Pointed out by Xu Jin) In order to avoid this, a wait_list is introduced, the appending read semaphore ops will be dropped into the wait_list if checkpoint thread is waiting for write semaphore, and will be waked up when checkpoint thread gives up write semaphore. Thanks to Kim's previous review and test, and will be very glad to see other guys' performance tests about this patch. V2: -fix the potential starvation problem. -use more suitable func name suggested by Xu Jin. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> [Jaegeuk Kim: adjust minor coding standard] Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-10-06cifs: Avoid umount hangs with smb2 when server is unresponsiveShirish Pargaonkar
Do not send SMB2 Logoff command when reconnecting, the way smb1 code base works. Also, no need to wait for a credit for an echo command when one is already in flight. Without these changes, umount command hangs if the server is unresponsive e.g. hibernating. Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@us.ibm.com>
2013-10-05do not treat non-symlink reparse points as valid symlinksSteve French
Windows 8 and later can create NFS symlinks (within reparse points) which we were assuming were normal NTFS symlinks and thus reporting corrupt paths for. Add check for reparse points to make sure that they really are normal symlinks before we try to parse the pathname. We also should not be parsing other types of reparse points (DFS junctions etc) as if they were a symlink so return EOPNOTSUPP on those. Also fix endian errors (we were not parsing symlink lengths as little endian). This fixes commit d244bf2dfbebfded05f494ffd53659fa7b1e32c1 which implemented follow link for non-Unix CIFS mounts CC: Stable <stable@kernel.org> Reviewed-by: Andrew Bartlett <abartlet@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>
2013-10-05sysfs: merge regular and bin file handlingTejun Heo
With the previous changes, sysfs regular file code is ready to handle bin files too. This patch makes bin files share the regular file path. * sysfs_create/remove_bin_file() are moved to fs/sysfs/file.c. * sysfs_init_inode() is updated to use the new sysfs_bin_operations instead of bin_fops for bin files. * fs/sysfs/bin.c and the related pieces are removed. This patch shouldn't introduce any behavior difference to bin file accesses. Overall, this unification reduces the amount of duplicate logic, makes behaviors more consistent and paves the road for building simpler and more versatile interface which will allow other subsystems to make use of sysfs for their pseudo filesystems. v2: Stale fs/sysfs/bin.c reference dropped from Documentation/DocBook/filesystems.tmpl. Reported by kbuild test robot. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Kay Sievers <kay@vrfy.org> Cc: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-05sysfs: prepare open path for unified regular / bin file handlingTejun Heo
sysfs bin file handling will be merged into the regular file support. This patch prepares the open path. This patch updates sysfs_open_file() such that it can handle both regular and bin files. This is a preparation and the new bin file path isn't used yet. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-05sysfs: copy bin mmap support from fs/sysfs/bin.c to fs/sysfs/file.cTejun Heo
sysfs bin file handling will be merged into the regular file support. This patch copies mmap support from bin so that fs/sysfs/file.c can handle mmapping bin files. The code is copied mostly verbatim with the following updates. * ->mmapped and ->vm_ops are added to sysfs_open_file and bin_buffer references are replaced with sysfs_open_file ones. * Symbols are prefixed with sysfs_. * sysfs_unmap_bin_file() grabs sysfs_open_dirent and traverses ->files. Invocation of this function is added to sysfs_addrm_finish(). * sysfs_bin_mmap() is added to sysfs_bin_operations. This is a preparation and the new mmap path isn't used yet. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-05sysfs: add sysfs_bin_read()Tejun Heo
sysfs bin file handling will be merged into the regular file support. This patch prepares the read path. Copy fs/sysfs/bin.c::read() to fs/sysfs/file.c and make it use sysfs_open_file instead of bin_buffer. The function is identical copy except for the use of sysfs_open_file. The new function is added to sysfs_bin_operations. This isn't used yet but will eventually replace fs/sysfs/bin.c. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-05sysfs: prepare path write for unified regular / bin file handlingTejun Heo
sysfs bin file handling will be merged into the regular file support. This patch prepares the write path. bin file write is almost identical to regular file write except that the write length is capped by the inode size and @off is passed to the write method. This patch adds bin file handling to sysfs_write_file() so that it can handle both regular and bin files. A new file_operations struct sysfs_bin_operations is added, which currently only hosts sysfs_write_file() and generic_file_llseek(). This isn't used yet but will eventually replace fs/sysfs/bin.c. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-05sysfs: collapse fs/sysfs/bin.c::fill_read() into read()Tejun Heo
read() is simple enough and fill_read() being in a separate function doesn't add anything. Let's collapse it into read(). This will make merging bin file handling with regular file. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-05sysfs: skip bin_buffer->buffer while readingTejun Heo
After b31ca3f5dfc ("sysfs: fix deadlock"), bin read() first writes data to bb->buffer and bounces it to a transient kernel buffer which is then copied out to userland. The double bouncing doesn't add anything. Let's just use the transient buffer directly. While at it, rename @temp to @buf for clarity. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-05sysfs: use seq_file when reading regular filesTejun Heo
sysfs read path implements its own buffering scheme between userland and kernel callbacks, which essentially is a degenerate duplicate of seq_file. This patch replaces the custom read buffering implementation in sysfs with seq_file. While the amount of code reduction is small, this reduces low level hairiness and enables future development of a new versatile API based on seq_file so that sysfs features can be shared with other subsystems. As write path was already converted to not use sysfs_open_file->page, this patch makes ->page and ->count unused and removes them. Userland behavior remains the same except for some extreme corner cases - e.g. sysfs will now regenerate the content each time a file is read after a non-contiguous seek whereas the original code would keep using the same content. While this is a userland visible behavior change, it is extremely unlikely to be noticeable and brings sysfs behavior closer to that of procfs. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Kay Sievers <kay@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-05sysfs: use transient write bufferTejun Heo
There isn't much to be gained by keeping around kernel buffer while a file is open especially as the read path planned to be converted to use seq_file and won't use the buffer. This patch makes sysfs_write_file() use per-write transient buffer instead of sysfs_open_file->page. This simplifies the write path, enables removing sysfs_open_file->page once read path is updated and will help merging bin file write path which already requires the use of a transient buffer due to a locking order issue. As the function comments of flush_write_buffer() and sysfs_write_buffer() are being updated anyway, reformat them so that they're more conventional. v2: Use min_t() instead of min() in sysfs_write_file() to avoid build warning on arm. Reported by build test robot. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-05sysfs: add sysfs_open_file->sd and ->fileTejun Heo
sysfs will be converted to use seq_file for read path, which will make it difficult to pass around multiple pointers directly. This patch adds sysfs_open_file->sd and ->file so that we can reach all the necessary data structures from sysfs_open_file. flush_write_buffer() is updated to drop @dentry which was used to discover the sysfs_dirent as it's now available through sysfs_open_file->sd. This patch doesn't cause any behavior difference. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-05sysfs: rename sysfs_buffer to sysfs_open_fileTejun Heo
sysfs read path will be converted to use seq_file which will handle buffering making sysfs_buffer a misnomer. Rename sysfs_buffer to sysfs_open_file, and sysfs_open_dirent->buffers to ->files. This path is pure rename. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-05sysfs: add sysfs_open_file_mutexTejun Heo
Add a separate mutex to protect sysfs_open_dirent->buffers list. This will allow performing sleepable operations while traversing sysfs_buffers, which will be renamed to sysfs_open_file. Note that currently sysfs_open_dirent->buffers list isn't being used for anything and this patch doesn't make any functional difference. It will be used to merge regular and bin file supports. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-05sysfs: remove sysfs_buffer->opsTejun Heo
Currently, sysfs_ops is fetched during sysfs_open_file() and cached in sysfs_buffer->ops to be used while the file is open. This patch removes the caching and makes each operation directly fetch sysfs_ops. This patch doesn't introduce any behavior difference and is to prepare for merging regular and bin file supports. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-05Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "This is a small collection of fixes, including a regression fix from Liu Bo that solves rare crashes with compression on. I've merged my for-linus up to 3.12-rc3 because the top commit is only meant for 3.12. The rest of the fixes are also available in my master branch on top of my last 3.11 based pull" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: btrfs: Fix crash due to not allocating integrity data for a bioset Btrfs: fix a use-after-free bug in btrfs_dev_replace_finishing Btrfs: eliminate races in worker stopping code Btrfs: fix crash of compressed writes Btrfs: fix transid verify errors when recovering log tree
2013-10-05sysfs: remove sysfs_buffer->needs_read_fillTejun Heo
->needs_read_fill is used to implement the following behaviors. 1. Ensure buffer filling on the first read. 2. Force buffer filling after a write. 3. Force buffer filling after a successful poll. However, #2 and #3 don't really work as sysfs doesn't reset file position. While the read buffer would be refilled, the next read would continue from the position after the last read or write, requiring an explicit seek to the start for it to be useful, which makes ->needs_read_fill superflous as read buffer is always refilled if f_pos == 0. Update sysfs_read_file() to test buffer->page for #1 instead and remove ->needs_read_fill. While this changes behavior in extreme corner cases - e.g. re-reading a sysfs file after seeking to non-zero position after a write or poll, it's highly unlikely to lead to actual breakage. This change is to prepare for using seq_file in the read path. While at it, reformat a comment in fill_write_buffer(). Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Kay Sievers <kay@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-05sysfs: remove unused sysfs_buffer->posTejun Heo
Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-05btrfs: Fix crash due to not allocating integrity data for a biosetDarrick J. Wong
When btrfs creates a bioset, we must also allocate the integrity data pool. Otherwise btrfs will crash when it tries to submit a bio to a checksumming disk: BUG: unable to handle kernel NULL pointer dereference at 0000000000000018 IP: [<ffffffff8111e28a>] mempool_alloc+0x4a/0x150 PGD 2305e4067 PUD 23063d067 PMD 0 Oops: 0000 [#1] PREEMPT SMP Modules linked in: btrfs scsi_debug xfs ext4 jbd2 ext3 jbd mbcache sch_fq_codel eeprom lpc_ich mfd_core nfsd exportfs auth_rpcgss af_packet raid6_pq xor zlib_deflate libcrc32c [last unloaded: scsi_debug] CPU: 1 PID: 4486 Comm: mount Not tainted 3.12.0-rc1-mcsum #2 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 task: ffff8802451c9720 ti: ffff880230698000 task.ti: ffff880230698000 RIP: 0010:[<ffffffff8111e28a>] [<ffffffff8111e28a>] mempool_alloc+0x4a/0x150 RSP: 0018:ffff880230699688 EFLAGS: 00010286 RAX: 0000000000000001 RBX: 0000000000000000 RCX: 00000000005f8445 RDX: 0000000000000001 RSI: 0000000000000010 RDI: 0000000000000000 RBP: ffff8802306996f8 R08: 0000000000011200 R09: 0000000000000008 R10: 0000000000000020 R11: ffff88009d6e8000 R12: 0000000000011210 R13: 0000000000000030 R14: ffff8802306996b8 R15: ffff8802451c9720 FS: 00007f25b8a16800(0000) GS:ffff88024fc80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000018 CR3: 0000000230576000 CR4: 00000000000007e0 Stack: ffff8802451c9720 0000000000000002 ffffffff81a97100 0000000000281250 ffffffff81a96480 ffff88024fc99150 ffff880228d18200 0000000000000000 0000000000000000 0000000000000040 ffff880230e8c2e8 ffff8802459dc900 Call Trace: [<ffffffff811b2208>] bio_integrity_alloc+0x48/0x1b0 [<ffffffff811b26fc>] bio_integrity_prep+0xac/0x360 [<ffffffff8111e298>] ? mempool_alloc+0x58/0x150 [<ffffffffa03e8041>] ? alloc_extent_state+0x31/0x110 [btrfs] [<ffffffff81241579>] blk_queue_bio+0x1c9/0x460 [<ffffffff8123e58a>] generic_make_request+0xca/0x100 [<ffffffff8123e639>] submit_bio+0x79/0x160 [<ffffffffa03f865e>] btrfs_map_bio+0x48e/0x5b0 [btrfs] [<ffffffffa03c821a>] btree_submit_bio_hook+0xda/0x110 [btrfs] [<ffffffffa03e7eba>] submit_one_bio+0x6a/0xa0 [btrfs] [<ffffffffa03ef450>] read_extent_buffer_pages+0x250/0x310 [btrfs] [<ffffffff8125eef6>] ? __radix_tree_preload+0x66/0xf0 [<ffffffff8125f1c5>] ? radix_tree_insert+0x95/0x260 [<ffffffffa03c66f6>] btree_read_extent_buffer_pages.constprop.128+0xb6/0x120 [btrfs] [<ffffffffa03c8c1a>] read_tree_block+0x3a/0x60 [btrfs] [<ffffffffa03caefd>] open_ctree+0x139d/0x2030 [btrfs] [<ffffffffa03a282a>] btrfs_mount+0x53a/0x7d0 [btrfs] [<ffffffff8113ab0b>] ? pcpu_alloc+0x8eb/0x9f0 [<ffffffff81167305>] ? __kmalloc_track_caller+0x35/0x1e0 [<ffffffff81176ba0>] mount_fs+0x20/0xd0 [<ffffffff81191096>] vfs_kern_mount+0x76/0x120 [<ffffffff81193320>] do_mount+0x200/0xa40 [<ffffffff81135cdb>] ? strndup_user+0x5b/0x80 [<ffffffff81193bf0>] SyS_mount+0x90/0xe0 [<ffffffff8156d31d>] system_call_fastpath+0x1a/0x1f Code: 4c 8d 75 a8 4c 89 6d e8 45 89 e0 4c 8d 6f 30 48 89 5d d8 41 83 e0 af 48 89 fb 49 83 c6 18 4c 89 7d f8 65 4c 8b 3c 25 c0 b8 00 00 <48> 8b 73 18 44 89 c7 44 89 45 98 ff 53 20 48 85 c0 48 89 c2 74 RIP [<ffffffff8111e28a>] mempool_alloc+0x4a/0x150 RSP <ffff880230699688> CR2: 0000000000000018 ---[ end trace 7a96042017ed21e2 ]--- Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-10-05Merge branch 'for-linus' into for-linus-3.12Chris Mason
2013-10-04Merge branch 'for-linus' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull CIFS fixes from Steve French: "Small set of cifs fixes. Most important is Jeff's fix that works around disconnection problems which can be caused by simultaneous use of user space tools (starting a long running smbclient backup then doing a cifs kernel mount) or multiple cifs mounts through a NAT, and Jim's fix to deal with reexport of cifs share. I expect to send two more cifs fixes next week (being tested now) - fixes to address an SMB2 unmount hang when server dies and a fix for cifs symlink handling of Windows "NFS" symlinks" * 'for-linus' of git://git.samba.org/sfrench/cifs-2.6: [CIFS] update cifs.ko version [CIFS] Remove ext2 flags that have been moved to fs.h [CIFS] Provide sane values for nlink cifs: stop trying to use virtual circuits CIFS: FS-Cache: Uncache unread pages in cifs_readpages() before freeing them
2013-10-04Merge tag 'xfs-for-linus-v3.12-rc4' of git://oss.sgi.com/xfs/xfsLinus Torvalds
Pull xfs bugfixes from Ben Myers: "There are lockdep annotations for project quotas, a fix for dirent dtype support on v4 filesystems, a fix for a memory leak in recovery, and a fix for the build error that resulted from it. D'oh" * tag 'xfs-for-linus-v3.12-rc4' of git://oss.sgi.com/xfs/xfs: xfs: Use kmem_free() instead of free() xfs: fix memory leak in xlog_recover_add_to_trans xfs: dirent dtype presence is dependent on directory magic numbers xfs: lockdep needs to know about 3 dquot-deep nesting
2013-10-04Btrfs: fix a use-after-free bug in btrfs_dev_replace_finishingIlya Dryomov
free_device rcu callback, scheduled from btrfs_rm_dev_replace_srcdev, can be processed before btrfs_scratch_superblock is called, which would result in a use-after-free on btrfs_device contents. Fix this by zeroing the superblock before the rcu callback is registered. Cc: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-10-04Btrfs: eliminate races in worker stopping codeIlya Dryomov
The current implementation of worker threads in Btrfs has races in worker stopping code, which cause all kinds of panics and lockups when running btrfs/011 xfstest in a loop. The problem is that btrfs_stop_workers is unsynchronized with respect to check_idle_worker, check_busy_worker and __btrfs_start_workers. E.g., check_idle_worker race flow: btrfs_stop_workers(): check_idle_worker(aworker): - grabs the lock - splices the idle list into the working list - removes the first worker from the working list - releases the lock to wait for its kthread's completion - grabs the lock - if aworker is on the working list, moves aworker from the working list to the idle list - releases the lock - grabs the lock - puts the worker - removes the second worker from the working list ...... btrfs_stop_workers returns, aworker is on the idle list FS is umounted, memory is freed ...... aworker is waken up, fireworks ensue With this applied, I wasn't able to trigger the problem in 48 hours, whereas previously I could reliably reproduce at least one of these races within an hour. Reported-by: David Sterba <dsterba@suse.cz> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-10-04Btrfs: fix crash of compressed writesLiu Bo
The crash[1] is found by xfstests/generic/208 with "-o compress", it's not reproduced everytime, but it does panic. The bug is quite interesting, it's actually introduced by a recent commit (573aecafca1cf7a974231b759197a1aebcf39c2a, Btrfs: actually limit the size of delalloc range). Btrfs implements delay allocation, so during writeback, we (1) get a page A and lock it (2) search the state tree for delalloc bytes and lock all pages within the range (3) process the delalloc range, including find disk space and create ordered extent and so on. (4) submit the page A. It runs well in normal cases, but if we're in a racy case, eg. buffered compressed writes and aio-dio writes, sometimes we may fail to lock all pages in the 'delalloc' range, in which case, we need to fall back to search the state tree again with a smaller range limit(max_bytes = PAGE_CACHE_SIZE - offset). The mentioned commit has a side effect, that is, in the fallback case, we can find delalloc bytes before the index of the page we already have locked, so we're in the case of (delalloc_end <= *start) and return with (found > 0). This ends with not locking delalloc pages but making ->writepage still process them, and the crash happens. This fixes it by just thinking that we find nothing and returning to caller as the caller knows how to deal with it properly. [1]: ------------[ cut here ]------------ kernel BUG at mm/page-writeback.c:2170! [...] CPU: 2 PID: 11755 Comm: btrfs-delalloc- Tainted: G O 3.11.0+ #8 [...] RIP: 0010:[<ffffffff810f5093>] [<ffffffff810f5093>] clear_page_dirty_for_io+0x1e/0x83 [...] [ 4934.248731] Stack: [ 4934.248731] ffff8801477e5dc8 ffffea00049b9f00 ffff8801869f9ce8 ffffffffa02b841a [ 4934.248731] 0000000000000000 0000000000000000 0000000000000fff 0000000000000620 [ 4934.248731] ffff88018db59c78 ffffea0005da8d40 ffffffffa02ff860 00000001810016c0 [ 4934.248731] Call Trace: [ 4934.248731] [<ffffffffa02b841a>] extent_range_clear_dirty_for_io+0xcf/0xf5 [btrfs] [ 4934.248731] [<ffffffffa02a8889>] compress_file_range+0x1dc/0x4cb [btrfs] [ 4934.248731] [<ffffffff8104f7af>] ? detach_if_pending+0x22/0x4b [ 4934.248731] [<ffffffffa02a8bad>] async_cow_start+0x35/0x53 [btrfs] [ 4934.248731] [<ffffffffa02c694b>] worker_loop+0x14b/0x48c [btrfs] [ 4934.248731] [<ffffffffa02c6800>] ? btrfs_queue_worker+0x25c/0x25c [btrfs] [ 4934.248731] [<ffffffff810608f5>] kthread+0x8d/0x95 [ 4934.248731] [<ffffffff81060868>] ? kthread_freezable_should_stop+0x43/0x43 [ 4934.248731] [<ffffffff814fe09c>] ret_from_fork+0x7c/0xb0 [ 4934.248731] [<ffffffff81060868>] ? kthread_freezable_should_stop+0x43/0x43 [ 4934.248731] Code: ff 85 c0 0f 94 c0 0f b6 c0 59 5b 5d c3 0f 1f 44 00 00 55 48 89 e5 41 54 53 48 89 fb e8 2c de 00 00 49 89 c4 48 8b 03 a8 01 75 02 <0f> 0b 4d 85 e4 74 52 49 8b 84 24 80 00 00 00 f6 40 20 01 75 44 [ 4934.248731] RIP [<ffffffff810f5093>] clear_page_dirty_for_io+0x1e/0x83 [ 4934.248731] RSP <ffff8801869f9c48> [ 4934.280307] ---[ end trace 36f06d3f8750236a ]--- Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-10-04Btrfs: fix transid verify errors when recovering log treeJosef Bacik
If we crash with a log, remount and recover that log, and then crash before we can commit another transaction we will get transid verify errors on the next mount. This is because we were not zero'ing out the log when we committed the transaction after recovery. This is ok as long as we commit another transaction at some point in the future, but if you abort or something else goes wrong you can end up in this weird state because the recovery stuff says that the tree log should have a generation+1 of the super generation, which won't be the case of the transaction that was started for recovery. Fix this by removing the check and _always_ zero out the log portion of the super when we commit a transaction. This fixes the transid verify issues I was seeing with my force errors tests. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-10-04xfs: Use kmem_free() instead of free()Thierry Reding
This fixes a build failure caused by calling the free() function which does not exist in the Linux kernel. Signed-off-by: Thierry Reding <treding@nvidia.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com> (cherry picked from commit aaaae98022efa4f3c31042f1fdf9e7a0c5f04663)
2013-10-04xfs: fix memory leak in xlog_recover_add_to_transtinguely@sgi.com
Free the memory in error path of xlog_recover_add_to_trans(). Normally this memory is freed in recovery pass2, but is leaked in the error path. Signed-off-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com> (cherry picked from commit 519ccb81ac1c8e3e4eed294acf93be00b43dcad6)
2013-10-04xfs: dirent dtype presence is dependent on directory magic numbersDave Chinner
The determination of whether a directory entry contains a dtype field originally was dependent on the filesystem having CRCs enabled. This meant that the format for dtype beign enabled could be determined by checking the directory block magic number rather than doing a feature bit check. This was useful in that it meant that we didn't need to pass a struct xfs_mount around to functions that were already supplied with a directory block header. Unfortunately, the introduction of dtype fields into the v4 structure via a feature bit meant this "use the directory block magic number" method of discriminating the dirent entry sizes is broken. Hence we need to convert the places that use magic number checks to use feature bit checks so that they work correctly and not by chance. The current code works on v4 filesystems only because the dirent size roundup covers the extra byte needed by the dtype field in the places where this problem occurs. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com> (cherry picked from commit 367993e7c6428cb7617ab7653d61dca54e2fdede)
2013-10-04xfs: lockdep needs to know about 3 dquot-deep nestingDave Chinner
Michael Semon reported that xfs/299 generated this lockdep warning: ============================================= [ INFO: possible recursive locking detected ] 3.12.0-rc2+ #2 Not tainted --------------------------------------------- touch/21072 is trying to acquire lock: (&xfs_dquot_other_class){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64 but task is already holding lock: (&xfs_dquot_other_class){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&xfs_dquot_other_class); lock(&xfs_dquot_other_class); *** DEADLOCK *** May be due to missing lock nesting notation 7 locks held by touch/21072: #0: (sb_writers#10){++++.+}, at: [<c11185b6>] mnt_want_write+0x1e/0x3e #1: (&type->i_mutex_dir_key#4){+.+.+.}, at: [<c11078ee>] do_last+0x245/0xe40 #2: (sb_internal#2){++++.+}, at: [<c122c9e0>] xfs_trans_alloc+0x1f/0x35 #3: (&(&ip->i_lock)->mr_lock/1){+.+...}, at: [<c126cd1b>] xfs_ilock+0x100/0x1f1 #4: (&(&ip->i_lock)->mr_lock){++++-.}, at: [<c126cf52>] xfs_ilock_nowait+0x105/0x22f #5: (&dqp->q_qlock){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64 #6: (&xfs_dquot_other_class){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64 The lockdep annotation for dquot lock nesting only understands locking for user and "other" dquots, not user, group and quota dquots. Fix the annotations to match the locking heirarchy we now have. Reported-by: Michael L. Semon <mlsemon35@gmail.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com> (cherry picked from commit f112a049712a5c07de25d511c3c6587a2b1a015e)
2013-10-04Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse bugfixes from Miklos Szeredi: "This contains two more fixes by Maxim for writeback/truncate races and fixes for RCU walk in fuse_dentry_revalidate()" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: no RCU mode in fuse_access() fuse: readdirplus: fix RCU walk fuse: don't check_submounts_and_drop() in RCU walk fuse: fix fallocate vs. ftruncate race fuse: wait for writeback in fuse_file_fallocate()
2013-10-04GFS2: Protect quota sync generationSteven Whitehouse
Now that gfs2_quota_sync can be potentially called from multiple threads, we should protect this bit of code, and the sync generation number in particular in order to ensure that there are no races when syncing quotas. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Abhijith Das <adas@redhat.com>
2013-10-04GFS2: Inline qd_trylock into gfs2_quota_unlockSteven Whitehouse
The function qd_trylock was not a trylock despite its name and can be inlined into gfs2_quota_unlock in order to make the code a bit clearer. There should be no functional change as a result of this patch. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Abhijith Das <adas@redhat.com>
2013-10-04GFS2: Make two similar quota code fragments into a functionSteven Whitehouse
There should be no functional change bar the removal of a test of the MS_READONLY flag which would never be reachable. This merges the common code from qd_fish and qd_trylock into a single function and calls it from both those places. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Abhijith Das <adas@redhat.com>
2013-10-04GFS2: Remove obsolete quota tunableSteven Whitehouse
There is no need for a paramater which relates to the internals of quota to be exposed to users. The only possible use would be to turn it up so large that the memory allocation fails. So lets remove it and set it to a sensible value which ensures that we don't ask for multipage allocations. Currently the size of struct gfs2_holder means that the caluclated value is identical to the previous default value, so there should be no functional change. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Abhijith Das <adas@redhat.com>
2013-10-03sysfs: introduce [__]sysfs_remove()Tejun Heo
Given a sysfs_dirent, there is no reason to have multiple versions of removal functions. A function which removes the specified sysfs_dirent and its descendants is enough. This patch intorduces [__}sysfs_remove() which replaces all internal variations of removal functions. This will be the only removal function in the planned new sysfs_dirent based interface. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-03sysfs: make __sysfs_remove_dir() recursiveTejun Heo
Currently, sysfs directory removal is inconsistent in that it would remove any files directly under it but wouldn't recurse into directories. Thanks to group subdirectories, this doesn't even match with kobject boundaries. sysfs is in the process of being separated out so that it can be used by multiple subsystems and we want to have a consistent behavior - either removal of a sysfs_dirent should remove every descendant entries or none instead of something inbetween. This patch implements proper recursive removal in __sysfs_remove_dir(). The function now walks its subtree in a post-order walk to remove all descendants. This is a behavior change but kobject / driver layer, which currently is the only consumer, has already been updated to handle duplicate removal attempts, so nothing should be broken after this change. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-03kobject: grab an extra reference on kobject->sd to allow duplicate deletesTejun Heo
sysfs currently has a rather weird behavior regarding removals. A directory removal would delete all files directly under it but wouldn't recurse into subdirectories, which, while a bit inconsistent, seems to make sense at the first glance as each directory is supposedly associated with a kobject and each kobject can take care of the directory deletion; however, this doesn't really hold as we have groups which can be directories without a kobject associated with it and require explicit deletions. We're in the process of separating out sysfs from kboject / driver core and want a consistent behavior. A removal should delete either only the specified node or everything under it. I think it is helpful to support recursive atomic removal and later patches will implement it. Such change means that a sysfs_dirent associated with kobject may be deleted before the kobject itself is removed if one of its ancestor gets removed before it. As sysfs_remove_dir() puts the base ref, we may end up with dangling pointer on descendants. This can be solved by holding an extra reference on the sd from kobject. Acquire an extra reference on the associated sysfs_dirent on directory creation and put it after removal. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-03sysfs: remove sysfs_addrm_cxt->parent_sdTejun Heo
sysfs_addrm_start/finish() enclose sysfs_dirent additions and deletions and sysfs_addrm_cxt is used to record information necessary to finish the operations. Currently, sysfs_addrm_start() takes @parent_sd, records it in sysfs_addrm_cxt, and assumes that all operations in the block are performed under that @parent_sd. This assumption has been fine until now but we want to make some operations behave recursively and, while having @parent_sd recorded in sysfs_addrm_cxt doesn't necessarily prevents that, it becomes confusing. This patch removes sysfs_addrm_cxt->parent_sd and makes sysfs_add_one() take an explicit @parent_sd parameter. Note that sysfs_remove_one() doesn't need the extra argument as its parent is always known from the target @sd. While at it, add __acquires/releases() notations to sysfs_addrm_start/finish() respectively. This patch doesn't make any functional difference. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-02nfsd: switch to %p[dD]Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-10-02Merge git://git.kvack.org/~bcrl/aio-nextLinus Torvalds
Pull aio use-after-free fix from Ben LaHaise. * git://git.kvack.org/~bcrl/aio-next: aio: fix use-after-free in aio_migratepage
2013-10-02GFS2: Move gfs2_icbit_munge into quota.cSteven Whitehouse
This function is only called twice, and both callers are quota related, so lets move this function into quota.c and make it static. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-10-02GFS2: Speed up starting point selection for block allocationSteven Whitehouse
When setting the starting point for block allocation, there were calls to both gfs2_rbm_to_block() and gfs2_rbm_from_block() in the common case of there being an active reservation. The gfs2_rbm_from_block() function can be quite slow, and since the two conversions were effectively a no-op, it makes sense to avoid them entirely in this case. There is no functional change here, but the code should be a bit more efficient after this patch. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-10-02GFS2: Add allocation parameters structureSteven Whitehouse
This patch adds a structure to contain allocation parameters with the intention of future expansion of this structure. The idea is that we should be able to add more information about the allocation in the future in order to allow the allocator to make a better job of placing the requests on-disk. There is no functional difference from applying this patch. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-10-01xfs: remove usage of is_bad_inodeBen Myers
XFS never calls mark_inode_bad or iget_failed, so it will never see a bad inode. Remove all checks for is_bad_inode because they are unnecessary. Signed-off-by: Ben Myers <bpm@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2013-10-01xfs: fix the wrong new_size/rnew_size at xfs_iext_realloc_direct()Jie Liu
At xfs_iext_realloc_direct(), the new_size is changed by adding if_bytes if originally the extent records are stored at the inline extent buffer, and we have to switch from it to a direct extent list for those new allocated extents, this is wrong. e.g, Create a file with three extents which was showing as following, xfs_io -f -c "truncate 100m" /xfs/testme for i in $(seq 0 5 10); do offset=$(($i * $((1 << 20)))) xfs_io -c "pwrite $offset 1m" /xfs/testme done Inline ------ irec: if_bytes bytes_diff new_size 1st 0 16 16 2nd 16 16 32 Switching --------- rnew_size 3rd 32 16 48 + 32 = 80 roundup=128 In this case, the desired value of new_size should be 48, and then it will be roundup to 64 and be assigned to rnew_size. However, this issue has been covered by resetting the if_bytes to the new_size which is calculated at the begnning of xfs_iext_add() before leaving out this function, and in turn make the rnew_size correctly again. Hence, this can not be detected via xfstestes. This patch fix above problem and revise the new_size comments at xfs_iext_realloc_direct() to make it more readable. Also, fix the comments while switching from the inline extent buffer to a direct extent list to reflect this change. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-01NFSv4: Ensure that we disable the resend timeout for NFSv4Trond Myklebust
The spec states that the client should not resend requests because the server will disconnect if it needs to drop an RPC request. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-01NFSv4: Fix a use-after-free situation in _nfs4_proc_getlk()Trond Myklebust
In nfs4_proc_getlk(), when some error causes a retry of the call to _nfs4_proc_getlk(), we can end up with Oopses of the form BUG: unable to handle kernel NULL pointer dereference at 0000000000000134 IP: [<ffffffff8165270e>] _raw_spin_lock+0xe/0x30 <snip> Call Trace: [<ffffffff812f287d>] _atomic_dec_and_lock+0x4d/0x70 [<ffffffffa053c4f2>] nfs4_put_lock_state+0x32/0xb0 [nfsv4] [<ffffffffa053c585>] nfs4_fl_release_lock+0x15/0x20 [nfsv4] [<ffffffffa0522c06>] _nfs4_proc_getlk.isra.40+0x146/0x170 [nfsv4] [<ffffffffa052ad99>] nfs4_proc_lock+0x399/0x5a0 [nfsv4] The problem is that we don't clear the request->fl_ops after the first try and so when we retry, nfs4_set_lock_state() exits early without setting the lock stateid. Regression introduced by commit 70cc6487a4e08b8698c0e2ec935fb48d10490162 (locks: make ->lock release private data before returning in GETLK case) Reported-by: Weston Andros Adamson <dros@netapp.com> Reported-by: Jorge Mora <mora@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: <stable@vger.kernel.org> #2.6.22+