summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2013-10-17Merge branch 'for-linus' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull CIFS fixes from Steve French: "Five small cifs fixes (includes fixes for: unmount hang, 2 security related, symlink, large file writes)" * 'for-linus' of git://git.samba.org/sfrench/cifs-2.6: cifs: ntstatus_to_dos_map[] is not terminated cifs: Allow LANMAN auth method for servers supporting unencapsulated authentication methods cifs: Fix inability to write files >2GB to SMB2/3 shares cifs: Avoid umount hangs with smb2 when server is unresponsive do not treat non-symlink reparse points as valid symlinks
2013-10-17ext4: add ratelimiting to ext4 messagesTheodore Ts'o
In the case of a storage device that suddenly disappears, or in the case of significant file system corruption, this can result in a huge flood of messages being sent to the console. This can overflow the file system containing /var/log/messages, or if a serial console is configured, this can slow down the system so much that a hardware watchdog can end up triggering forcing a system reboot. Google-Bug-Id: 7258357 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-10-18f2fs: avoid to write during the recoveryJaegeuk Kim
This patch enhances the recovery routine not to write any data/node/meta until its completion. If any writes are sent to the disk, it could contaminate the written history that will be used for further recovery. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-10-18f2fs: avoid wait if IO end up when do_checkpoint for better performanceGu Zheng
Previously, do_checkpoint() will call congestion_wait() for waiting the pages (previous submitted node/meta/data pages) to be written back. Because congestion_wait() will set a regular period (e.g. HZ / 50 ) for waiting, and no additional wake up mechanism was introduced if IO ends up before regular period costed. Yuan Zhong found there is a situation that after the pages have been written back, but the checkpoint thread still wait for congestion_wait to exit. So here we store checkpoint task into f2fs_sb when doing checkpoint, it'll wait for IO completes if there's IO going on, and in the end IO path, wake up checkpoint task when IO ends up. Thanks to Yuan Zhong's pre work about this problem. Reported-by: Yuan Zhong <yuan.mark.zhong@samsung.com> Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-10-18f2fs: introduce function read_raw_super_block()Gu Zheng
Introduce function read_raw_super_block() to hide reading raw super block and the retry routine if the first sb is invalid. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-10-18f2fs: fix the starvation problem on cp_rwsemJaegeuk Kim
This patch removes the logic previously introduced to address the starvation on cp_rwsem. One potential there-in bug is that we should cover the wait.list with spin_lock, but the previous code broke this rule. And, actually current rwsem handles this starvation issue reasonably, so that we didn't need to do this before neither. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-10-18f2fs: fix to store and retrieve i_rdev correctlyJaegeuk Kim
When storing i_rdev, we should check its file type. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-10-17ext4: fix performance regression in ext4_writepagesMing Lei
Commit 4e7ea81db5(ext4: restructure writeback path) introduces another performance regression on random write: - one more page may be added to ext4 extent in mpage_prepare_extent_to_map, and will be submitted for I/O so nr_to_write will become -1 before 'done' is set - the worse thing is that dirty pages may still be retrieved from page cache after nr_to_write becomes negative, so lots of small chunks can be submitted to block device when page writeback is catching up with write path, and performance is hurted. On one arm A15 board with sata 3.0 SSD(CPU: 1.5GHz dura core, RAM: 2GB, SATA controller: 3.0Gbps), this patch can improve below test's result from 157MB/sec to 174MB/sec(>10%): dd if=/dev/zero of=./z.img bs=8K count=512K The above test is actually prototype of block write in bonnie++ utility. This patch makes sure no more pages than nr_to_write can be added to extent for mapping, so that nr_to_write won't become negative. Cc: linux-ext4@vger.kernel.org Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-10-17xfs: don't break from growfs ag update loop on errorEric Sandeen
When xfs_growfs_data_private() is updating backup superblocks, it bails out on the first error encountered, whether reading or writing: * If we get an error writing out the alternate superblocks, * just issue a warning and continue. The real work is * already done and committed. This can cause a problem later during repair, because repair looks at all superblocks, and picks the most prevalent one as correct. If we bail out early in the backup superblock loop, we can end up with more "bad" matching superblocks than good, and a post-growfs repair may revert the filesystem to the old geometry. With the combination of superblock verifiers and old bugs, we're more likely to encounter read errors due to verification. And perhaps even worse, we don't even properly write any of the newly-added superblocks in the new AGs. Even with this change, growfs will still say: xfs_growfs: XFS_IOC_FSGROWFSDATA xfsctl failed: Structure needs cleaning data blocks changed from 319815680 to 335216640 which might be confusing to the user, but it at least communicates that something has gone wrong, and dmesg will probably highlight the need for an xfs_repair. And this is still best-effort; if verifiers fail on more than half the backup supers, they may still "win" - but that's probably best left to repair to more gracefully handle by doing its own strict verification as part of the backup super "voting." Signed-off-by: Eric Sandeen <sandeen@redhat.com> Acked-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-17xfs: don't emit corruption noise on fs probesEric Sandeen
If we get EWRONGFS due to probing of non-xfs filesystems, there's no need to issue the scary corruption error and backtrace. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-17xfs: remove newlines from strings passed to __xfs_printkEric Sandeen
__xfs_printk adds its own "\n". Having it in the original string leads to unintentional blank lines from these messages. Most format strings have no newline, but a few do, leading to i.e.: [ 7347.119911] XFS (sdb2): Access to block zero in inode 132 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 1a05 [ 7347.119911] [ 7347.119919] XFS (sdb2): Access to block zero in inode 132 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 1a05 [ 7347.119919] Fix them all. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-17xfs: prevent deadlock trying to cover an active logDave Chinner
Recent analysis of a deadlocked XFS filesystem from a kernel crash dump indicated that the filesystem was stuck waiting for log space. The short story of the hang on the RHEL6 kernel is this: - the tail of the log is pinned by an inode - the inode has been pushed by the xfsaild - the inode has been flushed to it's backing buffer and is currently flush locked and hence waiting for backing buffer IO to complete and remove it from the AIL - the backing buffer is marked for write - it is on the delayed write queue - the inode buffer has been modified directly and logged recently due to unlinked inode list modification - the backing buffer is pinned in memory as it is in the active CIL context. - the xfsbufd won't start buffer writeback because it is pinned - xfssyncd won't force the log because it sees the log as needing to be covered and hence wants to issue a dummy transaction to move the log covering state machine along. Hence there is no trigger to force the CIL to the log and hence unpin the inode buffer and therefore complete the inode IO, remove it from the AIL and hence move the tail of the log along, allowing transactions to start again. Mainline kernels also have the same deadlock, though the signature is slightly different - the inode buffer never reaches the delayed write lists because xfs_buf_item_push() sees that it is pinned and hence never adds it to the delayed write list that the xfsaild flushes. There are two possible solutions here. The first is to simply force the log before trying to cover the log and so ensure that the CIL is emptied before we try to reserve space for the dummy transaction in the xfs_log_worker(). While this might work most of the time, it is still racy and is no guarantee that we don't get stuck in xfs_trans_reserve waiting for log space to come free. Hence it's not the best way to solve the problem. The second solution is to modify xfs_log_need_covered() to be aware of the CIL. We only should be attempting to cover the log if there is no current activity in the log - covering the log is the process of ensuring that the head and tail in the log on disk are identical (i.e. the log is clean and at idle). Hence, by definition, if there are items in the CIL then the log is not at idle and so we don't need to attempt to cover it. When we don't need to cover the log because it is active or idle, we issue a log force from xfs_log_worker() - if the log is idle, then this does nothing. However, if the log is active due to there being items in the CIL, it will force the items in the CIL to the log and unpin them. In the case of the above deadlock scenario, instead of xfs_log_worker() getting stuck in xfs_trans_reserve() attempting to cover the log, it will instead force the log, thereby unpinning the inode buffer, allowing IO to be issued and complete and hence removing the inode that was pinning the tail of the log from the AIL. At that point, everything will start moving along again. i.e. the xfs_log_worker turns back into a watchdog that can alleviate deadlocks based around pinned items that prevent the tail of the log from being moved... Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-16Merge branch 'akpm' (fixes from Andrew Morton)Linus Torvalds
Merge misc fixes from Andrew Morton. * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (21 commits) mm: revert mremap pud_free anti-fix mm: fix BUG in __split_huge_page_pmd swap: fix set_blocksize race during swapon/swapoff procfs: call default get_unmapped_area on MMU-present architectures procfs: fix unintended truncation of returned mapped address writeback: fix negative bdi max pause percpu_refcount: export symbols fs: buffer: move allocation failure loop into the allocator mm: memcg: handle non-error OOM situations more gracefully tools/testing/selftests: fix uninitialized variable block/partitions/efi.c: treat size mismatch as a warning, not an error mm: hugetlb: initialize PG_reserved for tail pages of gigantic compound pages mm/zswap: bugfix: memory leak when re-swapon mm: /proc/pid/pagemap: inspect _PAGE_SOFT_DIRTY only on present pages mm: migration: do not lose soft dirty bit if page is in migration state gcov: MAINTAINERS: Add an entry for gcov mm/hugetlb.c: correct missing private flag clearing mm/vmscan.c: don't forget to free shrinker->nr_deferred ipc/sem.c: synchronize semop and semctl with IPC_RMID ipc: update locking scheme comments ...
2013-10-16procfs: call default get_unmapped_area on MMU-present architecturesHATAYAMA Daisuke
Commit c4fe24485729 ("sparc: fix PCI device proc file mmap(2)") added proc_reg_get_unmapped_area in proc_reg_file_ops and proc_reg_file_ops_no_compat, by which now mmap always returns EIO if get_unmapped_area method is not defined for the target procfs file, which causes regression of mmap on /proc/vmcore. To address this issue, like get_unmapped_area(), call default current->mm->get_unmapped_area on MMU-present architectures if pde->proc_fops->get_unmapped_area, i.e. the one in actual file operation in the procfs file, is not defined. Reported-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: David S. Miller <davem@davemloft.net> Tested-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-16procfs: fix unintended truncation of returned mapped addressHATAYAMA Daisuke
Currently, proc_reg_get_unmapped_area truncates upper 32-bit of the mapped virtual address returned from get_unmapped_area method in pde->proc_fops due to the variable rv of signed integer on x86_64. This is too small to have vitual address of unsigned long on x86_64 since on x86_64, signed integer is of 4 bytes while unsigned long is of 8 bytes. To fix this issue, use unsigned long instead. Fixes a regression added in commit c4fe24485729 ("sparc: fix PCI device proc file mmap(2)"). Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: David S. Miller <davem@davemloft.net> Tested-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-16fs: buffer: move allocation failure loop into the allocatorJohannes Weiner
Buffer allocation has a very crude indefinite loop around waking the flusher threads and performing global NOFS direct reclaim because it can not handle allocation failures. The most immediate problem with this is that the allocation may fail due to a memory cgroup limit, where flushers + direct reclaim might not make any progress towards resolving the situation at all. Because unlike the global case, a memory cgroup may not have any cache at all, only anonymous pages but no swap. This situation will lead to a reclaim livelock with insane IO from waking the flushers and thrashing unrelated filesystem cache in a tight loop. Use __GFP_NOFAIL allocations for buffers for now. This makes sure that any looping happens in the page allocator, which knows how to orchestrate kswapd, direct reclaim, and the flushers sensibly. It also allows memory cgroups to detect allocations that can't handle failure and will allow them to ultimately bypass the limit if reclaim can not make progress. Reported-by: azurIt <azurit@pobox.sk> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-16mm: /proc/pid/pagemap: inspect _PAGE_SOFT_DIRTY only on present pagesCyrill Gorcunov
If a page we are inspecting is in swap we may occasionally report it as having soft dirty bit (even if it is clean). The pte_soft_dirty helper should be called on present pte only. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Matt Mackall <mpm@selenic.com> Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-16Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull tmpfile fix from Al Viro: "A fix for double iput() in ->tmpfile() on ext3 and ext4; I'd fucked it up, Miklos has caught it" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: ext[34]: fix double put in tmpfile
2013-10-16ecryptfs: Fix memory leakage in keystore.cGeyslan G. Bem
In 'decrypt_pki_encrypted_session_key' function: Initializes 'payload' pointer and releases it on exit. Signed-off-by: Geyslan G. Bem <geyslan@gmail.com> Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Cc: stable@vger.kernel.org # v2.6.28+
2013-10-16dlm: Avoid that dlm_release_lockspace() incorrectly returns -EBUSYBart Van Assche
When dlm_release_lockspace(ls, 1) is invoked on a busy system immediately after the last dlm_unlock() AST has finished it can occur that lkb_idr_is_local() is invoked for the unlocked LKB since removal from ls_lkbidr only occurs after the AST has returned. If that happens dlm_release_lockspace(ls, 1) will return -EBUSY instead of releasing the lockspace. Fix this race condition by changing lkb_idr_is_local() such that it only returns true for LKB's that have not yet been unlocked. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: David Teigland <teigland@redhat.com>
2013-10-16ext3: Count journal as bsddf overhead in ext3_statfsEric Sandeen
ext4 counts journal space as bsddf overhead, but ext3 does not. For some reason when I patched ext4 I thought I should leave ext3 alone, but frankly it makes more sense to fix it, I think. Otherwise we get inconsistent behavior from ext3 under ext3.ko, and ext3 under ext4.ko, which is not at all desirable... This is testable by xfstests shared/289, though it will need modification because it currently special-cases ext3. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz>
2013-10-16ext4: fixup kerndoc annotation of mpage_map_and_submit_extent()Jan Kara
Document give_up_on_write argument of mpage_map_and_submit_extent(). Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-10-16ext4: fix assertion in ext4_add_complete_io()Jan Kara
It doesn't make sense to require io_end->handle when we are in nojournal mode. So update the assertion accordingly to avoid false warnings from ext4_add_complete_io(). Reported-by: Eric Whitney <enwlinux@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-10-15ext[34]: fix double put in tmpfileMiklos Szeredi
d_tmpfile() already swallowed the inode ref. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-10-15GFS2: Use lockref for glocksSteven Whitehouse
Currently glocks have an atomic reference count and also a spinlock which covers various internal fields, such as the state. This intent of this patch is to replace the spinlock and the atomic reference count with a lockref structure. This contains a spinlock which we can continue to use as before, and a reference counter which is used in conjuction with the spinlock to replace the previous atomic counter. As a result of this there are some new rules for reference counting on glocks. We need to distinguish between reference count changes under gl_spin (which are now just increment or decrement of the new counter, provided the count cannot hit zero) and those which are outside of gl_spin, but which now take gl_spin internally. The conversion is relatively straight forward. There is probably some further clean up which can be done, but the priority at this stage is to make the change in as simple a manner as possible. A consequence of this change is that the reference count is being decoupled from the lru list processing. This should allow future adoption of the lru_list code with glocks in due course. The reason for using the "dead" state and not just relying on 0 being the "invalid state" is so that in due course 0 ref counts can be allowable. The intent is to eventually be able to remove the ref count changes which are currently hidden away in state_change(). Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-10-14cifs: ntstatus_to_dos_map[] is not terminatedTim Gardner
Functions that walk the ntstatus_to_dos_map[] array could run off the end. For example, ntstatus_to_dos() loops while ntstatus_to_dos_map[].ntstatus is not 0. Granted, this is mostly theoretical, but could be used as a DOS attack if the error code in the SMB header is bogus. [Might consider adding to stable, as this patch is low risk - Steve] Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Steve French <smfrench@gmail.com>
2013-10-14sysfs/bin: Fix size handling overflow for bin_attributeBenjamin Herrenschmidt
While looking at the code, I noticed that bin_attribute read() and write() ops copy the inode size into an int for futher comparisons. Some bin_attributes can be fairly large. For example, pci creates some for BARs set to the BAR size and giant BARs are around the corner, so this is going to break something somewhere eventually. Let's use the right type. [adjust for seqfile conversions, only needed for bin_read() - gkh] Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-14sysfs: make sysfs_file_ops() follow ignore_lockdep flagTejun Heo
375b611e60 ("sysfs: remove sysfs_buffer->ops") introduced sysfs_file_ops() which determines the associated file operation of a given sysfs_dirent. As file ops access should be protected by an active reference, the new function includes a lockdep assertion on the sysfs_dirent; unfortunately, I forgot to take attr->ignore_lockdep flag into account and the lockdep assertion trips spuriously for files which opt out from active reference lockdep checking. # cat /sys/devices/pci0000:00/0000:00:01.2/usb1/authorized ------------[ cut here ]------------ WARNING: CPU: 1 PID: 540 at /work/os/work/fs/sysfs/file.c:79 sysfs_file_ops+0x4e/0x60() Modules linked in: CPU: 1 PID: 540 Comm: cat Not tainted 3.11.0-work+ #3 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 0000000000000009 ffff880016205c08 ffffffff81ca0131 0000000000000000 ffff880016205c40 ffffffff81096d0d ffff8800166cb898 ffff8800166f6f60 ffffffff8125a220 ffff880011ab1ec0 ffff88000aff0c78 ffff880016205c50 Call Trace: [<ffffffff81ca0131>] dump_stack+0x4e/0x82 [<ffffffff81096d0d>] warn_slowpath_common+0x7d/0xa0 [<ffffffff81096dea>] warn_slowpath_null+0x1a/0x20 [<ffffffff8125994e>] sysfs_file_ops+0x4e/0x60 [<ffffffff8125a274>] sysfs_open_file+0x54/0x300 [<ffffffff811df612>] do_dentry_open.isra.17+0x182/0x280 [<ffffffff811df820>] finish_open+0x30/0x40 [<ffffffff811f0623>] do_last+0x503/0xd90 [<ffffffff811f0f6b>] path_openat+0xbb/0x6d0 [<ffffffff811f23ba>] do_filp_open+0x3a/0x90 [<ffffffff811e09a9>] do_sys_open+0x129/0x220 [<ffffffff811e0abe>] SyS_open+0x1e/0x20 [<ffffffff81caf3c2>] system_call_fastpath+0x16/0x1b ---[ end trace aa48096b111dafdb ]--- Rename fs/sysfs/dir.c::ignore_lockdep() to sysfs_ignore_lockdep() and move it to fs/sysfs/sysfs.h and make sysfs_file_ops() skip lockdep assertion if sysfs_ignore_lockdep() is true. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-14Docs: Kconfig: `devlopers' -> `developers'Michael Witten
Signed-off-by: Michael Witten <mfwitten@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-10-12vfs: allow O_PATH file descriptors for fstatfs()Linus Torvalds
Olga reported that file descriptors opened with O_PATH do not work with fstatfs(), found during further development of ksh93's thread support. There is no reason to not allow O_PATH file descriptors here (fstatfs is very much a path operation), so use "fdget_raw()". See commit 55815f70147d ("vfs: make O_PATH file descriptors usable for 'fstat()'") for a very similar issue reported for fstat() by the same team. Reported-and-tested-by: ольга крыжановская <olga.kryzhanovska@gmail.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@kernel.org # O_PATH introduced in 3.0+ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-12Merge tag 'ext4_for_linus_stable' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 bugfixes from Ted Ts'o: "A bug fix and performance regression fix for ext4" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: fix memory leak in xattr ext4: fix performance regression in writeback of random writes
2013-10-12Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "We've got more bug fixes in my for-linus branch: One of these fixes another corner of the compression oops from last time. Miao nailed down some problems with concurrent snapshot deletion and drive balancing. I kept out one of his patches for more testing, but these are all stable" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: fix oops caused by the space balance and dead roots Btrfs: insert orphan roots into fs radix tree Btrfs: limit delalloc pages outside of find_delalloc_range Btrfs: use right root when checking for hash collision
2013-10-12ext4: fix memory leak in xattrDave Jones
If we take the 2nd retry path in ext4_expand_extra_isize_ea, we potentionally return from the function without having freed these allocations. If we don't do the return, we over-write the previous allocation pointers, so we leak either way. Spotted with Coverity. [ Fixed by tytso to set is and bs to NULL after freeing these pointers, in case in the retry loop we later end up triggering an error causing a jump to cleanup, at which point we could have a double free bug. -- Ted ] Signed-off-by: Dave Jones <davej@fedoraproject.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Cc: stable@vger.kernel.org
2013-10-10aio: Fix a trinity splatKent Overstreet
aio kiocb refcounting was broken - it was relying on keeping track of the number of available ring buffer entries, which it needs to do anyways; then at shutdown time it'd wait for completions to be delivered until the # of available ring buffer entries equalled what it was initialized to. Problem with that is that the ring buffer is mapped writable into userspace, so userspace could futz with the head and tail pointers to cause the kernel to see extra completions, and cause free_ioctx() to return while there were still outstanding kiocbs. Which would be bad. Fix is just to directly refcount the kiocbs - which is more straightforward, and with the new percpu refcounting code doesn't cost us any cacheline bouncing which was the whole point of the original scheme. Also clean up ioctx_alloc()'s error path and fix a bug where it wasn't subtracting from aio_nr if ioctx_add_table() failed. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
2013-10-10Btrfs: fix oops caused by the space balance and dead rootsMiao Xie
When doing space balance and subvolume destroy at the same time, we met the following oops: kernel BUG at fs/btrfs/relocation.c:2247! RIP: 0010: [<ffffffffa04cec16>] prepare_to_merge+0x154/0x1f0 [btrfs] Call Trace: [<ffffffffa04b5ab7>] relocate_block_group+0x466/0x4e6 [btrfs] [<ffffffffa04b5c7a>] btrfs_relocate_block_group+0x143/0x275 [btrfs] [<ffffffffa0495c56>] btrfs_relocate_chunk.isra.27+0x5c/0x5a2 [btrfs] [<ffffffffa0459871>] ? btrfs_item_key_to_cpu+0x15/0x31 [btrfs] [<ffffffffa048b46a>] ? btrfs_get_token_64+0x7e/0xcd [btrfs] [<ffffffffa04a3467>] ? btrfs_tree_read_unlock_blocking+0xb2/0xb7 [btrfs] [<ffffffffa049907d>] btrfs_balance+0x9c7/0xb6f [btrfs] [<ffffffffa049ef84>] btrfs_ioctl_balance+0x234/0x2ac [btrfs] [<ffffffffa04a1e8e>] btrfs_ioctl+0xd87/0x1ef9 [btrfs] [<ffffffff81122f53>] ? path_openat+0x234/0x4db [<ffffffff813c3b78>] ? __do_page_fault+0x31d/0x391 [<ffffffff810f8ab6>] ? vma_link+0x74/0x94 [<ffffffff811250f5>] vfs_ioctl+0x1d/0x39 [<ffffffff811258c8>] do_vfs_ioctl+0x32d/0x3e2 [<ffffffff811259d4>] SyS_ioctl+0x57/0x83 [<ffffffff813c3bfa>] ? do_page_fault+0xe/0x10 [<ffffffff813c73c2>] system_call_fastpath+0x16/0x1b It is because we returned the error number if the reference of the root was 0 when doing space relocation. It was not right here, because though the root was dead(refs == 0), but the space it held still need be relocated, or we could not remove the block group. So in this case, we should return the root no matter it is dead or not. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-10-10Btrfs: insert orphan roots into fs radix treeMiao Xie
Now we don't drop all the deleted snapshots/subvolumes before the space balance. It means we have to relocate the space which is held by the dead snapshots/subvolumes. So we must into them into fs radix tree, or we would forget to commit the change of them when doing transaction commit, and it would corrupt the metadata. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-10-10Btrfs: limit delalloc pages outside of find_delalloc_rangeJosef Bacik
Liu fixed part of this problem and unfortunately I steered him in slightly the wrong direction and so didn't completely fix the problem. The problem is we limit the size of the delalloc range we are looking for to max bytes and then we try to lock that range. If we fail to lock the pages in that range we will shrink the max bytes to a single page and re loop. However if our first page is inside of the delalloc range then we will end up limiting the end of the range to a period before our first page. This is illustrated below [0 -------- delalloc range --------- 256mb] [page] So find_delalloc_range will return with delalloc_start as 0 and end as 128mb, and then we will notice that delalloc_start < *start and adjust it up, but not adjust delalloc_end up, so things go sideways. To fix this we need to not limit the max bytes in find_delalloc_range, but in find_lock_delalloc_range and that way we don't end up with this confusion. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-10-10Btrfs: use right root when checking for hash collisionJosef Bacik
btrfs_rename was using the root of the old dir instead of the root of the new dir when checking for a hash collision, so if you tried to move a file into a subvol it would freak out because it would see the file you are trying to move in its current root. This fixes the bug where this would fail btrfs subvol create test1 btrfs subvol create test2 mv test1 test2. Thanks to Chris Murphy for catching this, Cc: stable@vger.kernel.org Reported-by: Chris Murphy <lists@colorremedies.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-10-09of: remove HAVE_ARCH_DEVTREE_FIXUPSRob Herring
HAVE_ARCH_DEVTREE_FIXUPS appears to always be needed except for sparc, but it is only used for /proc/device-teee and sparc does not enable /proc/device-tree. So this option is redundant. Remove the option and always enable it. This has the side effect of fixing /proc/device-tree on arches such as arm64 which failed to define this option. Signed-off-by: Rob Herring <rob.herring@calxeda.com> Acked-by: Vineet Gupta <vgupta@synopsys.com> Acked-by: Grant Likely <grant.likely@linaro.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: James Hogan <james.hogan@imgtec.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Jonas Bonn <jonas@southpole.se> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Cc: Chris Zankel <chris@zankel.net> Cc: Max Filippov <jcmvbkbc@gmail.com>
2013-10-09sched/numa: Call task_numa_free() from do_execve()Rik van Riel
It is possible for a task in a numa group to call exec, and have the new (unrelated) executable inherit the numa group association from its former self. This has the potential to break numa grouping, and is trivial to fix. Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1381141781-10992-51-git-send-email-mgorman@suse.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-09sched/numa: Report a NUMA task group IDMel Gorman
It is desirable to model from userspace how the scheduler groups tasks over time. This patch adds an ID to the numa_group and reports it via /proc/PID/status. Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1381141781-10992-45-git-send-email-mgorman@suse.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-08xfs: clean up xfs_inactive() error handling, kill VN_INACTIVE_[NO]CACHEBrian Foster
The xfs_inactive() return value is meaningless. Turn xfs_inactive() into a void function and clean up the error handling appropriately. Kill the VN_INACTIVE_[NO]CACHE directives as they are not relevant to Linux. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-08xfs: push down inactive transaction mgmt for ifreeBrian Foster
Push the inode free work performed during xfs_inactive() down into a new xfs_inactive_ifree() helper. This clears xfs_inactive() from all inode locking and transaction management more directly associated with freeing the inode xattrs, extents and the inode itself. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-08xfs: push down inactive transaction mgmt for truncateBrian Foster
Create the new xfs_inactive_truncate() function to handle the truncate portion of xfs_inactive(). Push the locking and transaction management into the new function. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-08xfs: push down inactive transaction mgmt for remote symlinksBrian Foster
Push down the transaction management for remote symlinks from xfs_inactive() down to xfs_inactive_symlink_rmt(). The latter is cleaned up to avoid transaction management intended for the calling context (i.e., trans duplication, reservation, item attachment). Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-08xfs: add the inode directory type support to XFS_IOC_FSGEOMMark Tinguely
Add the inode type directory type support to XFS_IOC_FSGEOM so that xfs_repair/xfs_info knows if the superblock v4 filesystem enabled the feature. Signed-off-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-08f2fs: fix writing incorrect orphan blocksJaegeuk Kim
Previously, there was a erroneous scenario like below. thread 1: thread 2: f2fs_unlink - acquire_orphan_inode : sbi->n_orphans++ write_checkpoint - block_operations : f2fs_lock_all - do_checkpoint : write orphan blocks with sbi->n_orphans - unblock_operations - f2fs_lock_op - release_orphan_inode - f2fs_unlock_op During the checkpoint by thread 2, f2fs stores a wrong orphan block according to the wrong sbi->n_orphans. To avoid this, simply we should make cover acquire_orphan_inode too with f2fs_lock_op. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-10-08f2fs: avoid unnecessary checkpointsJaegeuk Kim
During the f2fs_put_super procedure, we don't need to conduct checkpoint all the time, since we don't need to do that if superblock is clean. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-10-07cifs: Allow LANMAN auth method for servers supporting unencapsulated ↵Sachin Prabhu
authentication methods This allows users to use LANMAN authentication on servers which support unencapsulated authentication. The patch fixes a regression where users using plaintext authentication were no longer able to do so because of changed bought in by patch 3f618223dc0bdcbc8d510350e78ee2195ff93768 https://bugzilla.redhat.com/show_bug.cgi?id=1011621 Reported-by: Panos Kavalagios <Panagiotis.Kavalagios@eurodyn.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
2013-10-07cifs: Fix inability to write files >2GB to SMB2/3 sharesJan Klos
When connecting to SMB2/3 shares, maximum file size is set to non-LFS maximum in superblock. This is due to cap_large_files bit being different for SMB1 and SMB2/3 (where it is just an internal flag that is not negotiated and the SMB1 one corresponds to multichannel capability, so maybe LFS works correctly if server sends 0x08 flag) while capabilities are checked always for the SMB1 bit in cifs_read_super(). The patch fixes this by checking for the correct bit according to the protocol version. CC: Stable <stable@kernel.org> Signed-off-by: Jan Klos <honza.klos@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>