summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2013-02-20Revert "Btrfs: fix permissions of empty files not affected by umask"Josef Bacik
This reverts commit 2794ed013b3551cbae887ea1b93c52aaacb7370d. Wasn't supposed to get used in btrfs_mknod, it was supposed to be in btrfs_create, which was done in commit 9185aa587b7425f8f4520da2e66792f5f3c2b815. Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20Btrfs: don't traverse the ordered operation list repeatedlyMiao Xie
btrfs_run_ordered_operations() needn't traverse the ordered operation list repeatedly, it is because the transaction commiter will invoke it again when there is no other writer in this transaction, it can ensure that no one can add new objects into the ordered operation list. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20Btrfs: traverse and flush the delalloc inodes onceMiao Xie
btrfs_start_delalloc_inodes() needn't traverse and flush the delalloc inodes repeatedly. It is because we can regard the data that the users write after we start delalloc inodes flush as the one which is after the delalloc inodes flush is done, and we can flush it next time. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20Btrfs: check the return value of btrfs_run_ordered_operations()Miao Xie
We forget to check the return value of btrfs_run_ordered_operations() when flushing all the pending stuffs, fix it. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20Btrfs: check the return value of btrfs_start_delalloc_inodes()Miao Xie
We forget to check the return value of btrfs_start_delalloc_inodes(), fix it. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20Btrfs: make raid attr array more readableMiao Xie
The current code of raid attr arry is hard to understand and it is easy to introduce some problem if we modify the array. So I changed it and made it more readable. Cc: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20Btrfs: record first logical byte in memoryLiu Bo
This'd save us a rbtree search which may become expensive in large filesystem. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20Btrfs: save us a read_lockLiu Bo
This does not change the logic of code, but can save us a read_lock. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20Btrfs: use token to avoid times mapping extent bufferLiu Bo
The API in tree log code has done sort of changes, and it proves that we can benifit from using token, so do the same thing here. function_graph tracer's timer shows that it costs nearly half time of before(39.788us -> 22.391us). Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20Btrfs: kill unused argument of btrfs_pin_extent_for_log_replayLiu Bo
Argument 'trans' is not used any more. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20Btrfs: kill unused argument of update_block_groupLiu Bo
Argument 'trans' is not used any more. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20Btrfs: kill unused arguments of cache_block_groupLiu Bo
Argument 'trans' and 'root' are not used any more. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20Btrfs: remove deprecated commentsLiu Bo
commit d53ba47484ed6245e640ee4bfe9d21e9bfc15765 (Btrfs: use commit root when loading free space cache) has remove the deadlock check, and the related comments can be removed as well. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20Btrfs: don't re-enter when allocating a chunkJosef Bacik
If we start running low on metadata space we will try to allocate a chunk, which could then try to allocate a chunk to add the device entry. The thing is we allocate a chunk before we try really hard to make the allocation, so we should be able to find space for the device entry. Add a flag to the trans handle so we know we're currently allocating a chunk so we can just bail out if we try to allocate another chunk. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20Btrfs: wait on ordered extents at the last possible momentJosef Bacik
Since we don't actually copy the extent information from the source tree in the fast case we don't need to wait for ordered io to be completed in order to fsync, we just need to wait for the io to be completed. So when we're logging our file just attach all of the ordered extents to the log, and then when the log syncs just wait for IO_DONE on the ordered extents and then write the super. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20Btrfs: fix trivial error in btrfs_ioctl_resize()Miao Xie
This patch fixes the following problem: - improper return value - unnecessary read-only check Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20Btrfs: use wrapper page_offsetMiao Xie
Use wrapper page_offset to get byte-offset into filesystem object for page. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20Btrfs: flush all dirty inodes if writeback can not startMiao Xie
We may try to flush some dirty pages when there is no enough space to reserve. But it is possible that this operation fails, in order to get enough space to reserve successfully, we will sync all the delalloc file. This operation is safe, we needn't worry about the case that the filesystem goes from r/w to r/o. because the filesystem should guarantee all the dirty pages have been written into the disk after it becomes readonly, so the sync operation will do nothing if the filesystem is already readonly. Though it may waste lots of time, as a corner case, we needn't care. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20Btrfs: make delayed ref lock logic more readableMiao Xie
Locking and unlocking delayed ref mutex are in the different functions, and the name of lock functions is not uniform, so the readability is not so good, this patch optimizes the lock logic and makes it more readable. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20Btrfs: fix lots of orphan inodes when the space is not enoughMiao Xie
We're running into having 50-100 orphans left over with xfstests 83 because of ENOSPC when trying to start the transaction for the inode update. But in fact, it makes no sense in updating the inode for the new size while we're deleting the stupid thing. This patch fixes this problem. Reported-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20Btrfs: cleanup similar code in delayed inodeMiao Xie
The delayed item commit code in several functions is similar, so cleanup it. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20Btrfs: use common work instead of delayed workMiao Xie
Since we do not want to delay the async transaction commit, we should use common work, not delayed work. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
2013-02-20Btrfs: cleanup unnecessary clear when freeing a transaction or a trans handleMiao Xie
We clear the transaction object and the trans handle when they are about to be freed, it is unnecessary, cleanup it. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
2013-02-20Btrfs: use slabs for delayed reference allocationMiao Xie
The delayed reference allocation is in the fast path of the IO, so use slabs to improve the speed of the allocation. And besides that, it can do check for leaked objects when the module is removed. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
2013-02-19Merge branch 'timers-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer changes from Ingo Molnar: "Main changes: - ntp: Add CONFIG_RTC_SYSTOHC: a generic RTC driver facility complementing the existing CONFIG_RTC_HCTOSYS, which uses NTP to keep the hardware clock updated. - posix-timers: Fix clock_adjtime to always return timex data on success. This is changing the ABI, but no breakage was expected and found - caution is warranted nevertheless. - platform persistent clock improvements/cleanups. - clockevents: refactor timer broadcast handling to be more generic and less duplicated with matching architecture code (mostly ARM motivated.) - various fixes and cleanups" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timers/x86/hpet: Use HPET_COUNTER to specify the hpet counter in vread_hpet() posix-cpu-timers: Fix nanosleep task_struct leak clockevents: Fix generic broadcast for FEAT_C3STOP time, Fix setting of hardware clock in NTP code hrtimer: Prevent hrtimer_enqueue_reprogram race clockevents: Add generic timer broadcast function clockevents: Add generic timer broadcast receiver timekeeping: Switch HAS_PERSISTENT_CLOCK to ALWAYS_USE_PERSISTENT_CLOCK x86/time/rtc: Don't print extended CMOS year when reading RTC x86: Select HAS_PERSISTENT_CLOCK on x86 timekeeping: Add CONFIG_HAS_PERSISTENT_CLOCK option rtc: Skip the suspend/resume handling if persistent clock exist timekeeping: Add persistent_clock_exist flag posix-timers: Fix clock_adjtime to always return timex data on success Round the calculated scale factor in set_cyc2ns_scale() NTP: Add a CONFIG_RTC_SYSTOHC configuration MAINTAINERS: Update John Stultz's email time: create __getnstimeofday for WARNless calls
2013-02-19Merge branch 'sched-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler changes from Ingo Molnar: "Main changes: - scheduler side full-dynticks (user-space execution is undisturbed and receives no timer IRQs) preparation changes that convert the cputime accounting code to be full-dynticks ready, from Frederic Weisbecker. - Initial sched.h split-up changes, by Clark Williams - select_idle_sibling() performance improvement by Mike Galbraith: " 1 tbench pair (worst case) in a 10 core + SMT package: pre 15.22 MB/sec 1 procs post 252.01 MB/sec 1 procs " - sched_rr_get_interval() ABI fix/change. We think this detail is not used by apps (so it's not an ABI in practice), but lets keep it under observation. - misc RT scheduling cleanups, optimizations" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) sched/rt: Add <linux/sched/rt.h> header to <linux/init_task.h> cputime: Remove irqsave from seqlock readers sched, powerpc: Fix sched.h split-up build failure cputime: Restore CPU_ACCOUNTING config defaults for PPC64 sched/rt: Move rt specific bits into new header file sched/rt: Add a tuning knob to allow changing SCHED_RR timeslice sched: Move sched.h sysctl bits into separate header sched: Fix signedness bug in yield_to() sched: Fix select_idle_sibling() bouncing cow syndrome sched/rt: Further simplify pick_rt_task() sched/rt: Do not account zero delta_exec in update_curr_rt() cputime: Safely read cputime of full dynticks CPUs kvm: Prepare to add generic guest entry/exit callbacks cputime: Use accessors to read task cputime stats cputime: Allow dynamic switch between tick/virtual based cputime accounting cputime: Generic on-demand virtual cputime accounting cputime: Move default nsecs_to_cputime() to jiffies based cputime file cputime: Librarize per nsecs resolution cputime definitions cputime: Avoid multiplication overflow on utime scaling context_tracking: Export context state for generic vtime ... Fix up conflict in kernel/context_tracking.c due to comment additions.
2013-02-19Merge branch 'testing' of github.com:ceph/ceph-client into into linux-3.8-cephAlex Elder
2013-02-19ceph: remove a few bogus declarationsAlex Elder
There are three ceph page vector functions declared in "fs/ceph/super.h" that don't belong there. They're probably left over from some long-ago code reorganization. They're properly declared in "include/linux/ceph/libceph.h" so just delete the ones in "super.h". This and the next few commits resolve: http://tracker.ceph.com/issues/4053 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-02-19NLM: Ensure that we resend all pending blocking locks after a reclaimTrond Myklebust
Currently, nlmclnt_lock will break out of the for(;;) loop when the reclaimer wakes up the blocking lock thread by setting nlm_lck_denied_grace_period. This causes the lock request to fail with an ENOLCK error. The intention was always to ensure that we resend the lock request after the grace period has expired. Reported-by: Wangyuan Zhang <Wangyuan.Zhang@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org
2013-02-19locking: Various static lock initializer fixesThomas Gleixner
The static lock initializers want to be fed the proper name of the lock and not some random string. In mainline random strings are obfuscating the readability of debug output, but for RT they prevent the spinlock substitution. Fix it up. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-02-18net: proc: remove proc_net_removeGao feng
proc_net_remove has been replaced by remove_proc_entry. we can remove it now. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-18net: proc: remove proc_net_fops_createGao feng
proc_net_fops_create has been replaced by proc_create, we can remove it now. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-18libceph: update ceph_mds_state_name() and ceph_mds_op_name()Alex Elder
Update ceph_mds_state_name() and ceph_mds_op_name() to include the newly-added definitions in "ceph_fs.h", and to match its counterpart in the user space code. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-02-18ceph: kill ceph_osdc_new_request() "num_reply" parameterAlex Elder
The "num_reply" parameter to ceph_osdc_new_request() is never used inside that function, so get rid of it. Note that ceph_sync_write() passes 2 for that argument, while all other callers pass 1. It doesn't matter, but perhaps someone should verify this doesn't indicate a problem. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-02-18ceph: kill ceph_osdc_writepages() "flags" parameterAlex Elder
There is only one caller of ceph_osdc_writepages(), and it always passes 0 as its "flags" argument. Get rid of that argument and replace its use in ceph_osdc_writepages() with 0. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-02-18ceph: kill ceph_osdc_writepages() "dosync" parameterAlex Elder
There is only one caller of ceph_osdc_writepages(), and it always passes 0 as its "dosync" argument. Get rid of that argument and replace its use in ceph_osdc_writepages() with 0. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-02-18ceph: kill ceph_osdc_writepages() "nofail" parameterAlex Elder
There is only one caller of ceph_osdc_writepages(), and it always passes the value true as its "nofail" argument. Get rid of that argument and replace its use in ceph_osdc_writepages() with the constant value true. This and a number of cleanup patches that follow resolve: http://tracker.ceph.com/issues/4126 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-02-18ext4: fix xattr block allocation/release with bigallocLukas Czerner
Currently when new xattr block is created or released we we would call dquot_free_block() or dquot_alloc_block() respectively, among the else decrementing or incrementing the number of blocks assigned to the inode by one block. This however does not work for bigalloc file system because we always allocate/free the whole cluster so we have to count with that in dquot_free_block() and dquot_alloc_block() as well. Use the clusters-to-blocks conversion EXT4_C2B() when passing number of blocks to the dquot_alloc/free functions to fix the problem. The problem has been revealed by xfstests #117 (and possibly others). Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Cc: stable@vger.kernel.org
2013-02-18ext4: reclaim extents from extent status treeZheng Liu
Although extent status is loaded on-demand, we also need to reclaim extent from the tree when we are under a heavy memory pressure because in some cases fragmented extent tree causes status tree costs too much memory. Here we maintain a lru list in super_block. When the extent status of an inode is accessed and changed, this inode will be move to the tail of the list. The inode will be dropped from this list when it is cleared. In the inode, a counter is added to count the number of cached objects in extent status tree. Here only written/unwritten/hole extent is counted because delayed extent doesn't be reclaimed due to fiemap, bigalloc and seek_data/hole need it. The counter will be increased as a new extent is allocated, and it will be decreased as a extent is freed. In this commit we use normal shrinker framework to reclaim memory from the status tree. ext4_es_reclaim_extents_count() traverses the lru list to count the number of reclaimable extents. ext4_es_shrink() tries to reclaim written/unwritten/hole extents from extent status tree. The inode that has been shrunk is moved to the tail of lru list. Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Jan kara <jack@suse.cz>
2013-02-18ext4: adjust some functions for reclaiming extents from extent status treeZheng Liu
This commit changes some interfaces in extent status tree because we need to use inode to count the cached objects in a extent status tree. Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Jan kara <jack@suse.cz>
2013-02-18ext4: remove single extent cacheZheng Liu
Single extent cache could be removed because we have extent status tree as a extent cache, and it would be better. Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Jan kara <jack@suse.cz>
2013-02-18ext4: lookup block mapping in extent status treeZheng Liu
After tracking all extent status, we already have a extent cache in memory. Every time we want to lookup a block mapping, we can first try to lookup it in extent status tree to avoid a potential disk I/O. A new function called ext4_es_lookup_extent is defined to finish this work. When we try to lookup a block mapping, we always call ext4_map_blocks and/or ext4_da_map_blocks. So in these functions we first try to lookup a block mapping in extent status tree. A new flag EXT4_GET_BLOCKS_NO_PUT_HOLE is used in ext4_da_map_blocks in order not to put a hole into extent status tree because this hole will be converted to delayed extent in the tree immediately. Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Jan kara <jack@suse.cz>
2013-02-18ext4: track all extent status in extent status treeZheng Liu
By recording the phycisal block and status, extent status tree is able to track the status of every extents. When we call _map_blocks functions to lookup an extent or create a new written/unwritten/delayed extent, this extent will be inserted into extent status tree. We don't load all extents from disk in alloc_inode() because it costs too much memory, and if a file is opened and closed frequently it will takes too much time to load all extent information. So currently when we create/lookup an extent, this extent will be inserted into extent status tree. Hence, the extent status tree may not comprehensively contain all of the extents found in the file. Here a condition we need to take care is that an extent might contains unwritten and delayed status simultaneously because an extent is delayed allocated and could be allocated by fallocate. At this time we need to keep delayed status because later we need to update delayed reservation space using it. Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Jan kara <jack@suse.cz>
2013-02-18ext4: let ext4_ext_map_blocks return EXT4_MAP_UNWRITTEN flagZheng Liu
This commit lets ext4_ext_map_blocks return EXT4_MAP_UNWRITTEN flag because in later commit ext4_map_blocks needs to use this flag to determine the extent status. Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2013-02-18ext4: rename and improbe ext4_es_find_extent()Zheng Liu
This commit renames ext4_es_find_extent with ext4_es_find_delayed_extent and improve this function. First, we split input and output parameter. Second, this function never return the first block of the next delayed extent after 'es'. Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Jan kara <jack@suse.cz>
2013-02-18ext4: add physical block and status member into extent status treeZheng Liu
This commit adds two members in extent_status structure to let it record physical block and extent status. Here es_pblk is used to record both of them because physical block only has 48 bits. So extent status could be stashed into it so that we can save some memory. Now written, unwritten, delayed and hole are defined as status. Due to new member is added into extent status tree, all interfaces need to be adjusted. Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2013-02-18ext4: refine extent status treeZheng Liu
This commit refines the extent status tree code. 1) A prefix 'es_' is added to to the extent status tree structure members. 2) Refactored es_remove_extent() so that __es_remove_extent() can be used by es_insert_extent() to remove the old extent entry(-ies) before inserting a new one. 3) Rename extent_status_end() to ext4_es_end() 4) ext4_es_can_be_merged() is define to check whether two extents can be merged or not. 5) Update and clarified comments. Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2013-02-17umount oops when remove blocklayoutdriver firstfanchaoting
now pnfs client uses block layout, maybe we can remove blocklayoutdriver first. if we umount later, it can cause oops in unset_pnfs_layoutdriver. because nfss->pnfs_curr_ld->clear_layoutdriver is invalid. reproduce it: modprobe blocklayoutdriver mount -t nfs4 -o minorversion=1 pnfsip:/ /mnt/ rmmod blocklayoutdriver umount /mnt then you can see following CPU 0 Pid: 17023, comm: umount.nfs4 Tainted: GF O 3.7.0-rc6-pnfs #1 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform RIP: 0010:[<ffffffffa04cfe6d>] [<ffffffffa04cfe6d>] unset_pnfs_layoutdriver+0x1d/0x70 [nfsv4] RSP: 0018:ffff8800022d9e48 EFLAGS: 00010286 RAX: ffffffffa04a1b00 RBX: ffff88000b013800 RCX: 0000000000000001 RDX: ffffffff81ae8ee0 RSI: ffff880001ee94b8 RDI: ffff88000b013800 RBP: ffff8800022d9e58 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff880001ee9400 R13: ffff8800105978c0 R14: 00007fff25846c08 R15: 0000000001bba550 FS: 00007f45ae7f0700(0000) GS:ffff880012c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: ffffffffa04a1b38 CR3: 0000000002c0c000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process umount.nfs4 (pid: 17023, threadinfo ffff8800022d8000, task ffff880006e48aa0) Stack: ffff8800105978c0 ffff88000b013800 ffff8800022d9e78 ffffffffa04cd0ce ffff8800022d9e78 ffff88000b013800 ffff8800022d9ea8 ffffffffa04755a7 ffff8800022d9ea8 ffff880002f96400 ffff88000b013800 ffff880002f96400 Call Trace: [<ffffffffa04cd0ce>] nfs4_destroy_server+0x1e/0x30 [nfsv4] [<ffffffffa04755a7>] nfs_free_server+0xb7/0x150 [nfs] [<ffffffffa047d4d5>] nfs_kill_super+0x35/0x40 [nfs] [<ffffffff81178d35>] deactivate_locked_super+0x45/0x70 [<ffffffff8117986a>] deactivate_super+0x4a/0x70 [<ffffffff81193ee2>] mntput_no_expire+0xd2/0x130 [<ffffffff81194d62>] sys_umount+0x72/0xe0 [<ffffffff8154af59>] system_call_fastpath+0x16/0x1b Code: 06 e1 b8 ea ff ff ff eb 9e 0f 1f 44 00 00 55 48 89 e5 53 48 83 ec 08 66 66 66 66 90 48 8b 87 80 03 00 00 48 89 fb 48 85 c0 74 29 <48> 8b 40 38 48 85 c0 74 02 ff d0 48 8b 03 3e ff 48 04 0f 94 c2 RIP [<ffffffffa04cfe6d>] unset_pnfs_layoutdriver+0x1d/0x70 [nfsv4] RSP <ffff8800022d9e48> CR2: ffffffffa04a1b38 ---[ end trace 29f75aaedda058bf ]--- Signed-off-by: fanchaoting<fanchaoting@cn.fujitsu.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org
2013-02-17nfs: remove kfree() redundant null checksTim Gardner
smatch analysis: fs/nfs/getroot.c:130 nfs_get_root() info: redundant null check on name calling kfree() fs/nfs/unlink.c:272 nfs_async_unlink() info: redundant null check on devname_garbage calling kfree() Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: linux-nfs@vger.kernel.org Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-02-17NFSv4.1: Don't decode skipped layoutgetsWeston Andros Adamson
layoutget's prepare hook can call rpc_exit with status = NFS4_OK (0). Because of this, nfs4_proc_layoutget can't depend on a 0 status to mean that the RPC was successfully sent, received and parsed. To fix this, use the result's len member to see if parsing took place. This fixes the following OOPS -- calling xdr_init_decode() with a buffer length 0 doesn't set the stream's 'p' member and ends up using uninitialized memory in filelayout_decode_layout. BUG: unable to handle kernel paging request at 0000000000008050 IP: [<ffffffff81282e78>] memcpy+0x18/0x120 PGD 0 Oops: 0000 [#1] SMP last sysfs file: /sys/devices/pci0000:00/0000:00:11.0/0000:02:01.0/irq CPU 1 Modules linked in: nfs_layout_nfsv41_files nfs lockd fscache auth_rpcgss nfs_acl autofs4 sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 dm_mirror dm_region_hash dm_log dm_mod ppdev parport_pc parport snd_ens1371 snd_rawmidi snd_ac97_codec ac97_bus snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc e1000 microcode vmware_balloon i2c_piix4 i2c_core sg shpchp ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic ata_piix mptspi mptscsih mptbase scsi_transport_spi [last unloaded: speedstep_lib] Pid: 1665, comm: flush-0:22 Not tainted 2.6.32-356-test-2 #2 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform RIP: 0010:[<ffffffff81282e78>] [<ffffffff81282e78>] memcpy+0x18/0x120 RSP: 0018:ffff88003dfab588 EFLAGS: 00010206 RAX: ffff88003dc42000 RBX: ffff88003dfab610 RCX: 0000000000000009 RDX: 000000003f807ff0 RSI: 0000000000008050 RDI: ffff88003dc42000 RBP: ffff88003dfab5b0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000080 R12: 0000000000000024 R13: ffff88003dc42000 R14: ffff88003f808030 R15: ffff88003dfab6a0 FS: 0000000000000000(0000) GS:ffff880003420000(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 0000000000008050 CR3: 000000003bc92000 CR4: 00000000001407e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process flush-0:22 (pid: 1665, threadinfo ffff88003dfaa000, task ffff880037f77540) Stack: ffffffffa0398ac1 ffff8800397c5940 ffff88003dfab610 ffff88003dfab6a0 <d> ffff88003dfab5d0 ffff88003dfab680 ffffffffa01c150b ffffea0000d82e70 <d> 000000508116713b 0000000000000000 0000000000000000 0000000000000000 Call Trace: [<ffffffffa0398ac1>] ? xdr_inline_decode+0xb1/0x120 [sunrpc] [<ffffffffa01c150b>] filelayout_decode_layout+0xeb/0x350 [nfs_layout_nfsv41_files] [<ffffffffa01c17fc>] filelayout_alloc_lseg+0x8c/0x3c0 [nfs_layout_nfsv41_files] [<ffffffff8150e6ce>] ? __wait_on_bit+0x7e/0x90 Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org