summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2015-10-08btrfs: switch more printks to our helpersDavid Sterba
Convert the simple cases, not all functions provide a way to reach the fs_info. Also skipped debugging messages (print-tree, integrity checker and pr_debug) and messages that are printed from possibly unfinished mount. Signed-off-by: David Sterba <dsterba@suse.com>
2015-10-08btrfs: switch message printers to ratelimited variantsDavid Sterba
Signed-off-by: David Sterba <dsterba@suse.com>
2015-10-08btrfs: introduce ratelimited variants of message printing functionsDavid Sterba
Signed-off-by: David Sterba <dsterba@suse.com>
2015-10-08btrfs: switch message printers to ratelimited _in_rcu variantsDavid Sterba
Signed-off-by: David Sterba <dsterba@suse.com>
2015-10-08btrfs: introduce ratelimited _in_rcu variants of message printing functionsDavid Sterba
Signed-off-by: David Sterba <dsterba@suse.com>
2015-10-08btrfs: switch message printers to _in_rcu variantsDavid Sterba
Signed-off-by: David Sterba <dsterba@suse.com>
2015-10-08btrfs: introduce _in_rcu variants of message printing functionsDavid Sterba
Due to the missing variants there are messages that lack the information printed by btrfs_info etc helpers. Signed-off-by: David Sterba <dsterba@suse.com>
2015-10-07Merge tag 'nfs-for-4.3-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client bugfixes from Trond Myklebust: "Highlights include: Bugfixes: - Fix a use-after-free bug in the RPC/RDMA client - Fix a write performance regression - Fix up page writeback accounting - Don't try to reclaim unused state owners - Fix a NFSv4 nograce recovery hang - reset states to use open_stateid when returning delegation voluntarily - Fix a tracepoint NULL-pointer dereference" * tag 'nfs-for-4.3-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFS: Fix a tracepoint NULL-pointer dereference nfs4: reset states to use open_stateid when returning delegation voluntarily NFSv4: Fix a nograce recovery hang NFSv4.1: nfs4_opendata_check_deleg needs to handle NFS4_OPEN_CLAIM_DELEG_CUR_FH NFSv4: Don't try to reclaim unused state owners NFS: Fix a write performance regression NFS: Fix up page writeback accounting xprtrdma: disconnect and flush cqs before freeing buffers
2015-10-06NFS: Fix a tracepoint NULL-pointer dereferenceAnna Schumaker
Running xfstest generic/013 with the tracepoint nfs:nfs4_open_file enabled produces a NULL-pointer dereference when calculating fileid and filehandle of the opened file. Fix this by checking if state is NULL before trying to use the inode pointer. Reported-by: Olga Kornievskaia <aglo@umich.edu> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-10-06BTRFS: support NFSv2 exportNeilBrown
The "fh_len" passed to ->fh_to_* is not guaranteed to be that same as that returned by encode_fh - it may be larger. With NFSv2, the filehandle is fixed length, so it may appear longer than expected and be zero-padded. So we must test that fh_len is at least some value, not exactly equal to it. Signed-off-by: NeilBrown <neilb@suse.de> Acked-by: David Sterba <dsterba@suse.cz>
2015-10-06Btrfs: open_ctree: Fix possible memory leakchandan
After reading one of chunk or tree root tree's root node from disk, if the root node does not have EXTENT_BUFFER_UPTODATE flag set, we fail to release the memory used by the root node. Fix this. Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
2015-10-06Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull CIFS fixes from Steve French: "Two fixes for problems pointed out by automated tools. Thanks PaX/grsecurity team and Dan Carpenter (and the Smatch tool)" * 'for-next' of git://git.samba.org/sfrench/cifs-2.6: [CIFS] Update cifs version number [SMB3] Do not fall back to SMBWriteX in set_file_size error cases [SMB3] Missing null tcon check
2015-10-05Btrfs: fix deadlock when finalizing block group creationFilipe Manana
Josef ran into a deadlock while a transaction handle was finalizing the creation of its block groups, which produced the following trace: [260445.593112] fio D ffff88022a9df468 0 8924 4518 0x00000084 [260445.593119] ffff88022a9df468 ffffffff81c134c0 ffff880429693c00 ffff88022a9df488 [260445.593126] ffff88022a9e0000 ffff8803490d7b00 ffff8803490d7b18 ffff88022a9df4b0 [260445.593132] ffff8803490d7af8 ffff88022a9df488 ffffffff8175a437 ffff8803490d7b00 [260445.593137] Call Trace: [260445.593145] [<ffffffff8175a437>] schedule+0x37/0x80 [260445.593189] [<ffffffffa0850f37>] btrfs_tree_lock+0xa7/0x1f0 [btrfs] [260445.593197] [<ffffffff810db7c0>] ? prepare_to_wait_event+0xf0/0xf0 [260445.593225] [<ffffffffa07eac44>] btrfs_lock_root_node+0x34/0x50 [btrfs] [260445.593253] [<ffffffffa07eff6b>] btrfs_search_slot+0x88b/0xa00 [btrfs] [260445.593295] [<ffffffffa08389df>] ? free_extent_buffer+0x4f/0x90 [btrfs] [260445.593324] [<ffffffffa07f1a06>] btrfs_insert_empty_items+0x66/0xc0 [btrfs] [260445.593351] [<ffffffffa07ea94a>] ? btrfs_alloc_path+0x1a/0x20 [btrfs] [260445.593394] [<ffffffffa08403b9>] btrfs_finish_chunk_alloc+0x1c9/0x570 [btrfs] [260445.593427] [<ffffffffa08002ab>] btrfs_create_pending_block_groups+0x11b/0x200 [btrfs] [260445.593459] [<ffffffffa0800964>] do_chunk_alloc+0x2a4/0x2e0 [btrfs] [260445.593491] [<ffffffffa0803815>] find_free_extent+0xa55/0xd90 [btrfs] [260445.593524] [<ffffffffa0803c22>] btrfs_reserve_extent+0xd2/0x220 [btrfs] [260445.593532] [<ffffffff8119fe5d>] ? account_page_dirtied+0xdd/0x170 [260445.593564] [<ffffffffa0803e78>] btrfs_alloc_tree_block+0x108/0x4a0 [btrfs] [260445.593597] [<ffffffffa080c9de>] ? btree_set_page_dirty+0xe/0x10 [btrfs] [260445.593626] [<ffffffffa07eb5cd>] __btrfs_cow_block+0x12d/0x5b0 [btrfs] [260445.593654] [<ffffffffa07ebbff>] btrfs_cow_block+0x11f/0x1c0 [btrfs] [260445.593682] [<ffffffffa07ef8c7>] btrfs_search_slot+0x1e7/0xa00 [btrfs] [260445.593724] [<ffffffffa08389df>] ? free_extent_buffer+0x4f/0x90 [btrfs] [260445.593752] [<ffffffffa07f1a06>] btrfs_insert_empty_items+0x66/0xc0 [btrfs] [260445.593830] [<ffffffffa07ea94a>] ? btrfs_alloc_path+0x1a/0x20 [btrfs] [260445.593905] [<ffffffffa08403b9>] btrfs_finish_chunk_alloc+0x1c9/0x570 [btrfs] [260445.593946] [<ffffffffa08002ab>] btrfs_create_pending_block_groups+0x11b/0x200 [btrfs] [260445.593990] [<ffffffffa0815798>] btrfs_commit_transaction+0xa8/0xb40 [btrfs] [260445.594042] [<ffffffffa085abcd>] ? btrfs_log_dentry_safe+0x6d/0x80 [btrfs] [260445.594089] [<ffffffffa082bc84>] btrfs_sync_file+0x294/0x350 [btrfs] [260445.594115] [<ffffffff8123e29b>] vfs_fsync_range+0x3b/0xa0 [260445.594133] [<ffffffff81023891>] ? syscall_trace_enter_phase1+0x131/0x180 [260445.594149] [<ffffffff8123e35d>] do_fsync+0x3d/0x70 [260445.594169] [<ffffffff81023bb8>] ? syscall_trace_leave+0xb8/0x110 [260445.594187] [<ffffffff8123e600>] SyS_fsync+0x10/0x20 [260445.594204] [<ffffffff8175de6e>] entry_SYSCALL_64_fastpath+0x12/0x71 This happened because the same transaction handle created a large number of block groups and while finalizing their creation (inserting new items and updating existing items in the chunk and device trees) a new metadata extent had to be allocated and no free space was found in the current metadata block groups, which made find_free_extent() attempt to allocate a new block group via do_chunk_alloc(). However at do_chunk_alloc() we ended up allocating a new system chunk too and exceeded the threshold of 2Mb of reserved chunk bytes, which makes do_chunk_alloc() enter the final part of block group creation again (at btrfs_create_pending_block_groups()) and attempt to lock again the root of the chunk tree when it's already write locked by the same task. Similarly we can deadlock on extent tree nodes/leafs if while we are running delayed references we end up creating a new metadata block group in order to allocate a new node/leaf for the extent tree (as part of a CoW operation or growing the tree), as btrfs_create_pending_block_groups inserts items into the extent tree as well. In this case we get the following trace: [14242.773581] fio D ffff880428ca3418 0 3615 3100 0x00000084 [14242.773588] ffff880428ca3418 ffff88042d66b000 ffff88042a03c800 ffff880428ca3438 [14242.773594] ffff880428ca4000 ffff8803e4b20190 ffff8803e4b201a8 ffff880428ca3460 [14242.773600] ffff8803e4b20188 ffff880428ca3438 ffffffff8175a437 ffff8803e4b20190 [14242.773606] Call Trace: [14242.773613] [<ffffffff8175a437>] schedule+0x37/0x80 [14242.773656] [<ffffffffa057ff07>] btrfs_tree_lock+0xa7/0x1f0 [btrfs] [14242.773664] [<ffffffff810db7c0>] ? prepare_to_wait_event+0xf0/0xf0 [14242.773692] [<ffffffffa0519c44>] btrfs_lock_root_node+0x34/0x50 [btrfs] [14242.773720] [<ffffffffa051ef6b>] btrfs_search_slot+0x88b/0xa00 [btrfs] [14242.773750] [<ffffffffa0520a06>] btrfs_insert_empty_items+0x66/0xc0 [btrfs] [14242.773758] [<ffffffff811ef4a2>] ? kmem_cache_alloc+0x1d2/0x200 [14242.773786] [<ffffffffa0520ad1>] btrfs_insert_item+0x71/0xf0 [btrfs] [14242.773818] [<ffffffffa052f292>] btrfs_create_pending_block_groups+0x102/0x200 [btrfs] [14242.773850] [<ffffffffa052f96e>] do_chunk_alloc+0x2ae/0x2f0 [btrfs] [14242.773934] [<ffffffffa0532825>] find_free_extent+0xa55/0xd90 [btrfs] [14242.773998] [<ffffffffa0532c22>] btrfs_reserve_extent+0xc2/0x1d0 [btrfs] [14242.774041] [<ffffffffa0532e38>] btrfs_alloc_tree_block+0x108/0x4a0 [btrfs] [14242.774078] [<ffffffffa051a5cd>] __btrfs_cow_block+0x12d/0x5b0 [btrfs] [14242.774118] [<ffffffffa051abff>] btrfs_cow_block+0x11f/0x1c0 [btrfs] [14242.774155] [<ffffffffa051e8c7>] btrfs_search_slot+0x1e7/0xa00 [btrfs] [14242.774194] [<ffffffffa0528021>] ? __btrfs_free_extent.isra.70+0x2e1/0xcb0 [btrfs] [14242.774235] [<ffffffffa0520a06>] btrfs_insert_empty_items+0x66/0xc0 [btrfs] [14242.774274] [<ffffffffa051994a>] ? btrfs_alloc_path+0x1a/0x20 [btrfs] [14242.774318] [<ffffffffa052c433>] __btrfs_run_delayed_refs+0xbb3/0x1020 [btrfs] [14242.774358] [<ffffffffa052f404>] btrfs_run_delayed_refs.part.78+0x74/0x280 [btrfs] [14242.774391] [<ffffffffa052f627>] btrfs_run_delayed_refs+0x17/0x20 [btrfs] [14242.774432] [<ffffffffa05be236>] commit_cowonly_roots+0x8d/0x2bd [btrfs] [14242.774474] [<ffffffffa059d07f>] ? __btrfs_run_delayed_items+0x1cf/0x210 [btrfs] [14242.774516] [<ffffffffa05adac3>] ? btrfs_qgroup_account_extents+0x83/0x130 [btrfs] [14242.774558] [<ffffffffa0544c40>] btrfs_commit_transaction+0x590/0xb40 [btrfs] [14242.774599] [<ffffffffa0589b9d>] ? btrfs_log_dentry_safe+0x6d/0x80 [btrfs] [14242.774642] [<ffffffffa055ac54>] btrfs_sync_file+0x294/0x350 [btrfs] [14242.774650] [<ffffffff8123e29b>] vfs_fsync_range+0x3b/0xa0 [14242.774657] [<ffffffff81023891>] ? syscall_trace_enter_phase1+0x131/0x180 [14242.774663] [<ffffffff8123e35d>] do_fsync+0x3d/0x70 [14242.774669] [<ffffffff81023bb8>] ? syscall_trace_leave+0xb8/0x110 [14242.774675] [<ffffffff8123e600>] SyS_fsync+0x10/0x20 [14242.774681] [<ffffffff8175de6e>] entry_SYSCALL_64_fastpath+0x12/0x71 Fix this by never recursing into the finalization phase of block group creation and making sure we never trigger the finalization of block group creation while running delayed references. Reported-by: Josef Bacik <jbacik@fb.com> Fixes: 00d80e342c0f ("Btrfs: fix quick exhaustion of the system array in the superblock") Signed-off-by: Filipe Manana <fdmanana@suse.com>
2015-10-05Btrfs: update fix for read corruption of compressed and shared extentsFilipe Manana
My previous fix in commit 005efedf2c7d ("Btrfs: fix read corruption of compressed and shared extents") was effective only if the compressed extents cover a file range with a length that is not a multiple of 16 pages. That's because the detection of when we reached a different range of the file that shares the same compressed extent as the previously processed range was done at extent_io.c:__do_contiguous_readpages(), which covers subranges with a length up to 16 pages, because extent_readpages() groups the pages in clusters no larger than 16 pages. So fix this by tracking the start of the previously processed file range's extent map at extent_readpages(). The following test case for fstests reproduces the issue: seq=`basename $0` seqres=$RESULT_DIR/$seq echo "QA output created by $seq" tmp=/tmp/$$ status=1 # failure is the default! trap "_cleanup; exit \$status" 0 1 2 3 15 _cleanup() { rm -f $tmp.* } # get standard environment, filters and checks . ./common/rc . ./common/filter # real QA test starts here _need_to_be_root _supported_fs btrfs _supported_os Linux _require_scratch _require_cloner rm -f $seqres.full test_clone_and_read_compressed_extent() { local mount_opts=$1 _scratch_mkfs >>$seqres.full 2>&1 _scratch_mount $mount_opts # Create our test file with a single extent of 64Kb that is going to # be compressed no matter which compression algo is used (zlib/lzo). $XFS_IO_PROG -f -c "pwrite -S 0xaa 0K 64K" \ $SCRATCH_MNT/foo | _filter_xfs_io # Now clone the compressed extent into an adjacent file offset. $CLONER_PROG -s 0 -d $((64 * 1024)) -l $((64 * 1024)) \ $SCRATCH_MNT/foo $SCRATCH_MNT/foo echo "File digest before unmount:" md5sum $SCRATCH_MNT/foo | _filter_scratch # Remount the fs or clear the page cache to trigger the bug in # btrfs. Because the extent has an uncompressed length that is a # multiple of 16 pages, all the pages belonging to the second range # of the file (64K to 128K), which points to the same extent as the # first range (0K to 64K), had their contents full of zeroes instead # of the byte 0xaa. This was a bug exclusively in the read path of # compressed extents, the correct data was stored on disk, btrfs # just failed to fill in the pages correctly. _scratch_remount echo "File digest after remount:" # Must match the digest we got before. md5sum $SCRATCH_MNT/foo | _filter_scratch } echo -e "\nTesting with zlib compression..." test_clone_and_read_compressed_extent "-o compress=zlib" _scratch_unmount echo -e "\nTesting with lzo compression..." test_clone_and_read_compressed_extent "-o compress=lzo" status=0 exit Cc: stable@vger.kernel.org Signed-off-by: Filipe Manana <fdmanana@suse.com> Tested-by: Timofey Titovets <nefelim4ag@gmail.com>
2015-10-05Btrfs: send, fix corner case for reference overwrite detectionFilipe Manana
When the inode given to did_overwrite_ref() matches the current progress and has a reference that collides with the reference of other inode that has the same number as the current progress, we were always telling our caller that the inode's reference was overwritten, which is incorrect because the other inode might be a new inode (different generation number) in which case we must return false from did_overwrite_ref() so that its callers don't use an orphanized path for the inode (as it will never be orphanized, instead it will be unlinked and the new inode created later). The following test case for fstests reproduces the issue: seq=`basename $0` seqres=$RESULT_DIR/$seq echo "QA output created by $seq" tmp=/tmp/$$ status=1 # failure is the default! trap "_cleanup; exit \$status" 0 1 2 3 15 _cleanup() { rm -fr $send_files_dir rm -f $tmp.* } # get standard environment, filters and checks . ./common/rc . ./common/filter # real QA test starts here _supported_fs btrfs _supported_os Linux _require_scratch _need_to_be_root send_files_dir=$TEST_DIR/btrfs-test-$seq rm -f $seqres.full rm -fr $send_files_dir mkdir $send_files_dir _scratch_mkfs >>$seqres.full 2>&1 _scratch_mount # Create our test file with a single extent of 64K. mkdir -p $SCRATCH_MNT/foo $XFS_IO_PROG -f -c "pwrite -S 0xaa 0 64K" $SCRATCH_MNT/foo/bar \ | _filter_xfs_io _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT \ $SCRATCH_MNT/mysnap1 _run_btrfs_util_prog subvolume snapshot $SCRATCH_MNT \ $SCRATCH_MNT/mysnap2 echo "File digest before being replaced:" md5sum $SCRATCH_MNT/mysnap1/foo/bar | _filter_scratch # Remove the file and then create a new one in the same location with # the same name but with different content. This new file ends up # getting the same inode number as the previous one, because that inode # number was the highest inode number used by the snapshot's root and # therefore when attempting to find the a new inode number for the new # file, we end up reusing the same inode number. This happens because # currently btrfs uses the highest inode number summed by 1 for the # first inode created once a snapshot's root is loaded (done at # fs/btrfs/inode-map.c:btrfs_find_free_objectid in the linux kernel # tree). # Having these two different files in the snapshots with the same inode # number (but different generation numbers) caused the btrfs send code # to emit an incorrect path for the file when issuing an unlink # operation because it failed to realize they were different files. rm -f $SCRATCH_MNT/mysnap2/foo/bar $XFS_IO_PROG -f -c "pwrite -S 0xbb 0 96K" \ $SCRATCH_MNT/mysnap2/foo/bar | _filter_xfs_io _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT/mysnap2 \ $SCRATCH_MNT/mysnap2_ro _run_btrfs_util_prog send $SCRATCH_MNT/mysnap1 -f $send_files_dir/1.snap _run_btrfs_util_prog send -p $SCRATCH_MNT/mysnap1 \ $SCRATCH_MNT/mysnap2_ro -f $send_files_dir/2.snap echo "File digest in the original filesystem after being replaced:" md5sum $SCRATCH_MNT/mysnap2_ro/foo/bar | _filter_scratch # Now recreate the filesystem by receiving both send streams and verify # we get the same file contents that the original filesystem had. _scratch_unmount _scratch_mkfs >>$seqres.full 2>&1 _scratch_mount _run_btrfs_util_prog receive -vv $SCRATCH_MNT -f $send_files_dir/1.snap _run_btrfs_util_prog receive -vv $SCRATCH_MNT -f $send_files_dir/2.snap echo "File digest in the new filesystem:" # Must match the digest from the new file. md5sum $SCRATCH_MNT/mysnap2_ro/foo/bar | _filter_scratch status=0 exit Reported-by: Martin Raiber <martin@urbackup.org> Fixes: 8b191a684968 ("Btrfs: incremental send, check if orphanized dir inode needs delayed rename") Signed-off-by: Filipe Manana <fdmanana@suse.com>
2015-10-03[CIFS] Update cifs version numberSteve French
Update modinfo cifs.ko version number to 2.08 Signed-off-by: Steve French <steve.french@primarydata.com>
2015-10-02nfs4: reset states to use open_stateid when returning delegation voluntarilyJeff Layton
When the client goes to return a delegation, it should always update any nfs4_state currently set up to use that delegation stateid to instead use the open stateid. It already does do this in some cases, particularly in the state recovery code, but not currently when the delegation is voluntarily returned (e.g. in advance of a RENAME). This causes the client to try to continue using the delegation stateid after the DELEGRETURN, e.g. in LAYOUTGET. Set the nfs4_state back to using the open stateid in nfs4_open_delegation_recall, just before clearing the NFS_DELEGATED_STATE bit. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-10-02NFSv4: Fix a nograce recovery hangBenjamin Coddington
Since commit 5cae02f42793130e1387f4ec09c4d07056ce9fa5 an OPEN_CONFIRM should have a privileged sequence in the recovery case to allow nograce recovery to proceed for NFSv4.0. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-10-02NFSv4.1: nfs4_opendata_check_deleg needs to handle NFS4_OPEN_CLAIM_DELEG_CUR_FHTrond Myklebust
We need to warn against broken NFSv4.1 servers that try to hand out delegations in response to NFS4_OPEN_CLAIM_DELEG_CUR_FH. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-10-02NFSv4: Don't try to reclaim unused state ownersTrond Myklebust
Currently, we don't test if the state owner is in use before we try to recover it. The problem is that if the refcount is zero, then the state owner will be waiting on the lru list for garbage collection. The expectation in that case is that if you bump the refcount, then you must also remove the state owner from the lru list. Otherwise the call to nfs4_put_state_owner will corrupt that list by trying to add our state owner a second time. Avoid the whole problem by just skipping state owners that hold no state. Reported-by: Andrew W Elble <aweits@rit.edu> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-10-02NFS: Fix a write performance regressionTrond Myklebust
If all other conditions in nfs_can_extend_write() are met, and there are no locks, then we should be able to assume close-to-open semantics and the ability to extend our write to cover the whole page. With this patch, the xfstests generic/074 test completes in 242s instead of >1400s on my test rig. Fixes: bd61e0a9c852 ("locks: convert posix locks to file_lock_context") Cc: Jeff Layton <jlayton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-10-02NFS: Fix up page writeback accountingTrond Myklebust
Currently, we are crediting all the calls to nfs_writepages_callback() (i.e. the nfs_writepages() callback) to nfs_writepage(). Aside from being inconsistent with the behaviour of the equivalent readpage/readpages accounting, this also means that we cannot distinguish between bulk writes and single page writebacks (which confuses the 'nfsiostat -p' tool). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-10-01[SMB3] Do not fall back to SMBWriteX in set_file_size error casesSteve French
The error paths in set_file_size for cifs and smb3 are incorrect. In the unlikely event that a server did not support set file info of the file size, the code incorrectly falls back to trying SMBWriteX (note that only the original core SMB Write, used for example by DOS, can set the file size this way - this actually does not work for the more recent SMBWriteX). The idea was since the old DOS SMB Write could set the file size if you write zero bytes at that offset then use that if server rejects the normal set file info call. Fortunately the SMBWriteX will never be sent on the wire (except when file size is zero) since the length and offset fields were reversed in the two places in this function that call SMBWriteX causing the fall back path to return an error. It is also important to never call an SMB request from an SMB2/sMB3 session (which theoretically would be possible, and can cause a brief session drop, although the client recovers) so this should be fixed. In practice this path does not happen with modern servers but the error fall back to SMBWriteX is clearly wrong. Removing the calls to SMBWriteX in the error paths in cifs_set_file_size Pointed out by PaX/grsecurity team Signed-off-by: Steve French <steve.french@primarydata.com> Reported-by: PaX Team <pageexec@freemail.hu> CC: Emese Revfy <re.emese@gmail.com> CC: Brad Spengler <spender@grsecurity.net> CC: Stable <stable@vger.kernel.org>
2015-10-01dax: fix NULL pointer in __dax_pmd_fault()Ross Zwisler
Commit 46c043ede471 ("mm: take i_mmap_lock in unmap_mapping_range() for DAX") moved some code in __dax_pmd_fault() that was responsible for zeroing newly allocated PMD pages. The new location didn't properly set up 'kaddr', so when run this code resulted in a NULL pointer BUG. Fix this by getting the correct 'kaddr' via bdev_direct_access(). Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reported-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-10-01Btrfs: move kobj stuff out of dev_replace lock rangeLiu Bo
To avoid deadlock described in commit 084b6e7c7607 ("btrfs: Fix a lockdep warning when running xfstest."), we should move kobj stuff out of dev_replace lock range. "It is because the btrfs_kobj_{add/rm}_device() will call memory allocation with GFP_KERNEL, which may flush fs page cache to free space, waiting for it self to do the commit, causing the deadlock. To solve the problem, move btrfs_kobj_{add/rm}_device() out of the dev_replace lock range, also involing split the btrfs_rm_dev_replace_srcdev() function into remove and free parts. Now only btrfs_rm_dev_replace_remove_srcdev() is called in dev_replace lock range, and kobj_{add/rm} and btrfs_rm_dev_replace_free_srcdev() are called out of the lock range." Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> [added lockup description] Signed-off-by: David Sterba <dsterba@suse.com>
2015-10-01Btrfs: add helper for closing one deviceAnand Jain
Signed-off-by: Anand Jain <anand.jain@oracle.com> [reworded subject and changelog] Signed-off-by: David Sterba <dsterba@suse.com>
2015-10-01Btrfs: don't log error from btrfs_get_bdev_and_sbAnand Jain
Originally the message was not in a helper but ended up there. We should print error messages from callers instead. Signed-off-by: Anand Jain <anand.jain@oracle.com> [reworded subject and changelog] Signed-off-by: David Sterba <dsterba@suse.com>
2015-10-01Btrfs: kernel operation should come after user input has been verifiedAnand Jain
By general rule of thumb there shouldn't be any way that user land could trigger a kernel operation just by sending wrong arguments. Here do commit cleanups after user input has been verified. Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2015-10-01Btrfs: enhance btrfs_scratch_superblock to scratch all superblocksAnand Jain
This patch updates and renames btrfs_scratch_superblocks, (which is used by the replace device thread), with those fixes from the scratch superblock code section of btrfs_rm_device(). The fixes are: Scratch all copies of superblock Notify kobject that superblock has been changed Update time on the device So that btrfs_rm_device() can use the function btrfs_scratch_superblocks() instead of its own scratch code. And further replace deivce code which similarly releases device back to the system, will have the fixes from the btrfs device delete. Signed-off-by: Anand Jain <anand.jain@oracle.com> [renamed to btrfs_scratch_superblock] Signed-off-by: David Sterba <dsterba@suse.com>
2015-10-01Btrfs: add btrfs_read_dev_one_super() to read one specific SBAnand Jain
This uses a chunk of code from btrfs_read_dev_super() and creates a function called btrfs_read_dev_one_super() so that next patch can use it for scratch superblock. Signed-off-by: Anand Jain <anand.jain@oracle.com> [renamed bufhead to bh] Signed-off-by: David Sterba <dsterba@suse.com>
2015-10-01Btrfs: use BTRFS_ERROR_DEV_MISSING_NOT_FOUND when missing device is not foundAnand Jain
Use btrfs specific error code BTRFS_ERROR_DEV_MISSING_NOT_FOUND instead of -ENOENT. Next this removes the logging when user specifies "missing" and we don't find it in the kernel device list. Logging are for system events not for user input errors. Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2015-10-01Merge tag 'upstream-4.3-rc4' of git://git.infradead.org/linux-ubifsLinus Torvalds
Pull UBI/UBIFS fixes from Richard Weinberger: "This contains three bug fixes for both UBI and UBIFS" * tag 'upstream-4.3-rc4' of git://git.infradead.org/linux-ubifs: UBI: return ENOSPC if no enough space available UBI: Validate data_size UBIFS: Kill unneeded locking in ubifs_init_security
2015-09-29Btrfs: consolidate btrfs_error() to btrfs_std_error()Anand Jain
btrfs_error() and btrfs_std_error() does the same thing and calls _btrfs_std_error(), so consolidate them together. And the main motivation is that btrfs_error() is closely named with btrfs_err(), one handles error action the other is to log the error, so don't closely name them. Signed-off-by: Anand Jain <anand.jain@oracle.com> Suggested-by: David Sterba <dsterba@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2015-09-29Btrfs: __btrfs_std_error() logic should be consistent w/out CONFIG_PRINTK ↵Anand Jain
defined error handling logic behaves differently with or without CONFIG_PRINTK defined, since there are two copies of the same function which a bit of different logic One, when CONFIG_PRINTK is defined, code is __btrfs_std_error(..) { :: save_error_info(fs_info); if (sb->s_flags & MS_BORN) btrfs_handle_error(fs_info); } and two when CONFIG_PRINTK is not defined, the code is __btrfs_std_error(..) { :: if (sb->s_flags & MS_BORN) { save_error_info(fs_info); btrfs_handle_error(fs_info); } } I doubt if this was intentional ? and appear to have caused since we maintain two copies of the same function and they got diverged with commits. Now to decide which logic is correct reviewed changes as below, 533574c6bc30cf526cc1c41bde050c854a945efb Commit added two copies of this function cf79ffb5b79e8a2b587fbf218809e691bb396c98 Commit made change to only one copy of the function and to the copy when CONFIG_PRINTK is defined. To fix this, instead of maintaining two copies of same function approach, maintain single function, and just put the extra portion of the code under CONFIG_PRINTK define. This patch just does that. And keeps code of with CONFIG_PRINTK defined. Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2015-09-29Btrfs: SB read failure should return EIO for __bread failureAnand Jain
This will return EIO when __bread() fails to read SB, instead of EINVAL. Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2015-09-29Btrfs: rename super_kobj to fsid_kobjAnand Jain
Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2015-09-29Btrfs: rename btrfs_kobj_rm_device to btrfs_sysfs_rm_device_linkAnand Jain
Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2015-09-29Btrfs: rename btrfs_kobj_add_device to btrfs_sysfs_add_device_linkAnand Jain
Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2015-09-29Btrfs: rename btrfs_sysfs_remove_one to btrfs_sysfs_remove_mountedAnand Jain
Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2015-09-29Btrfs: rename btrfs_sysfs_add_one to btrfs_sysfs_add_mountedAnand Jain
Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2015-09-29UBIFS: Kill unneeded locking in ubifs_init_securityRichard Weinberger
Fixes the following lockdep splat: [ 1.244527] ============================================= [ 1.245193] [ INFO: possible recursive locking detected ] [ 1.245193] 4.2.0-rc1+ #37 Not tainted [ 1.245193] --------------------------------------------- [ 1.245193] cp/742 is trying to acquire lock: [ 1.245193] (&sb->s_type->i_mutex_key#9){+.+.+.}, at: [<ffffffff812b3f69>] ubifs_init_security+0x29/0xb0 [ 1.245193] [ 1.245193] but task is already holding lock: [ 1.245193] (&sb->s_type->i_mutex_key#9){+.+.+.}, at: [<ffffffff81198e7f>] path_openat+0x3af/0x1280 [ 1.245193] [ 1.245193] other info that might help us debug this: [ 1.245193] Possible unsafe locking scenario: [ 1.245193] [ 1.245193] CPU0 [ 1.245193] ---- [ 1.245193] lock(&sb->s_type->i_mutex_key#9); [ 1.245193] lock(&sb->s_type->i_mutex_key#9); [ 1.245193] [ 1.245193] *** DEADLOCK *** [ 1.245193] [ 1.245193] May be due to missing lock nesting notation [ 1.245193] [ 1.245193] 2 locks held by cp/742: [ 1.245193] #0: (sb_writers#5){.+.+.+}, at: [<ffffffff811ad37f>] mnt_want_write+0x1f/0x50 [ 1.245193] #1: (&sb->s_type->i_mutex_key#9){+.+.+.}, at: [<ffffffff81198e7f>] path_openat+0x3af/0x1280 [ 1.245193] [ 1.245193] stack backtrace: [ 1.245193] CPU: 2 PID: 742 Comm: cp Not tainted 4.2.0-rc1+ #37 [ 1.245193] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140816_022509-build35 04/01/2014 [ 1.245193] ffffffff8252d530 ffff88007b023a38 ffffffff814f6f49 ffffffff810b56c5 [ 1.245193] ffff88007c30cc80 ffff88007b023af8 ffffffff810a150d ffff88007b023a68 [ 1.245193] 000000008101302a ffff880000000000 00000008f447e23f ffffffff8252d500 [ 1.245193] Call Trace: [ 1.245193] [<ffffffff814f6f49>] dump_stack+0x4c/0x65 [ 1.245193] [<ffffffff810b56c5>] ? console_unlock+0x1c5/0x510 [ 1.245193] [<ffffffff810a150d>] __lock_acquire+0x1a6d/0x1ea0 [ 1.245193] [<ffffffff8109fa78>] ? __lock_is_held+0x58/0x80 [ 1.245193] [<ffffffff810a1a93>] lock_acquire+0xd3/0x270 [ 1.245193] [<ffffffff812b3f69>] ? ubifs_init_security+0x29/0xb0 [ 1.245193] [<ffffffff814fc83b>] mutex_lock_nested+0x6b/0x3a0 [ 1.245193] [<ffffffff812b3f69>] ? ubifs_init_security+0x29/0xb0 [ 1.245193] [<ffffffff812b3f69>] ? ubifs_init_security+0x29/0xb0 [ 1.245193] [<ffffffff812b3f69>] ubifs_init_security+0x29/0xb0 [ 1.245193] [<ffffffff8128e286>] ubifs_create+0xa6/0x1f0 [ 1.245193] [<ffffffff81198e7f>] ? path_openat+0x3af/0x1280 [ 1.245193] [<ffffffff81195d15>] vfs_create+0x95/0xc0 [ 1.245193] [<ffffffff8119929c>] path_openat+0x7cc/0x1280 [ 1.245193] [<ffffffff8109ffe3>] ? __lock_acquire+0x543/0x1ea0 [ 1.245193] [<ffffffff81088f20>] ? sched_clock_cpu+0x90/0xc0 [ 1.245193] [<ffffffff81088c00>] ? calc_global_load_tick+0x60/0x90 [ 1.245193] [<ffffffff81088f20>] ? sched_clock_cpu+0x90/0xc0 [ 1.245193] [<ffffffff811a9cef>] ? __alloc_fd+0xaf/0x180 [ 1.245193] [<ffffffff8119ac55>] do_filp_open+0x75/0xd0 [ 1.245193] [<ffffffff814ffd86>] ? _raw_spin_unlock+0x26/0x40 [ 1.245193] [<ffffffff811a9cef>] ? __alloc_fd+0xaf/0x180 [ 1.245193] [<ffffffff81189bd9>] do_sys_open+0x129/0x200 [ 1.245193] [<ffffffff81189cc9>] SyS_open+0x19/0x20 [ 1.245193] [<ffffffff81500717>] entry_SYSCALL_64_fastpath+0x12/0x6f While the lockdep splat is a false positive, becuase path_openat holds i_mutex of the parent directory and ubifs_init_security() tries to acquire i_mutex of a new inode, it reveals that taking i_mutex in ubifs_init_security() is in vain because it is only being called in the inode allocation path and therefore nobody else can see the inode yet. Cc: stable@vger.kernel.org # 3.20- Reported-and-tested-by: Boris Brezillon <boris.brezillon@free-electrons.com> Reviewed-and-tested-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: dedekind1@gmail.com
2015-09-27Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull CIFS fixes from Steve French: "Four fixes from testing at the recent SMB3 Plugfest including two important authentication ones (one fixes authentication problems to some popular servers when clock times differ more than two hours between systems, the other fixes Kerberos authentication for SMB3)" * 'for-next' of git://git.samba.org/sfrench/cifs-2.6: fix encryption error checks on mount [SMB3] Fix sec=krb5 on smb3 mounts cifs: use server timestamp for ntlmv2 authentication disabling oplocks/leases via module parm enable_oplocks broken for SMB3
2015-09-26[SMB3] Missing null tcon checkSteve French
Pointed out by Dan Carpenter via smatch code analysis tool CC: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Steve French <steve.french@primarydata.com>
2015-09-25Merge branch 'for-linus-4.3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "This is an assorted set I've been queuing up: Jeff Mahoney tracked down a tricky one where we ended up starting IO on the wrong mapping for special files in btrfs_evict_inode. A few people reported this one on the list. Filipe found (and provided a test for) a difficult bug in reading compressed extents, and Josef fixed up some quota record keeping with snapshot deletion. Chandan killed off an accounting bug during DIO that lead to WARN_ONs as we freed inodes" * 'for-linus-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: keep dropped roots in cache until transaction commit Btrfs: Direct I/O: Fix space accounting btrfs: skip waiting on ordered range for special files Btrfs: fix read corruption of compressed and shared extents Btrfs: remove unnecessary locking of cleaner_mutex to avoid deadlock Btrfs: don't initialize a space info as full to prevent ENOSPC
2015-09-25Merge tag 'nfs-for-4.3-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client bugfixes from Trond Myklebust: "Highlights include: Stable patches: - fix v4.2 SEEK on files over 2 gigs - Fix a layout segment reference leak when pNFS I/O falls back to inband I/O. - Fix recovery of recalled read delegations Bugfixes: - Fix a case where NFSv4 fails to send CLOSE after a server reboot - Fix sunrpc to wait for connections to complete before retrying - Fix sunrpc races between transport connect/disconnect and shutdown - Fix an infinite loop when layoutget fail with BAD_STATEID - nfs/filelayout: Fix NULL reference caused by double freeing of fh_array - Fix a bogus WARN_ON_ONCE() in O_DIRECT when layout commit_through_mds is set - Fix layoutreturn/close ordering issues" * tag 'nfs-for-4.3-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFS41: make close wait for layoutreturn NFS: Skip checking ds_cinfo.buckets when lseg's commit_through_mds is set NFSv4.x/pnfs: Don't try to recover stateids twice in layoutget NFSv4: Recovery of recalled read delegations is broken NFS: Fix an infinite loop when layoutget fail with BAD_STATEID NFS: Do cleanup before resetting pageio read/write to mds SUNRPC: xs_sock_mark_closed() does not need to trigger socket autoclose SUNRPC: Lock the transport layer on shutdown nfs/filelayout: Fix NULL reference caused by double freeing of fh_array SUNRPC: Ensure that we wait for connections to complete before retrying SUNRPC: drop null test before destroy functions nfs: fix v4.2 SEEK on files over 2 gigs SUNRPC: Fix races between socket connection and destroy code nfs: fix pg_test page count calculation Failing to send a CLOSE if file is opened WRONLY and server reboots on a 4.x mount
2015-09-24fix encryption error checks on mountSteve French
Signed-off-by: Steve French <steve.french@primarydata.com>
2015-09-24[SMB3] Fix sec=krb5 on smb3 mountsSteve French
Kerberos, which is very important for security, was only enabled for CIFS not SMB2/SMB3 mounts (e.g. vers=3.0) Patch based on the information detailed in http://thread.gmane.org/gmane.linux.kernel.cifs/10081/focus=10307 to enable Kerberized SMB2/SMB3 a) SMB2_negotiate: enable/use decode_negTokenInit in SMB2_negotiate b) SMB2_sess_setup: handle Kerberos sectype and replicate Kerberos SMB1 processing done in sess_auth_kerberos Signed-off-by: Noel Power <noel.power@suse.com> Signed-off-by: Jim McDonough <jmcd@samba.org> CC: Stable <stable@vger.kernel.org> Signed-off-by: Steve French <steve.french@primarydata.com>
2015-09-23NFS41: make close wait for layoutreturnPeng Tao
If we send a layoutreturn asynchronously before close, the close might reach server first and layoutreturn would fail with BADSTATEID because there is nothing keeping the layout stateid alive. Also do not pretend sending layoutreturn if we are not. Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-09-22ocfs2/dlm: fix deadlock when dispatch assert masterJoseph Qi
The order of the following three spinlocks should be: dlm_domain_lock < dlm_ctxt->spinlock < dlm_lock_resource->spinlock But dlm_dispatch_assert_master() is called while holding dlm_ctxt->spinlock and dlm_lock_resource->spinlock, and then it calls dlm_grab() which will take dlm_domain_lock. Once another thread (for example, dlm_query_join_handler) has already taken dlm_domain_lock, and tries to take dlm_ctxt->spinlock deadlock happens. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: "Junxiao Bi" <junxiao.bi@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-22userfaultfd: revert "userfaultfd: waitqueue: add nr wake parameter to ↵Andrea Arcangeli
__wake_up_locked_key" This reverts commit 51360155eccb907ff8635bd10fc7de876408c2e0 and adapts fs/userfaultfd.c to use the old version of that function. It didn't look robust to call __wake_up_common with "nr == 1" when we absolutely require wakeall semantics, but we've full control of what we insert in the two waitqueue heads of the blocked userfaults. No exclusive waitqueue risks to be inserted into those two waitqueue heads so we can as well stick to "nr == 1" of the old code and we can rely purely on the fact no waitqueue inserted in one of the two waitqueue heads we must enforce as wakeall, has wait->flags WQ_FLAG_EXCLUSIVE set. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Shuah Khan <shuahkh@osg.samsung.com> Cc: Thierry Reding <treding@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>