summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2014-07-03Btrfs: fix use-after-free when cloning a trailing file holeFilipe Manana
The transaction handle was being used after being freed. Cc: Chris Mason <clm@fb.com> Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-07-03btrfs: fix null pointer dereference in btrfs_show_devname when name is nullAnand Jain
dev->name is null but missing flag is not set. Strictly speaking the missing flag should have been set, but there are more places where code just checks if name is null. For now this patch does the same. stack: BUG: unable to handle kernel NULL pointer dereference at 0000000000000064 IP: [<ffffffffa0228908>] btrfs_show_devname+0x58/0xf0 [btrfs] [<ffffffff81198879>] show_vfsmnt+0x39/0x130 [<ffffffff81178056>] m_show+0x16/0x20 [<ffffffff8117d706>] seq_read+0x296/0x390 [<ffffffff8115aa7d>] vfs_read+0x9d/0x160 [<ffffffff8115b549>] SyS_read+0x49/0x90 [<ffffffff817abe52>] system_call_fastpath+0x16/0x1b reproducer: mkfs.btrfs -draid1 -mraid1 /dev/sdg1 /dev/sdg2 btrfstune -S 1 /dev/sdg1 modprobe -r btrfs && modprobe btrfs mount -o degraded /dev/sdg1 /btrfs btrfs dev add /dev/sdg3 /btrfs Signed-off-by: Anand Jain <Anand.Jain@oracle.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-07-03btrfs: fix null pointer dereference in clone_fs_devices when name is nullAnand Jain
when one of the device path is missing btrfs_device name is null. So this patch will check for that. stack: BUG: unable to handle kernel NULL pointer dereference at 0000000000000010 IP: [<ffffffff812e18c0>] strlen+0x0/0x30 [<ffffffffa01cd92a>] ? clone_fs_devices+0xaa/0x160 [btrfs] [<ffffffffa01cdcf7>] btrfs_init_new_device+0x317/0xca0 [btrfs] [<ffffffff81155bca>] ? __kmalloc_track_caller+0x15a/0x1a0 [<ffffffffa01d6473>] btrfs_ioctl+0xaa3/0x2860 [btrfs] [<ffffffff81132a6c>] ? handle_mm_fault+0x48c/0x9c0 [<ffffffff81192a61>] ? __blkdev_put+0x171/0x180 [<ffffffff817a784c>] ? __do_page_fault+0x4ac/0x590 [<ffffffff81193426>] ? blkdev_put+0x106/0x110 [<ffffffff81179175>] ? mntput+0x35/0x40 [<ffffffff8116d4b0>] do_vfs_ioctl+0x460/0x4a0 [<ffffffff8115c72e>] ? ____fput+0xe/0x10 [<ffffffff81068033>] ? task_work_run+0xb3/0xd0 [<ffffffff8116d547>] SyS_ioctl+0x57/0x90 [<ffffffff817a793e>] ? do_page_fault+0xe/0x10 [<ffffffff817abe52>] system_call_fastpath+0x16/0x1b reproducer: mkfs.btrfs -draid1 -mraid1 /dev/sdg1 /dev/sdg2 btrfstune -S 1 /dev/sdg1 modprobe -r btrfs && modprobe btrfs mount -o degraded /dev/sdg1 /btrfs btrfs dev add /dev/sdg3 /btrfs Signed-off-by: Anand Jain <Anand.Jain@oracle.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-07-03btrfs: fix nossd and ssd_spread mount option regressionEric Sandeen
The commit 0780253 btrfs: Cleanup the btrfs_parse_options for remount. broke ssd options quite badly; it stopped making ssd_spread imply ssd, and it made "nossd" unsettable. Put things back at least as well as they were before (though ssd mount option handling is still pretty odd: # mount -o "nossd,ssd_spread" works?) Reported-by: Roman Mamedov <rm@romanrm.net> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-07-03Btrfs: fix race between balance recovery and root deletionWang Shilong
Balance recovery is called when RW mounting or remounting from RO to RW, it is called to finish roots merging. When doing balance recovery, relocation root's corresponding fs root(whose root refs is 0) might be destroyed by cleaner thread, this will make btrfs fail to mount. Fix this problem by holding @cleaner_mutex when doing balance recovery. Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-07-03Btrfs: atomically set inode->i_flags in btrfs_update_iflagsFilipe Manana
This change is based on the corresponding recent change for ext4: ext4: atomically set inode->i_flags in ext4_set_inode_flags() That has the following commit message that applies to btrfs as well: "Use cmpxchg() to atomically set i_flags instead of clearing out the S_IMMUTABLE, S_APPEND, etc. flags and then setting them from the EXT4_IMMUTABLE_FL, EXT4_APPEND_FL flags, since this opens up a race where an immutable file has the immutable flag cleared for a brief window of time." Replacing EXT4_IMMUTABLE_FL and EXT4_APPEND_FL with BTRFS_INODE_IMMUTABLE and BTRFS_INODE_APPEND, respectively. Reviewed-by: David Sterba <dsterba@suse.cz> Reviewed-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com> Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-07-02nfs: fix nfs4d readlink truncated packetAvi Kivity
XDR requires 4-byte alignment; nfs4d READLINK reply writes out the padding, but truncates the packet to the padding-less size. Fix by taking the padding into consideration when truncating the packet. Symptoms: # ll /mnt/ ls: cannot read symbolic link /mnt/test: Input/output error total 4 -rw-r--r--. 1 root root 0 Jun 14 01:21 123456 lrwxrwxrwx. 1 root root 6 Jul 2 03:33 test drwxr-xr-x. 1 root root 0 Jul 2 23:50 tmp drwxr-xr-x. 1 root root 60 Jul 2 23:44 tree Signed-off-by: Avi Kivity <avi@cloudius-systems.com> Fixes: 476a7b1f4b2c (nfsd4: don't treat readlink like a zero-copy operation) Reviewed-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-02kernfs: kernfs_notify() must be useable from non-sleepable contextsTejun Heo
d911d9874801 ("kernfs: make kernfs_notify() trigger inotify events too") added fsnotify triggering to kernfs_notify() which requires a sleepable context. There are already existing users of kernfs_notify() which invoke it from an atomic context and in general it's silly to require a sleepable context for triggering a notification. The following is an invalid context bug triggerd by md invoking sysfs_notify() from IO completion path. BUG: sleeping function called from invalid context at kernel/locking/mutex.c:586 in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1 2 locks held by swapper/1/0: #0: (&(&vblk->vq_lock)->rlock){-.-...}, at: [<ffffffffa0039042>] virtblk_done+0x42/0xe0 [virtio_blk] #1: (&(&bitmap->counts.lock)->rlock){-.....}, at: [<ffffffff81633718>] bitmap_endwrite+0x68/0x240 irq event stamp: 33518 hardirqs last enabled at (33515): [<ffffffff8102544f>] default_idle+0x1f/0x230 hardirqs last disabled at (33516): [<ffffffff818122ed>] common_interrupt+0x6d/0x72 softirqs last enabled at (33518): [<ffffffff810a1272>] _local_bh_enable+0x22/0x50 softirqs last disabled at (33517): [<ffffffff810a29e0>] irq_enter+0x60/0x80 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.16.0-0.rc2.git2.1.fc21.x86_64 #1 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 0000000000000000 f90db13964f4ee05 ffff88007d403b80 ffffffff81807b4c 0000000000000000 ffff88007d403ba8 ffffffff810d4f14 0000000000000000 0000000000441800 ffff880078fa1780 ffff88007d403c38 ffffffff8180caf2 Call Trace: <IRQ> [<ffffffff81807b4c>] dump_stack+0x4d/0x66 [<ffffffff810d4f14>] __might_sleep+0x184/0x240 [<ffffffff8180caf2>] mutex_lock_nested+0x42/0x440 [<ffffffff812d76a0>] kernfs_notify+0x90/0x150 [<ffffffff8163377c>] bitmap_endwrite+0xcc/0x240 [<ffffffffa00de863>] close_write+0x93/0xb0 [raid1] [<ffffffffa00df029>] r1_bio_write_done+0x29/0x50 [raid1] [<ffffffffa00e0474>] raid1_end_write_request+0xe4/0x260 [raid1] [<ffffffff813acb8b>] bio_endio+0x6b/0xa0 [<ffffffff813b46c4>] blk_update_request+0x94/0x420 [<ffffffff813bf0ea>] blk_mq_end_io+0x1a/0x70 [<ffffffffa00392c2>] virtblk_request_done+0x32/0x80 [virtio_blk] [<ffffffff813c0648>] __blk_mq_complete_request+0x88/0x120 [<ffffffff813c070a>] blk_mq_complete_request+0x2a/0x30 [<ffffffffa0039066>] virtblk_done+0x66/0xe0 [virtio_blk] [<ffffffffa002535a>] vring_interrupt+0x3a/0xa0 [virtio_ring] [<ffffffff81116177>] handle_irq_event_percpu+0x77/0x340 [<ffffffff8111647d>] handle_irq_event+0x3d/0x60 [<ffffffff81119436>] handle_edge_irq+0x66/0x130 [<ffffffff8101c3e4>] handle_irq+0x84/0x150 [<ffffffff818146ad>] do_IRQ+0x4d/0xe0 [<ffffffff818122f2>] common_interrupt+0x72/0x72 <EOI> [<ffffffff8105f706>] ? native_safe_halt+0x6/0x10 [<ffffffff81025454>] default_idle+0x24/0x230 [<ffffffff81025f9f>] arch_cpu_idle+0xf/0x20 [<ffffffff810f5adc>] cpu_startup_entry+0x37c/0x7b0 [<ffffffff8104df1b>] start_secondary+0x25b/0x300 This patch fixes it by punting the notification delivery through a work item. This ends up adding an extra pointer to kernfs_elem_attr enlarging kernfs_node by a pointer, which is not ideal but not a very big deal either. If this turns out to be an actual issue, we can move kernfs_elem_attr->size to kernfs_node->iattr later. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Josh Boyer <jwboyer@fedoraproject.org> Cc: Jens Axboe <axboe@kernel.dk> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30kernfs: introduce kernfs_pin_sb()Li Zefan
kernfs_pin_sb() tries to get a refcnt of the superblock. This will be used by cgroupfs. v2: - make kernfs_pin_sb() return the superblock. - drop kernfs_drop_sb(). tj: Updated the comment a bit. [ This is a prerequisite for a bugfix. ] Cc: <stable@vger.kernel.org> # 3.15 Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2014-06-29Merge tag 'ext4_for_linus_stable' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 bugfixes from Ted Ts'o: "Fix a regression when trying to compile ext4 on older versions gcc. Fix a number of miscellaneous bugs for punch hole as well as a long-standing potential double buffer head release when failing a block allocation for an indirect-mapped file" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: Fix hole punching for files with indirect blocks ext4: Fix block zeroing when punching holes in indirect block files ext4: decrement free clusters/inodes counters when block group declared bad fs/mbcache: replace __builtin_log2() with ilog2() ext4: Fix buffer double free in ext4_alloc_branch()
2014-06-28btrfs: only unlock block in verify_parent_transid if we locked itJosef Bacik
This is a regression from my patch a26e8c9f75b0bfd8cccc9e8f110737b136eb5994, we need to only unlock the block if we were the one who locked it. Otherwise this will trip BUG_ON()'s in locking.c Thanks, cc: stable@vger.kernel.org Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-06-28Btrfs: assert send doesn't attempt to start transactionsFilipe Manana
When starting a transaction just assert that current->journal_info doesn't contain a send transaction stub, since send isn't supposed to start transactions and when it finishes (either successfully or not) it's supposed to set current->journal_info to NULL. This is motivated by the change titled: Btrfs: fix crash when starting transaction Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-06-28btrfs compression: reuse recently used workspaceSergey Senozhatsky
Add compression `workspace' in free_workspace() to `idle_workspace' list head, instead of tail. So we have better chances to reuse most recently used `workspace'. Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>
2014-06-28Btrfs: fix crash when mounting raid5 btrfs with missing disksLiu Bo
The reproducer is $ mkfs.btrfs D1 D2 D3 -mraid5 $ mkfs.ext4 D2 && mkfs.ext4 D3 $ mount D1 /btrfs -odegraded ------------------- [ 87.672992] ------------[ cut here ]------------ [ 87.673845] kernel BUG at fs/btrfs/raid56.c:1828! ... [ 87.673845] RIP: 0010:[<ffffffff813efc7e>] [<ffffffff813efc7e>] __raid_recover_end_io+0x4ae/0x4d0 ... [ 87.673845] Call Trace: [ 87.673845] [<ffffffff8116bbc6>] ? mempool_free+0x36/0xa0 [ 87.673845] [<ffffffff813f0255>] raid_recover_end_io+0x75/0xa0 [ 87.673845] [<ffffffff81447c5b>] bio_endio+0x5b/0xa0 [ 87.673845] [<ffffffff81447cb2>] bio_endio_nodec+0x12/0x20 [ 87.673845] [<ffffffff81374621>] end_workqueue_fn+0x41/0x50 [ 87.673845] [<ffffffff813ad2aa>] normal_work_helper+0xca/0x2c0 [ 87.673845] [<ffffffff8108ba2b>] process_one_work+0x1eb/0x530 [ 87.673845] [<ffffffff8108b9c9>] ? process_one_work+0x189/0x530 [ 87.673845] [<ffffffff8108c15b>] worker_thread+0x11b/0x4f0 [ 87.673845] [<ffffffff8108c040>] ? rescuer_thread+0x290/0x290 [ 87.673845] [<ffffffff810939c4>] kthread+0xe4/0x100 [ 87.673845] [<ffffffff810938e0>] ? kthread_create_on_node+0x220/0x220 [ 87.673845] [<ffffffff817e7c7c>] ret_from_fork+0x7c/0xb0 [ 87.673845] [<ffffffff810938e0>] ? kthread_create_on_node+0x220/0x220 ------------------- It's because that we miscalculate @rbio->bbio->error so that it doesn't reach maximum of tolerable errors while it should have. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Tested-by: Satoru Takeuchi<takeuchi_satoru@jp.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-06-28btrfs: create sprout should rename fsid on the sysfs as wellAnand Jain
Creating sprout will change the fsid of the mounted root. do the same on the sysfs as well. reproducer: mount /dev/sdb /btrfs (seed disk) btrfs dev add /dev/sdc /btrfs mount -o rw,remount /btrfs btrfs dev del /dev/sdb /btrfs mount /dev/sdb /btrfs Error: kobject_add_internal failed for fe350492-dc28-4051-a601-e017b17e6145 with -EEXIST, don't try to register things with the same name in the same directory. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>
2014-06-28btrfs: dev replace should replace the sysfs entryAnand Jain
when we replace the device its corresponding sysfs entry has to be replaced as well Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>
2014-06-28btrfs: dev add should add its sysfs entryAnand Jain
we would need the device links to be created, when device is added. Signed-off-by: Anand Jain <Anand.Jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>
2014-06-28btrfs: dev delete should remove sysfs entryAnand Jain
when we delete the device from the mounted btrfs, we would need its corresponding sysfs enty to be removed as well. Signed-off-by: Anand Jain <Anand.Jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>
2014-06-28btrfs: rename add_device_membership to btrfs_kobj_add_deviceAnand Jain
Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>
2014-06-27nfsd: fix rare symlink decoding bugJ. Bruce Fields
An NFS operation that creates a new symlink includes the symlink data, which is xdr-encoded as a length followed by the data plus 0 to 3 bytes of zero-padding as required to reach a 4-byte boundary. The vfs, on the other hand, wants null-terminated data. The simple way to handle this would be by copying the data into a newly allocated buffer with space for the final null. The current nfsd_symlink code tries to be more clever by skipping that step in the (likely) case where the byte following the string is already 0. But that assumes that the byte following the string is ours to look at. In fact, it might be the first byte of a page that we can't read, or of some object that another task might modify. Worse, the NFSv4 code tries to fix the problem by actually writing to that byte. In the NFSv2/v3 cases this actually appears to be safe: - nfs3svc_decode_symlinkargs explicitly null-terminates the data (after first checking its length and copying it to a new page). - NFSv2 limits symlinks to 1k. The buffer holding the rpc request is always at least a page, and the link data (and previous fields) have maximum lengths that prevent the request from reaching the end of a page. In the NFSv4 case the CREATE op is potentially just one part of a long compound so can end up on the end of a page if you're unlucky. The minimal fix here is to copy and null-terminate in the NFSv4 case. The nfsd_symlink() interface here seems too fragile, though. It should really either do the copy itself every time or just require a null-terminated string. Reported-by: Jeff Layton <jlayton@primarydata.com> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-06-26ext4: Fix hole punching for files with indirect blocksJan Kara
Hole punching code for files with indirect blocks wrongly computed number of blocks which need to be cleared when traversing the indirect block tree. That could result in punching more blocks than actually requested and thus effectively cause a data loss. For example: fallocate -n -p 10240000 4096 will punch the range 10240000 - 12632064 instead of the range 1024000 - 10244096. Fix the calculation. CC: stable@vger.kernel.org Fixes: 8bad6fc813a3a5300f51369c39d315679fd88c72 Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-06-26ext4: Fix block zeroing when punching holes in indirect block filesJan Kara
free_holes_block() passed local variable as a block pointer to ext4_clear_blocks(). Thus ext4_clear_blocks() zeroed out this local variable instead of proper place in inode / indirect block. We later zero out proper place in inode / indirect block but don't dirty the inode / buffer again which can lead to subtle issues (some changes e.g. to inode can be lost). Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-06-26ext4: decrement free clusters/inodes counters when block group declared badNamjae Jeon
We should decrement free clusters counter when block bitmap is marked as corrupt and free inodes counter when the allocation bitmap is marked as corrupt to avoid misunderstanding due to incorrect available size in statfs result. User can get immediately ENOSPC error from write begin without reaching for the writepages. Cc: Darrick J. Wong<darrick.wong@oracle.com> Reported-by: Amit Sahrawat <amit.sahrawat83@gmail.com> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com>
2014-06-25Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull CIFS fixes from Steve French: "Small set of misc cifs/smb3 fixes" * 'for-next' of git://git.samba.org/sfrench/cifs-2.6: [CIFS] fix mount failure with broken pathnames when smb3 mount with mapchars option cifs: revalidate mapping prior to satisfying read_iter request with cache=loose fs/cifs: fix regression in cifs_create_mf_symlink()
2014-06-25Merge tag 'nfs-for-3.16-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client fixes from Trond Myklebust: "Highlights include: - Stable fix for a data corruption case due to incorrect cache validation - Fix a couple of false positive cache invalidations - Fix NFSv4 security negotiation issues" * tag 'nfs-for-3.16-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFSv4: test SECINFO RPC_AUTH_GSS pseudoflavors for support NFS Return -EPERM if no supported or matching SECINFO flavor NFS check the return of nfs4_negotiate_security in nfs4_submount NFS: Don't mark the data cache as invalid if it has been flushed NFS: Clear NFS_INO_REVAL_PAGECACHE when we update the file size nfs: Fix cache_validity check in nfs_write_pageuptodate()
2014-06-25fs/mbcache: replace __builtin_log2() with ilog2()T Makphaibulchoke
Fix compiler error with some gcc version(s) that do not support __builtin_log2() by replacing __builtin_log2() with ilog2(). Signed-off-by: T. Makphaibulchoke <tmac@hp.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Maciej W. Rozycki <macro@linux-mips.org>
2014-06-25FS/NFS: replace count*size kzalloc by kcallocFabian Frederick
kcalloc manages count*sizeof overflow. Cc: Trond Myklebust <trond.myklebust@primarydata.com> Cc: linux-nfs@vger.kernel.org Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-06-25nfs: get rid of duplicate dprintkWeston Andros Adamson
This was introduced by a merge error with my recent pgio patchset. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-06-24nfs: Fix unused variable errorAnna Schumaker
inode is unused when CONFIG_SUNRPC_DEBUG=n. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-06-24nfs: remove unneeded EXPORTsWeston Andros Adamson
EXPORT_GPLs of nfs_pageio_add_request and nfs_pageio_complete aren't needed anymore. Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-06-24pnfs: clean up *_resend_to_mdsWeston Andros Adamson
Clean up pnfs_read_done_resend_to_mds and pnfs_write_done_resend_to_mds: - instead of passing all arguments from a nfs_pgio_header, just pass the header - share the common code Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-06-24nfs: remove pgio_header refcount, related cleanupWeston Andros Adamson
The refcounting on nfs_pgio_header was related to there being (possibly) more than one nfs_pgio_data. Now that nfs_pgio_data has been merged into nfs_pgio_header, there is no reason to do this ref counting. Just call the completion callback on nfs_pgio_release/nfs_pgio_error. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-06-24nfs: remove unused writeverf codeWeston Andros Adamson
Remove duplicate writeverf structure from merge of nfs_pgio_header and nfs_pgio_data and remove writeverf related flags and logic to handle more than one RPC per nfs_pgio_header. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-06-24nfs: merge nfs_pgio_data into _headerWeston Andros Adamson
struct nfs_pgio_data only exists as a member of nfs_pgio_header, but is passed around everywhere, because there used to be multiple _data structs per _header. Many of these functions then use the _data to find a pointer to the _header. This patch cleans this up by merging the nfs_pgio_data structure into nfs_pgio_header and passing nfs_pgio_header around instead. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-06-24nfs: rename members of nfs_pgio_dataWeston Andros Adamson
Rename "verf" to "writeverf" and "pages" to "page_array" to prepare for merge of nfs_pgio_data and nfs_pgio_header. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-06-24nfs: move nfs_pgio_data and remove nfs_rw_headerWeston Andros Adamson
nfs_rw_header was used to allocate an nfs_pgio_header along with an nfs_pgio_data, because a _header would need at least one _data. Now there is only ever one nfs_pgio_data for each nfs_pgio_header -- move it to nfs_pgio_header and get rid of nfs_rw_header. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-06-24NFSv4: test SECINFO RPC_AUTH_GSS pseudoflavors for supportAndy Adamson
Fix nfs4_negotiate_security to create an rpc_clnt used to test each SECINFO returned pseudoflavor. Check credential creation (and gss_context creation) which is important for RPC_AUTH_GSS pseudoflavors which can fail for multiple reasons including mis-configuration. Don't call nfs4_negotiate in nfs4_submount as it was just called by nfs4_proc_lookup_mountpoint (nfs4_proc_lookup_common) Signed-off-by: Andy Adamson <andros@netapp.com> [Trond: fix corrupt return value from nfs_find_best_sec()] Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-06-24NFS Return -EPERM if no supported or matching SECINFO flavorAndy Adamson
Do not return RPC_AUTH_UNIX if SEINFO reply tests fail. This prevents an infinite loop of NFS4ERR_WRONGSEC for non RPC_AUTH_UNIX mounts. Without this patch, a mount with no sec= option to a server that does not include RPC_AUTH_UNIX in the SECINFO return can be presented with an attemtp to use RPC_AUTH_UNIX which will result in an NFS4ERR_WRONG_SEC which will prompt the SECINFO call which will again try RPC_AUTH_UNIX.... Signed-off-by: Andy Adamson <andros@netapp.com> Tested-By: Steve Dickson <steved@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-06-24NFS check the return of nfs4_negotiate_security in nfs4_submountAndy Adamson
Signed-off-by: Andy Adamson <andros@netapp.com> Tested-By: Steve Dickson <steved@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-06-24NFS: Don't mark the data cache as invalid if it has been flushedTrond Myklebust
Now that we have functions such as nfs_write_pageuptodate() that use the cache_validity flags to check if the data cache is valid or not, it is a little more important to keep the flags in sync with the state of the data cache. In particular, we'd like to ensure that if the data cache is empty, we don't start marking it as needing revalidation. Reported-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-06-24NFS: Clear NFS_INO_REVAL_PAGECACHE when we update the file sizeTrond Myklebust
In nfs_update_inode(), if the change attribute is seen to change on the server, then we set NFS_INO_REVAL_PAGECACHE in order to make sure that we check the file size. However, if we also update the file size in the same function, we don't need to check it again. So make sure that we clear the NFS_INO_REVAL_PAGECACHE that was set earlier. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-06-24nfs: Fix cache_validity check in nfs_write_pageuptodate()Scott Mayhew
NFS_INO_INVALID_DATA cannot be ignored, even if we have a delegation. We're still having some problems with data corruption when multiple clients are appending to a file and those clients are being granted write delegations on open. To reproduce: Client A: vi /mnt/`hostname -s` while :; do echo "XXXXXXXXXXXXXXX" >>/mnt/file; sleep $(( $RANDOM % 5 )); done Client B: vi /mnt/`hostname -s` while :; do echo "YYYYYYYYYYYYYYY" >>/mnt/file; sleep $(( $RANDOM % 5 )); done What's happening is that in nfs_update_inode() we're recognizing that the file size has changed and we're setting NFS_INO_INVALID_DATA accordingly, but then we ignore the cache_validity flags in nfs_write_pageuptodate() because we have a delegation. As a result, in nfs_updatepage() we're extending the write to cover the full page even though we've not read in the data to begin with. Signed-off-by: Scott Mayhew <smayhew@redhat.com> Cc: <stable@vger.kernel.org> # v3.11+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-06-24aio: fix kernel memory disclosure in io_getevents() introduced in v3.10Benjamin LaHaise
A kernel memory disclosure was introduced in aio_read_events_ring() in v3.10 by commit a31ad380bed817aa25f8830ad23e1a0480fef797. The changes made to aio_read_events_ring() failed to correctly limit the index into ctx->ring_pages[], allowing an attacked to cause the subsequent kmap() of an arbitrary page with a copy_to_user() to copy the contents into userspace. This vulnerability has been assigned CVE-2014-0206. Thanks to Mateusz and Petr for disclosing this issue. This patch applies to v3.12+. A separate backport is needed for 3.10/3.11. Signed-off-by: Benjamin LaHaise <bcrl@kvack.org> Cc: Mateusz Guzik <mguzik@redhat.com> Cc: Petr Matousek <pmatouse@redhat.com> Cc: Kent Overstreet <kmo@daterainc.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: stable@vger.kernel.org
2014-06-24aio: fix aio request leak when events are reaped by userspaceBenjamin LaHaise
The aio cleanups and optimizations by kmo that were merged into the 3.10 tree added a regression for userspace event reaping. Specifically, the reference counts are not decremented if the event is reaped in userspace, leading to the application being unable to submit further aio requests. This patch applies to 3.12+. A separate backport is required for 3.10/3.11. This issue was uncovered as part of CVE-2014-0206. Signed-off-by: Benjamin LaHaise <bcrl@kvack.org> Cc: stable@vger.kernel.org Cc: Kent Overstreet <kmo@daterainc.com> Cc: Mateusz Guzik <mguzik@redhat.com> Cc: Petr Matousek <pmatouse@redhat.com>
2014-06-24[CIFS] fix mount failure with broken pathnames when smb3 mount with mapchars ↵Steve French
option When we SMB3 mounted with mapchars (to allow reserved characters : \ / > < * ? via the Unicode Windows to POSIX remap range) empty paths (eg when we open "" to query the root of the SMB3 directory on mount) were not null terminated so we sent garbarge as a path name on empty paths which caused SMB2/SMB2.1/SMB3 mounts to fail when mapchars was specified. mapchars is particularly important since Unix Extensions for SMB3 are not supported (yet) Signed-off-by: Steve French <smfrench@gmail.com> Cc: <stable@vger.kernel.org> Reviewed-by: David Disseldorp <ddiss@suse.de>
2014-06-23ocfs2/dlm: do not purge lockres that is queued for assert masterXue jiufei
When workqueue is delayed, it may occur that a lockres is purged while it is still queued for master assert. it may trigger BUG() as follows. N1 N2 dlm_get_lockres() ->dlm_do_master_requery is the master of lockres, so queue assert_master work dlm_thread() start running and purge the lockres dlm_assert_master_worker() send assert master message to other nodes receiving the assert_master message, set master to N2 dlmlock_remote() send create_lock message to N2, but receive DLM_IVLOCKID, if it is RECOVERY lockres, it triggers the BUG(). Another BUG() is triggered when N3 become the new master and send assert_master to N1, N1 will trigger the BUG() because owner doesn't match. So we should not purge lockres when it is queued for assert master. Signed-off-by: joyce.xue <xuejiufei@huawei.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-23ocfs2: do not return DLM_MIGRATE_RESPONSE_MASTERY_REF to avoid endless,loop ↵jiangyiwen
during umount The following case may lead to endless loop during umount. node A node B node C node D umount volume, migrate lockres1 to B want to lock lockres1, send MASTER_REQUEST_MSG to C init block mle send MIGRATE_REQUEST_MSG to C find a block mle, and then return DLM_MIGRATE_RESPONSE_MASTERY_REF to B set C in refmap umount successfully try to umount, endless loop occurs when migrate lockres1 since C is in refmap So we can fix this endless loop case by only returning DLM_MIGRATE_RESPONSE_MASTERY_REF if it has a mastery mle when receiving MIGRATE_REQUEST_MSG. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: jiangyiwen <jiangyiwen@huawei.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Xue jiufei <xuejiufei@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-23ocfs2: manually do the iput once ocfs2_add_entry failed in ocfs2_symlink and ↵jiangyiwen
ocfs2_mknod When the call to ocfs2_add_entry() failed in ocfs2_symlink() and ocfs2_mknod(), iput() will not be called during dput(dentry) because no d_instantiate(), and this will lead to umount hung. Signed-off-by: jiangyiwen <jiangyiwen@huawei.com> Cc: Joel Becker <jlbec@evilplan.org> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-23ocfs2: fix a tiny race when running dirop_fileop_racerYiwen Jiang
When running dirop_fileop_racer we found a dead lock case. 2 nodes, say Node A and Node B, mount the same ocfs2 volume. Create /race/16/1 in the filesystem, and let the inode number of dir 16 is less than the inode number of dir race. Node A Node B mv /race/16/1 /race/ right after Node A has got the EX mode of /race/16/, and tries to get EX mode of /race ls /race/16/ In this case, Node A has got the EX mode of /race/16/, and wants to get EX mode of /race/. Node B has got the PR mode of /race/, and wants to get the PR mode of /race/16/. Since EX and PR are mutually exclusive, dead lock happens. This patch fixes this case by locking in ancestor order before trying inode number order. Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com> Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Joel Becker <jlbec@evilplan.org> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-23ocfs2/dlm: fix misuse of list_move_tail() in dlm_run_purge_list()Xue jiufei
When a lockres in purge list but is still in use, it should be moved to the tail of purge list. dlm_thread will continue to check next lockres in purge list. However, code list_move_tail(&dlm->purge_list, &lockres->purge) will do *no* movements, so dlm_thread will purge the same lockres in this loop again and again. If it is in use for a long time, other lockres will not be processed. Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com> Signed-off-by: joyce.xue <xuejiufei@huawei.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>