path: root/fs
Age / Commit message / Author
2015-10-01Btrfs: kernel operation should come after user input has been verifiedAnand Jain
As a general rule of thumb, there shouldn't be any way for userland to trigger a kernel operation just by sending wrong arguments. Here, do the commit cleanups only after user input has been verified. Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2015-10-01Btrfs: enhance btrfs_scratch_superblock to scratch all superblocksAnand Jain
This patch updates and renames btrfs_scratch_superblocks (which is used by the replace device thread) with the fixes from the scratch superblock code section of btrfs_rm_device(). The fixes are:

  - Scratch all copies of the superblock
  - Notify kobject that the superblock has been changed
  - Update time on the device

This lets btrfs_rm_device() use the function btrfs_scratch_superblocks() instead of its own scratch code. The replace device code, which similarly releases a device back to the system, will also have the fixes from the btrfs device delete.

Signed-off-by: Anand Jain <anand.jain@oracle.com> [renamed to btrfs_scratch_superblock] Signed-off-by: David Sterba <dsterba@suse.com>
2015-10-01Btrfs: add btrfs_read_dev_one_super() to read one specific SBAnand Jain
This takes a chunk of code from btrfs_read_dev_super() and creates a function called btrfs_read_dev_one_super() so that the next patch can use it for scratch superblock. Signed-off-by: Anand Jain <anand.jain@oracle.com> [renamed bufhead to bh] Signed-off-by: David Sterba <dsterba@suse.com>
2015-10-01Btrfs: use BTRFS_ERROR_DEV_MISSING_NOT_FOUND when missing device is not foundAnand Jain
Use the btrfs specific error code BTRFS_ERROR_DEV_MISSING_NOT_FOUND instead of -ENOENT. This also removes the logging when the user specifies "missing" and we don't find it in the kernel device list. Logging is for system events, not for user input errors. Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2015-10-01Merge tag 'upstream-4.3-rc4' of git://git.infradead.org/linux-ubifsLinus Torvalds
Pull UBI/UBIFS fixes from Richard Weinberger:
 "This contains three bug fixes for both UBI and UBIFS"

* tag 'upstream-4.3-rc4' of git://git.infradead.org/linux-ubifs:
  UBI: return ENOSPC if no enough space available
  UBI: Validate data_size
  UBIFS: Kill unneeded locking in ubifs_init_security
2015-10-01fs/proc, core/debug: Don't expose absolute kernel addresses via wchanIngo Molnar
So the /proc/PID/stat 'wchan' field (the 35th field, which contains the absolute kernel address of the kernel function a task is blocked in) leaks absolute kernel addresses to unprivileged user-space:

        seq_put_decimal_ull(m, ' ', wchan);

The absolute address might also leak via /proc/PID/wchan, if KALLSYMS is turned off or if the symbol lookup fails for some reason:

  static int proc_pid_wchan(struct seq_file *m, struct pid_namespace *ns,
                            struct pid *pid, struct task_struct *task)
  {
          unsigned long wchan;
          char symname[KSYM_NAME_LEN];

          wchan = get_wchan(task);

          if (lookup_symbol_name(wchan, symname) < 0) {
                  if (!ptrace_may_access(task, PTRACE_MODE_READ))
                          return 0;
                  seq_printf(m, "%lu", wchan);
          } else {
                  seq_printf(m, "%s", symname);
          }

          return 0;
  }

This isn't ideal, because for example it trivially leaks the KASLR offset to any local attacker:

  fomalhaut:~> printf "%016lx\n" $(cat /proc/$$/stat | cut -d' ' -f35)
  ffffffff8123b380

Most real-life uses of wchan are symbolic:

  ps -eo pid:10,tid:10,wchan:30,comm

and procps uses /proc/PID/wchan, not the absolute address in /proc/PID/stat:

  triton:~/tip> strace -f ps -eo pid:10,tid:10,wchan:30,comm 2>&1 | grep wchan | tail -1
  open("/proc/30833/wchan", O_RDONLY) = 6

There's one compatibility quirk here: procps relies on whether the absolute value is non-zero - and we can provide that functionality by outputting "0" or "1" depending on whether the task is blocked (whether there's a wchan address).

These days there appears to be very little legitimate reason user-space would be interested in the absolute address. The absolute address is mostly historic: from the days when we didn't have kallsyms and user-space procps had to do the decoding itself via the System.map.

So this patch sets all numeric output to "0" or "1" and keeps only symbolic output, in /proc/PID/wchan.

( The absolute sleep address can generally still be profiled via perf, by tasks with sufficient privileges. )

Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: <stable@vger.kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Kostya Serebryany <kcc@google.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: kasan-dev <kasan-dev@googlegroups.com> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20150930135917.GA3285@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
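As an illustration only (this is not the actual patch), the "0"/"1" behaviour described in the message above can be sketched in userspace C. The helper name format_wchan_field() is hypothetical; the point is simply that the numeric field is collapsed to a flag before it is printed, so no address reaches the caller.

```c
#include <stdio.h>

/*
 * Hypothetical helper, not kernel code: format the numeric wchan
 * field as "1" if the task is blocked (non-zero wait channel) and
 * "0" otherwise, so the absolute address is never emitted.
 */
static int format_wchan_field(char *buf, size_t len, unsigned long wchan)
{
	/* Collapse the address to a boolean flag before printing. */
	return snprintf(buf, len, "%d", wchan != 0 ? 1 : 0);
}
```

Tools that only test the field for non-zero, as the message says procps does, would keep working with output of this shape.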
2015-09-29Btrfs: consolidate btrfs_error() to btrfs_std_error()Anand Jain
btrfs_error() and btrfs_std_error() do the same thing and both call __btrfs_std_error(), so consolidate them. The main motivation is that btrfs_error() is named too closely to btrfs_err(): one handles the error action, the other logs the error, so they should not have such similar names. Signed-off-by: Anand Jain <anand.jain@oracle.com> Suggested-by: David Sterba <dsterba@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2015-09-29Btrfs: __btrfs_std_error() logic should be consistent w/out CONFIG_PRINTK definedAnand Jain
Error handling logic behaves differently with or without CONFIG_PRINTK defined, since there are two copies of the same function with slightly different logic.

One, when CONFIG_PRINTK is defined, the code is

  __btrfs_std_error(..)
  {
  ::
          save_error_info(fs_info);
          if (sb->s_flags & MS_BORN)
                  btrfs_handle_error(fs_info);
  }

and two, when CONFIG_PRINTK is not defined, the code is

  __btrfs_std_error(..)
  {
  ::
          if (sb->s_flags & MS_BORN) {
                  save_error_info(fs_info);
                  btrfs_handle_error(fs_info);
          }
  }

I doubt this was intentional; it appears to have happened because we maintain two copies of the same function and they diverged over successive commits.

To decide which logic is correct, the changes were reviewed as below:

  533574c6bc30cf526cc1c41bde050c854a945efb
    Commit added two copies of this function
  cf79ffb5b79e8a2b587fbf218809e691bb396c98
    Commit made a change to only one copy of the function, the copy used when CONFIG_PRINTK is defined.

To fix this, instead of maintaining two copies of the same function, maintain a single function and put only the extra portion of the code under the CONFIG_PRINTK define. This patch does just that, and keeps the code from the copy used when CONFIG_PRINTK is defined.

Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
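A minimal userspace sketch of the single-function approach described above, under stated assumptions: struct fs_state and std_error() are illustrative stand-ins, not the btrfs code. Only the logging sits under the config macro, so the error-handling path cannot diverge between the two builds.

```c
#include <stdbool.h>
#include <stdio.h>

/* Toy stand-in for the filesystem state; all names are illustrative. */
struct fs_state {
	bool born;          /* models sb->s_flags & MS_BORN */
	bool error_saved;   /* set where save_error_info() would run */
	bool error_handled; /* set where btrfs_handle_error() would run */
};

static void std_error(struct fs_state *fs, const char *msg)
{
#ifdef CONFIG_PRINTK
	/* Only the logging depends on CONFIG_PRINTK. */
	fprintf(stderr, "error: %s\n", msg);
#else
	(void)msg;
#endif
	/* Shared logic: identical with or without CONFIG_PRINTK. */
	fs->error_saved = true;
	if (fs->born)
		fs->error_handled = true;
}
```

The shared part follows the CONFIG_PRINTK variant quoted in the message: the error is always recorded, and handling happens only once the superblock is marked born.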
2015-09-29Btrfs: SB read failure should return EIO for __bread failureAnand Jain
This will return EIO when __bread() fails to read SB, instead of EINVAL. Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2015-09-29Btrfs: rename super_kobj to fsid_kobjAnand Jain
Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2015-09-29Btrfs: rename btrfs_kobj_rm_device to btrfs_sysfs_rm_device_linkAnand Jain
Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2015-09-29Btrfs: rename btrfs_kobj_add_device to btrfs_sysfs_add_device_linkAnand Jain
Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2015-09-29Btrfs: rename btrfs_sysfs_remove_one to btrfs_sysfs_remove_mountedAnand Jain
Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2015-09-29Btrfs: rename btrfs_sysfs_add_one to btrfs_sysfs_add_mountedAnand Jain
Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2015-09-29debugfs: document that debugfs_remove*() accepts NULL and error valuesUlf Magnusson
According to commit a59d6293e537 ("debugfs: change parameter check in debugfs_remove() functions"), this is meant to make cleanup easier for callers. In that case it ought to be documented. Signed-off-by: Ulf Magnusson <ulfalizer@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
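To illustrate why accepting NULL and error values makes cleanup easier for callers, here is a userspace analogy. It is a sketch, not the debugfs implementation: err_ptr(), is_err_or_null() and entry_remove() are hypothetical imitations of the kernel's error-pointer convention.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Userspace imitation of an error pointer: a small negative errno cast to a pointer. */
static void *err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static int is_err_or_null(const void *p)
{
	return p == NULL || (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/*
 * Hypothetical removal routine: silently ignores NULL and error
 * pointers. Returns 1 if something was freed, 0 if ignored.
 */
static int entry_remove(void *entry)
{
	if (is_err_or_null(entry))
		return 0;
	free(entry);
	return 1;
}
```

With a remove function of this shape, an error path can call it unconditionally on whatever the create call returned, without checking first.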
2015-09-29fs: Drop unlikely before IS_ERR(_OR_NULL)Viresh Kumar
IS_ERR(_OR_NULL) already contains an 'unlikely' compiler hint, so there is no need to add it again in its callers. Drop it. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Jeff Layton <jlayton@poochiereds.net> Reviewed-by: David Howells <dhowells@redhat.com> Reviewed-by: Steve French <smfrench@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
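The pattern can be sketched outside the kernel as follows. This is an approximation that assumes a GCC or Clang compiler (for __builtin_expect); is_err() and lookup_status() are hypothetical names, not the kernel's macros.

```c
#include <stdint.h>

#define MAX_ERRNO 4095
#define unlikely(x) __builtin_expect(!!(x), 0)

/* The branch hint lives inside the helper itself... */
static inline int is_err(const void *ptr)
{
	return unlikely((uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO);
}

/* ...so callers test the result directly, without wrapping it again. */
static int lookup_status(const void *ptr)
{
	if (is_err(ptr))	/* not: if (unlikely(is_err(ptr))) */
		return -1;
	return 0;
}
```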
2015-09-29UBIFS: Kill unneeded locking in ubifs_init_securityRichard Weinberger
Fixes the following lockdep splat:

[ 1.244527] =============================================
[ 1.245193] [ INFO: possible recursive locking detected ]
[ 1.245193] 4.2.0-rc1+ #37 Not tainted
[ 1.245193] ---------------------------------------------
[ 1.245193] cp/742 is trying to acquire lock:
[ 1.245193] (&sb->s_type->i_mutex_key#9){+.+.+.}, at: [<ffffffff812b3f69>] ubifs_init_security+0x29/0xb0
[ 1.245193]
[ 1.245193] but task is already holding lock:
[ 1.245193] (&sb->s_type->i_mutex_key#9){+.+.+.}, at: [<ffffffff81198e7f>] path_openat+0x3af/0x1280
[ 1.245193]
[ 1.245193] other info that might help us debug this:
[ 1.245193] Possible unsafe locking scenario:
[ 1.245193]
[ 1.245193] CPU0
[ 1.245193] ----
[ 1.245193] lock(&sb->s_type->i_mutex_key#9);
[ 1.245193] lock(&sb->s_type->i_mutex_key#9);
[ 1.245193]
[ 1.245193] *** DEADLOCK ***
[ 1.245193]
[ 1.245193] May be due to missing lock nesting notation
[ 1.245193]
[ 1.245193] 2 locks held by cp/742:
[ 1.245193] #0: (sb_writers#5){.+.+.+}, at: [<ffffffff811ad37f>] mnt_want_write+0x1f/0x50
[ 1.245193] #1: (&sb->s_type->i_mutex_key#9){+.+.+.}, at: [<ffffffff81198e7f>] path_openat+0x3af/0x1280
[ 1.245193]
[ 1.245193] stack backtrace:
[ 1.245193] CPU: 2 PID: 742 Comm: cp Not tainted 4.2.0-rc1+ #37
[ 1.245193] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140816_022509-build35 04/01/2014
[ 1.245193] ffffffff8252d530 ffff88007b023a38 ffffffff814f6f49 ffffffff810b56c5
[ 1.245193] ffff88007c30cc80 ffff88007b023af8 ffffffff810a150d ffff88007b023a68
[ 1.245193] 000000008101302a ffff880000000000 00000008f447e23f ffffffff8252d500
[ 1.245193] Call Trace:
[ 1.245193] [<ffffffff814f6f49>] dump_stack+0x4c/0x65
[ 1.245193] [<ffffffff810b56c5>] ? console_unlock+0x1c5/0x510
[ 1.245193] [<ffffffff810a150d>] __lock_acquire+0x1a6d/0x1ea0
[ 1.245193] [<ffffffff8109fa78>] ? __lock_is_held+0x58/0x80
[ 1.245193] [<ffffffff810a1a93>] lock_acquire+0xd3/0x270
[ 1.245193] [<ffffffff812b3f69>] ? ubifs_init_security+0x29/0xb0
[ 1.245193] [<ffffffff814fc83b>] mutex_lock_nested+0x6b/0x3a0
[ 1.245193] [<ffffffff812b3f69>] ? ubifs_init_security+0x29/0xb0
[ 1.245193] [<ffffffff812b3f69>] ? ubifs_init_security+0x29/0xb0
[ 1.245193] [<ffffffff812b3f69>] ubifs_init_security+0x29/0xb0
[ 1.245193] [<ffffffff8128e286>] ubifs_create+0xa6/0x1f0
[ 1.245193] [<ffffffff81198e7f>] ? path_openat+0x3af/0x1280
[ 1.245193] [<ffffffff81195d15>] vfs_create+0x95/0xc0
[ 1.245193] [<ffffffff8119929c>] path_openat+0x7cc/0x1280
[ 1.245193] [<ffffffff8109ffe3>] ? __lock_acquire+0x543/0x1ea0
[ 1.245193] [<ffffffff81088f20>] ? sched_clock_cpu+0x90/0xc0
[ 1.245193] [<ffffffff81088c00>] ? calc_global_load_tick+0x60/0x90
[ 1.245193] [<ffffffff81088f20>] ? sched_clock_cpu+0x90/0xc0
[ 1.245193] [<ffffffff811a9cef>] ? __alloc_fd+0xaf/0x180
[ 1.245193] [<ffffffff8119ac55>] do_filp_open+0x75/0xd0
[ 1.245193] [<ffffffff814ffd86>] ? _raw_spin_unlock+0x26/0x40
[ 1.245193] [<ffffffff811a9cef>] ? __alloc_fd+0xaf/0x180
[ 1.245193] [<ffffffff81189bd9>] do_sys_open+0x129/0x200
[ 1.245193] [<ffffffff81189cc9>] SyS_open+0x19/0x20
[ 1.245193] [<ffffffff81500717>] entry_SYSCALL_64_fastpath+0x12/0x6f

While the lockdep splat is a false positive, because path_openat holds i_mutex of the parent directory and ubifs_init_security() tries to acquire i_mutex of a new inode, it reveals that taking i_mutex in ubifs_init_security() is in vain because it is only being called in the inode allocation path and therefore nobody else can see the inode yet.

Cc: stable@vger.kernel.org # 3.20- Reported-and-tested-by: Boris Brezillon <boris.brezillon@free-electrons.com> Reviewed-and-tested-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: dedekind1@gmail.com
2015-09-28jffs2: remove unneeded kfreefangwei
c->oobbuf hasn't been kmalloced in jffs2_dataflash_setup, so there is no need to free it. Signed-off-by: Wei Fang <fangwei1@huawei.com> Signed-off-by: Brian Norris <computersforpeace@gmail.com>
2015-09-28jffs2: remove unnecessary new_valid_dev checkYaowei Bai
As new_valid_dev always returns 1, the !new_valid_dev check is not needed; remove it. Signed-off-by: Yaowei Bai <bywxiaobai@163.com> Signed-off-by: Brian Norris <computersforpeace@gmail.com>
2015-09-27Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull CIFS fixes from Steve French:
 "Four fixes from testing at the recent SMB3 Plugfest including two important authentication ones (one fixes authentication problems to some popular servers when clock times differ more than two hours between systems, the other fixes Kerberos authentication for SMB3)"

* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
  fix encryption error checks on mount
  [SMB3] Fix sec=krb5 on smb3 mounts
  cifs: use server timestamp for ntlmv2 authentication
  disabling oplocks/leases via module parm enable_oplocks broken for SMB3
2015-09-26[SMB3] Missing null tcon checkSteve French
Pointed out by Dan Carpenter via smatch code analysis tool CC: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Steve French <steve.french@primarydata.com>
2015-09-25Merge branch 'for-linus-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfsLinus Torvalds
Pull btrfs fixes from Chris Mason:
 "This is an assorted set I've been queuing up:

  Jeff Mahoney tracked down a tricky one where we ended up starting IO on the wrong mapping for special files in btrfs_evict_inode. A few people reported this one on the list.

  Filipe found (and provided a test for) a difficult bug in reading compressed extents, and Josef fixed up some quota record keeping with snapshot deletion.

  Chandan killed off an accounting bug during DIO that led to WARN_ONs as we freed inodes"

* 'for-linus-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: keep dropped roots in cache until transaction commit
  Btrfs: Direct I/O: Fix space accounting
  btrfs: skip waiting on ordered range for special files
  Btrfs: fix read corruption of compressed and shared extents
  Btrfs: remove unnecessary locking of cleaner_mutex to avoid deadlock
  Btrfs: don't initialize a space info as full to prevent ENOSPC
2015-09-25Merge tag 'nfs-for-4.3-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client bugfixes from Trond Myklebust:
 "Highlights include:

  Stable patches:
   - fix v4.2 SEEK on files over 2 gigs
   - Fix a layout segment reference leak when pNFS I/O falls back to inband I/O.
   - Fix recovery of recalled read delegations

  Bugfixes:
   - Fix a case where NFSv4 fails to send CLOSE after a server reboot
   - Fix sunrpc to wait for connections to complete before retrying
   - Fix sunrpc races between transport connect/disconnect and shutdown
   - Fix an infinite loop when layoutget fail with BAD_STATEID
   - nfs/filelayout: Fix NULL reference caused by double freeing of fh_array
   - Fix a bogus WARN_ON_ONCE() in O_DIRECT when layout commit_through_mds is set
   - Fix layoutreturn/close ordering issues"

* tag 'nfs-for-4.3-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFS41: make close wait for layoutreturn
  NFS: Skip checking ds_cinfo.buckets when lseg's commit_through_mds is set
  NFSv4.x/pnfs: Don't try to recover stateids twice in layoutget
  NFSv4: Recovery of recalled read delegations is broken
  NFS: Fix an infinite loop when layoutget fail with BAD_STATEID
  NFS: Do cleanup before resetting pageio read/write to mds
  SUNRPC: xs_sock_mark_closed() does not need to trigger socket autoclose
  SUNRPC: Lock the transport layer on shutdown
  nfs/filelayout: Fix NULL reference caused by double freeing of fh_array
  SUNRPC: Ensure that we wait for connections to complete before retrying
  SUNRPC: drop null test before destroy functions
  nfs: fix v4.2 SEEK on files over 2 gigs
  SUNRPC: Fix races between socket connection and destroy code
  nfs: fix pg_test page count calculation
  Failing to send a CLOSE if file is opened WRONLY and server reboots on a 4.x mount
2015-09-24ext4: Update EXT4_USE_FOR_EXT2 descriptionJean Delvare
Configuration option EXT4_USE_FOR_EXT2 has no effect on ext3 support. Support for ext3 is always included now. Signed-off-by: Jean Delvare <jdelvare@suse.de> Fixes: c290ea01ab ("fs: Remove ext3 filesystem driver") Cc: Jan Kara <jack@suse.cz> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Signed-off-by: Jan Kara <jack@suse.com>
2015-09-24fix encryption error checks on mountSteve French
Signed-off-by: Steve French <steve.french@primarydata.com>
2015-09-24[SMB3] Fix sec=krb5 on smb3 mountsSteve French
Kerberos, which is very important for security, was only enabled for CIFS, not SMB2/SMB3 mounts (e.g. vers=3.0).

Patch based on the information detailed in http://thread.gmane.org/gmane.linux.kernel.cifs/10081/focus=10307 to enable Kerberized SMB2/SMB3:

  a) SMB2_negotiate: enable/use decode_negTokenInit in SMB2_negotiate
  b) SMB2_sess_setup: handle Kerberos sectype and replicate Kerberos SMB1 processing done in sess_auth_kerberos

Signed-off-by: Noel Power <noel.power@suse.com> Signed-off-by: Jim McDonough <jmcd@samba.org> CC: Stable <stable@vger.kernel.org> Signed-off-by: Steve French <steve.french@primarydata.com>
2015-09-23fs: direct-io: don't dirtying pages for ITER_BVEC/ITER_KVEC direct readMing Lei
When direct read IO is submitted from the kernel, it is often unnecessary to dirty pages. For example, with loop, dirtying pages has already been taken care of on the upper filesystem (over loop) side, and they don't need to be dirtied again. So this patch doesn't dirty pages for ITER_BVEC/ITER_KVEC direct read, and loop should be the first case to use ITER_BVEC/ITER_KVEC for direct read I/O. The patch is based on Dave's previous patch. Reviewed-by: Dave Kleikamp <dave.kleikamp@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-09-23fs/mpage.c: forgotten WRITE_SYNC in case of data integrity writeRoman Pen
In case of wbc->sync_mode == WB_SYNC_ALL we need to do data integrity write, thus mark request as WRITE_SYNC. akpm: afaict this change will cause the data integrity write bios to be placed onto the second queue in cfq_io_cq.cfqq[], which presumably results in special treatment. The documentation for REQ_SYNC is horrid. Signed-off-by: Roman Pen <r.peniaev@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@fb.com>
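The rule in the message above (a data integrity write, i.e. WB_SYNC_ALL, must be submitted as WRITE_SYNC) can be modelled in a few lines. This is a toy sketch: the enums and pick_write_op() are hypothetical stand-ins, not the definitions used in fs/mpage.c.

```c
/* Toy models of the writeback sync mode and the bio write operation. */
enum sync_mode { SYNC_NONE, SYNC_ALL };     /* models WB_SYNC_NONE / WB_SYNC_ALL */
enum write_op  { OP_WRITE, OP_WRITE_SYNC }; /* models WRITE / WRITE_SYNC */

/* Data integrity writeback is marked synchronous; everything else is a plain write. */
static enum write_op pick_write_op(enum sync_mode mode)
{
	return mode == SYNC_ALL ? OP_WRITE_SYNC : OP_WRITE;
}
```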
2015-09-23ext4: move procfs registration code to fs/ext4/sysfs.cTheodore Ts'o
This allows us to refactor the procfs code, which saves a bit of compiled space. More importantly it isolates most of the procfs support code into a single file, so it's easier to #ifdef it out if the proc file system has been disabled. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-09-23ext4: refactor sysfs support codeTheodore Ts'o
Make the code more easily extensible as well as taking up less compiled space. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-09-23ext4: move sysfs code from super.c to fs/ext4/sysfs.cTheodore Ts'o
Also statically allocate the ext4_kset and ext4_feat objects, since we only need exactly one of each, and it's simpler and less code if we drop the dynamic allocation and deallocation when it's not needed. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-09-23GFS2: Set s_mode before parsing mount optionsAndrew Price
In the generic mount_bdev() function, deactivate_locked_super() is called after the fill_super() call fails, at which point s_mode has been set. kill_block_super() expects this and dumps a warning when FMODE_EXCL is not set in s_mode. In gfs2_mount() we call deactivate_locked_super() on failure of gfs2_mount_args(), at which point s_mode has not yet been set. This causes kill_block_super() to dump a stack trace when gfs2 fails to mount with invalid options. Set s_mode earlier in gfs2_mount() to avoid that. Signed-off-by: Andrew Price <anprice@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2015-09-23NFS41: make close wait for layoutreturnPeng Tao
If we send a layoutreturn asynchronously before close, the close might reach the server first, and the layoutreturn would fail with BADSTATEID because there is nothing keeping the layout stateid alive. Also do not pretend to be sending a layoutreturn if we are not. Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-09-22ocfs2/dlm: fix deadlock when dispatch assert masterJoseph Qi
The order of the following three spinlocks should be:

  dlm_domain_lock < dlm_ctxt->spinlock < dlm_lock_resource->spinlock

But dlm_dispatch_assert_master() is called while holding dlm_ctxt->spinlock and dlm_lock_resource->spinlock, and then it calls dlm_grab() which will take dlm_domain_lock. Once another thread (for example, dlm_query_join_handler) has already taken dlm_domain_lock and tries to take dlm_ctxt->spinlock, a deadlock happens.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: "Junxiao Bi" <junxiao.bi@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
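The ordering rule above can be expressed as a simple rank check. The following is a sketch for illustration only; the enum values and order_ok() are hypothetical and do not appear in ocfs2.

```c
/* Ranks mirror the documented order: domain lock < ctxt lock < resource lock. */
enum lock_rank {
	RANK_DOMAIN = 1,   /* models dlm_domain_lock */
	RANK_CTXT   = 2,   /* models dlm_ctxt->spinlock */
	RANK_RES    = 3    /* models dlm_lock_resource->spinlock */
};

/*
 * Returns 1 if taking `next` is allowed while the highest-ranked lock
 * already held is `held_max` (0 means no lock is held), 0 otherwise.
 */
static int order_ok(int held_max, enum lock_rank next)
{
	return (int)next > held_max;
}
```

In these terms, the bug is the case order_ok(RANK_RES, RANK_DOMAIN): the domain lock is requested while the two higher-ranked locks are already held.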
2015-09-22userfaultfd: revert "userfaultfd: waitqueue: add nr wake parameter to __wake_up_locked_key"Andrea Arcangeli
This reverts commit 51360155eccb907ff8635bd10fc7de876408c2e0 and adapts fs/userfaultfd.c to use the old version of that function. It didn't look robust to call __wake_up_common with "nr == 1" when we absolutely require wakeall semantics, but we've full control of what we insert in the two waitqueue heads of the blocked userfaults. No exclusive waitqueue risks being inserted into those two waitqueue heads, so we may as well stick to the "nr == 1" of the old code and rely purely on the fact that no waitqueue inserted in either of the two waitqueue heads that we must enforce as wakeall has WQ_FLAG_EXCLUSIVE set in wait->flags. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Shuah Khan <shuahkh@osg.samsung.com> Cc: Thierry Reding <treding@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-22NFS: Skip checking ds_cinfo.buckets when lseg's commit_through_mds is setKinglong Mee
When lseg's commit_through_mds is set, the pnfs client always WARNs once in nfs_direct_select_verf after checking ds_cinfo.nbuckets. nfs should use the DS verf, except when commit_through_mds is set for a layout segment where nbuckets is zero.

[17844.666094] ------------[ cut here ]------------
[17844.667071] WARNING: CPU: 0 PID: 21758 at /root/source/linux-pnfs/fs/nfs/direct.c:174 nfs_direct_select_verf+0x5a/0x70 [nfs]()
[17844.668650] Modules linked in: nfs_layout_nfsv41_files(OE) nfsv4(OE) nfs(OE) fscache(E) nfsd(OE) xfs libcrc32c btrfs ppdev coretemp crct10dif_pclmul auth_rpcgss crc32_pclmul crc32c_intel nfs_acl ghash_clmulni_intel lockd vmw_balloon xor vmw_vmci grace raid6_pq shpchp sunrpc parport_pc i2c_piix4 parport vmwgfx drm_kms_helper ttm drm serio_raw mptspi e1000 scsi_transport_spi mptscsih mptbase ata_generic pata_acpi [last unloaded: fscache]
[17844.686676] CPU: 0 PID: 21758 Comm: kworker/0:1 Tainted: G W OE 4.3.0-rc1-pnfs+ #245
[17844.687352] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/20/2014
[17844.698502] Workqueue: nfsiod rpc_async_release [sunrpc]
[17844.699212] 0000000000000009 0000000043e58010 ffff8800454fbc10 ffffffff813680c4
[17844.699990] ffff8800454fbc48 ffffffff8108b49d ffff88004eb20000 ffff88004eb20000
[17844.700844] ffff880062e26000 0000000000000000 0000000000000001 ffff8800454fbc58
[17844.701637] Call Trace:
[17844.725252] [<ffffffff813680c4>] dump_stack+0x19/0x25
[17844.732693] [<ffffffff8108b49d>] warn_slowpath_common+0x7d/0xb0
[17844.733855] [<ffffffff8108b5da>] warn_slowpath_null+0x1a/0x20
[17844.735015] [<ffffffffa04a27ca>] nfs_direct_select_verf+0x5a/0x70 [nfs]
[17844.735999] [<ffffffffa04a2b83>] nfs_direct_set_hdr_verf+0x23/0x90 [nfs]
[17844.736846] [<ffffffffa04a2e17>] nfs_direct_write_completion+0x227/0x260 [nfs]
[17844.737782] [<ffffffffa04a433c>] nfs_pgio_release+0x1c/0x20 [nfs]
[17844.738597] [<ffffffffa0502df3>] pnfs_generic_rw_release+0x23/0x30 [nfsv4]
[17844.739486] [<ffffffffa01cbbea>] rpc_free_task+0x2a/0x70 [sunrpc]
[17844.740326] [<ffffffffa01cbcd5>] rpc_async_release+0x15/0x20 [sunrpc]
[17844.741173] [<ffffffff810a387c>] process_one_work+0x21c/0x4c0
[17844.741984] [<ffffffff810a37cd>] ? process_one_work+0x16d/0x4c0
[17844.742837] [<ffffffff810a3b6a>] worker_thread+0x4a/0x440
[17844.743639] [<ffffffff810a3b20>] ? process_one_work+0x4c0/0x4c0
[17844.744399] [<ffffffff810a3b20>] ? process_one_work+0x4c0/0x4c0
[17844.745176] [<ffffffff810a8d75>] kthread+0xf5/0x110
[17844.745927] [<ffffffff810a8c80>] ? kthread_create_on_node+0x240/0x240
[17844.747105] [<ffffffff8172ce1f>] ret_from_fork+0x3f/0x70
[17844.747856] [<ffffffff810a8c80>] ? kthread_create_on_node+0x240/0x240
[17844.748642] ---[ end trace 336a2845d42b83f0 ]---

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-09-22cifs: use server timestamp for ntlmv2 authenticationPeter Seiderer
Linux cifs mount with ntlmssp against a Mac OS X (Yosemite 10.10.5) share fails in case the clocks differ more than +/-2h:

  digest-service: digest-request: od failed with 2 proto=ntlmv2
  digest-service: digest-request: kdc failed with -1561745592 proto=ntlmv2

Fix this by (re-)using the given server timestamp for the ntlmv2 authentication (as Windows 7 does).

A related problem was also reported earlier by Namjae Jeon (see below):

  Windows machine has extended security feature which refuse to allow authentication when there is time difference between server time and client time when ntlmv2 negotiation is used. This problem is prevalent in embedded environment where system time is set to default 1970.

Modern servers send the server timestamp in the TargetInfo Av_Pair structure in the challenge message [see MS-NLMP 2.2.2.1]. In [MS-NLMP 3.1.5.1.2] it is explicitly mentioned that the client must use the server provided timestamp if present OR current time if it is not.

Reported-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Peter Seiderer <ps.report@gmx.net> Signed-off-by: Steve French <smfrench@gmail.com> CC: Stable <stable@vger.kernel.org>
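The selection rule cited from MS-NLMP in the message above (use the server-provided timestamp if present, otherwise the current time) reduces to a one-line choice. The sketch below is illustrative: pick_ntlmv2_time() and its parameters are hypothetical names, and parsing the timestamp out of the challenge message is left out entirely.

```c
#include <stdint.h>

/*
 * Hypothetical helper: choose the timestamp to place in the NTLMv2
 * response. `have_server_time` says whether the challenge carried a
 * timestamp; `local_time` is the client's own clock in the same units.
 */
static uint64_t pick_ntlmv2_time(int have_server_time,
				 uint64_t server_time,
				 uint64_t local_time)
{
	return have_server_time ? server_time : local_time;
}
```

With this choice, a client whose clock is far off (for example, still set to 1970) authenticates against the server's notion of time whenever the server supplies one.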
2015-09-22disabling oplocks/leases via module parm enable_oplocks broken for SMB3Steve French
Leases (oplocks) were always requested for SMB2/SMB3, even when oplocks were disabled in the cifs.ko module. Signed-off-by: Steve French <steve.french@primarydata.com> Reviewed-by: Chandrika Srinivasan <chandrika.srinivasan@citrix.com> CC: Stable <stable@vger.kernel.org>
2015-09-22Btrfs: keep dropped roots in cache until transaction commitJosef Bacik
When dropping a snapshot we need to account for the qgroup changes. If we drop the snapshot all in one go then the backref code will fail to find blocks from the snapshot we dropped since it won't be able to find the root in the fs root cache. This can lead to us failing to find refs from other roots that pointed at blocks in the now deleted root. To handle this we need to not remove the fs roots from the cache until after we process the qgroup operations. Do this by adding dropped roots to a list on the transaction, and letting the transaction remove the roots at the same time it drops the commit roots. This will keep all of the backref searching code in sync properly, and fixes a problem Mark was seeing with snapshot delete and qgroups. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Tested-by: Holger Hoffstätte <holger.hoffstaette@googlemail.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-09-22GFS2: fallocate: do not rely on file_update_time to mark the inode dirtyAndrew Price
Previously __gfs2_fallocate() relied on file_update_time() marking the inode dirty, but that's not a safe assumption as that function doesn't dirty the inode in some cases. Mark the inode dirty explicitly. Signed-off-by: Andrew Price <anprice@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2015-09-21jffs2: drop null test before destroy functionsJulia Lawall
Remove unneeded NULL test.

The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@ expression x; @@
-if (x != NULL)
  \(kmem_cache_destroy\|mempool_destroy\|dma_pool_destroy\)(x);
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: Brian Norris <computersforpeace@gmail.com>
2015-09-21Btrfs: Direct I/O: Fix space accountingchandan
The following call trace is seen when generic/095 test is executed,

WARNING: CPU: 3 PID: 2769 at /home/chandan/code/repos/linux/fs/btrfs/inode.c:8967 btrfs_destroy_inode+0x284/0x2a0()
Modules linked in:
CPU: 3 PID: 2769 Comm: umount Not tainted 4.2.0-rc5+ #31
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20150306_163512-brownie 04/01/2014
 ffffffff81c08150 ffff8802ec9cbce8 ffffffff81984058 ffff8802ffd8feb0
 0000000000000000 ffff8802ec9cbd28 ffffffff81050385 ffff8802ec9cbd38
 ffff8802d12f8588 ffff8802d12f8588 ffff8802f15ab000 ffff8800bb96c0b0
Call Trace:
 [<ffffffff81984058>] dump_stack+0x45/0x57
 [<ffffffff81050385>] warn_slowpath_common+0x85/0xc0
 [<ffffffff81050465>] warn_slowpath_null+0x15/0x20
 [<ffffffff81340294>] btrfs_destroy_inode+0x284/0x2a0
 [<ffffffff8117ce07>] destroy_inode+0x37/0x60
 [<ffffffff8117cf39>] evict+0x109/0x170
 [<ffffffff8117cfd5>] dispose_list+0x35/0x50
 [<ffffffff8117dd3a>] evict_inodes+0xaa/0x100
 [<ffffffff81165667>] generic_shutdown_super+0x47/0xf0
 [<ffffffff81165951>] kill_anon_super+0x11/0x20
 [<ffffffff81302093>] btrfs_kill_super+0x13/0x110
 [<ffffffff81165c99>] deactivate_locked_super+0x39/0x70
 [<ffffffff811660cf>] deactivate_super+0x5f/0x70
 [<ffffffff81180e1e>] cleanup_mnt+0x3e/0x90
 [<ffffffff81180ebd>] __cleanup_mnt+0xd/0x10
 [<ffffffff81069c06>] task_work_run+0x96/0xb0
 [<ffffffff81003a3d>] do_notify_resume+0x3d/0x50
 [<ffffffff8198cbc2>] int_signal+0x12/0x17

This means that the inode had non-zero "outstanding extents" during eviction. This occurs because, during direct I/O a task which successfully used up its reserved data space would set BTRFS_INODE_DIO_READY bit and does not clear the bit after finishing the DIO write. A future DIO write could actually fail and the unused reserve space won't be freed because of the previously set BTRFS_INODE_DIO_READY bit.

Clearing the BTRFS_INODE_DIO_READY bit in btrfs_direct_IO() caused the following issue,

|-----------------------------------+-------------------------------------|
| Task A                            | Task B                              |
|-----------------------------------+-------------------------------------|
| Start direct i/o write on inode X.|                                     |
| reserve space                     |                                     |
| Allocate ordered extent           |                                     |
| release reserved space            |                                     |
| Set BTRFS_INODE_DIO_READY bit.    |                                     |
|                                   | splice()                            |
|                                   | Transfer data from pipe buffer to   |
|                                   | destination file.                   |
|                                   | - kmap(pipe buffer page)            |
|                                   | - Start direct i/o write on         |
|                                   |   inode X.                          |
|                                   | - reserve space                     |
|                                   | - dio_refill_pages()                |
|                                   |   - sdio->blocks_available == 0     |
|                                   |   - Since a kernel address is       |
|                                   |     being passed instead of a       |
|                                   |     user space address,             |
|                                   |     iov_iter_get_pages() returns    |
|                                   |     -EFAULT.                        |
|                                   | - Since BTRFS_INODE_DIO_READY is    |
|                                   |   set, we don't release reserved    |
|                                   |   space.                            |
|                                   | - Clear BTRFS_INODE_DIO_READY bit.  |
| -EIOCBQUEUED is returned.         |                                     |
|-----------------------------------+-------------------------------------|

Hence this commit introduces "struct btrfs_dio_data" to track the usage of reserved data space. The remaining unused "reserve space" can now be freed reliably.

Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-09-21  fs: fix data races on inode->i_flctx  Dmitry Vyukov
locks_get_lock_context() uses cmpxchg() to install i_flctx. cmpxchg() is a release operation, which is correct. But it uses a plain load to load i_flctx. This is incorrect. Subsequent loads from i_flctx can hoist above the load of the i_flctx pointer itself and observe uninitialized garbage there. This in turn can lead to corruption of ctx->flc_lock and other members. Documentation/memory-barriers.txt explicitly requires a barrier in such a context: "A load-load control dependency requires a full read memory barrier". Use smp_load_acquire() in locks_get_lock_context() and in a bunch of other functions that can proceed concurrently with locks_get_lock_context(). The data race was found with KernelThreadSanitizer (KTSAN). Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
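The publish/consume pattern described here can be sketched in userspace C11, where an acquire load plays the role that smp_load_acquire() plays in the kernel and a compare-exchange plays the role of cmpxchg(). This is an analogy under stated assumptions, not the kernel code: struct lock_ctx, struct inode_like, and get_ctx() are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical lazily allocated context hanging off an inode-like object. */
struct lock_ctx {
	int lock;
	int nr_items;
};

struct inode_like {
	_Atomic(struct lock_ctx *) ctx;
};

static struct lock_ctx *get_ctx(struct inode_like *inode)
{
	/*
	 * Acquire load: if a non-NULL pointer is observed, the fields the
	 * installer wrote before publishing it are guaranteed to be visible.
	 * A plain load would not give that guarantee.
	 */
	struct lock_ctx *ctx =
		atomic_load_explicit(&inode->ctx, memory_order_acquire);
	if (ctx)
		return ctx;

	struct lock_ctx *new_ctx = malloc(sizeof(*new_ctx));
	if (!new_ctx)
		return NULL;
	new_ctx->lock = 0;
	new_ctx->nr_items = 0;

	/* Publish with release semantics; only one installer can win. */
	struct lock_ctx *expected = NULL;
	if (atomic_compare_exchange_strong_explicit(&inode->ctx, &expected,
						    new_ctx,
						    memory_order_acq_rel,
						    memory_order_acquire))
		return new_ctx;

	/* Lost the race: discard ours and use the winner's context. */
	free(new_ctx);
	return expected;
}
```

The failure ordering of the compare-exchange is also acquire, so a thread that loses the race can safely dereference the pointer it gets back.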
2015-09-20  NFSv4.x/pnfs: Don't try to recover stateids twice in layoutget  Trond Myklebust
If the current open or layout stateid doesn't match the stateid used in the layoutget RPC call, then don't try to recover it. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-09-20  NFSv4: Recovery of recalled read delegations is broken  Trond Myklebust
When a read delegation is being recalled, and we're reclaiming the cached opens, we need to make sure that we only reclaim read-only modes. A previous attempt to do this relied on retrieving the delegation type from the nfs4_opendata structure. Unfortunately, as Kinglong pointed out, this field can only be set when performing reboot recovery. Furthermore, if we call nfs4_open_recover(), then we end up clobbering the state->flags for all modes that we're not recovering... The fix is to have the delegation recall code pass this information to the recovery call, and then refactor the recovery code so that nfs4_open_delegation_recall() does not need to call nfs4_open_recover(). Reported-by: Kinglong Mee <kinglongmee@gmail.com> Fixes: 39f897fdbd46 ("NFSv4: When returning a delegation, don't...") Tested-by: Kinglong Mee <kinglongmee@gmail.com> Cc: NeilBrown <neilb@suse.com> Cc: stable@vger.kernel.org # v4.2+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-09-20  NFS: Fix an infinite loop when layoutget fail with BAD_STATEID  Kinglong Mee
If layoutget fails with BAD_STATEID, the restart should not use the old stateid. However, the NFS client chooses the layout stateid first, and then the open stateid. To avoid an infinite loop of using a bad stateid for layoutget, this patch sets the NFS_LAYOUT_INVALID_STID bit in the layout flags so that the bad layout stateid is skipped. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
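The selection logic described above can be sketched as follows. This is a simplified illustration, not the NFS client code: stateids are reduced to plain integers, and struct layout_like, LAYOUT_INVALID_STID (standing in for NFS_LAYOUT_INVALID_STID), choose_stateid(), and handle_bad_stateid() are all hypothetical names.

```c
#include <assert.h>

/* Stand-in for the NFS_LAYOUT_INVALID_STID flag bit (value is an assumption). */
#define LAYOUT_INVALID_STID (1u << 0)

/* Hypothetical, heavily simplified layout header. */
struct layout_like {
	unsigned int flags;
	int layout_stateid;
};

/* Prefer the layout stateid unless it has been marked invalid. */
static int choose_stateid(const struct layout_like *lo, int open_stateid)
{
	if (!(lo->flags & LAYOUT_INVALID_STID))
		return lo->layout_stateid;
	return open_stateid;
}

/* On a BAD_STATEID reply, mark the layout stateid so the retry skips it. */
static void handle_bad_stateid(struct layout_like *lo)
{
	lo->flags |= LAYOUT_INVALID_STID;
}
```

Without the flag, every retry would pick the same rejected layout stateid again, which is the loop the commit message describes.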
2015-09-20  NFS: Do cleanup before resetting pageio read/write to mds  Kinglong Mee
There is a reference leak of layout segment after resetting pageio read/write to mds. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Cc: stable@vger.kernel.org # v4.0+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-09-19  Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm  Linus Torvalds
Pull libnvdimm fixes from Dan Williams:
 - a boot regression (since v4.2) fix for some ARM configurations from Tyler
 - regression (since v4.1) fixes for mkfs.xfs on a DAX enabled device from Jeff. These are tagged for -stable.
 - a pair of locking fixes from Axel that are hidden from lockdep since they involve device_lock(). The "btt" one is tagged for -stable, the other only applies to the new "pfn" mechanism in v4.3.
 - a fix for the pmem ->rw_page() path to use wmb_pmem() from Ross.

* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  mm: fix type cast in __pfn_to_phys()
  pmem: add proper fencing to pmem_rw_page()
  libnvdimm: pfn_devs: Fix locking in namespace_store
  libnvdimm: btt_devs: Fix locking in namespace_store
  blockdev: don't set S_DAX for misaligned partitions
  dax: fix O_DIRECT I/O to the last block of a blockdev
2015-09-19  fs-writeback: unplug before cond_resched in writeback_sb_inodes  Chris Mason
Commit 505a666ee3fc ("writeback: plug writeback in wb_writeback() and writeback_inodes_wb()") has us holding a plug during writeback_sb_inodes, which increases the merge rate when relatively contiguous small files are written by the filesystem. It helps both on flash and spindles. For an fs_mark workload creating 4K files in parallel across 8 drives, this commit improves performance ~9% more by unplugging before calling cond_resched(). cond_resched() doesn't trigger an implicit unplug, so explicitly getting the IO down to the device before scheduling reduces latencies for anyone waiting on clean pages. It also cuts down on how often we use kblockd to unplug, which means less work bouncing from one workqueue to another. Many more details about how we got here: https://lkml.org/lkml/2015/9/11/570 Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-17  userfaultfd: add missing mmput() in error path  Eric Biggers
This fixes a memleak if anon_inode_getfile() fails in userfaultfd(). Signed-off-by: Eric Biggers <ebiggers3@gmail.com> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
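The class of bug fixed here, a reference taken during setup that is not dropped when a later step fails, can be shown with a small userspace sketch. It is not the userfaultfd code: struct mm_like, struct ctx_like, mm_get(), mm_put(), make_file() (a stand-in for a file-creation step such as anon_inode_getfile()), and create_ctx() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted object, standing in for an mm. */
struct mm_like {
	int users;
};

/* Hypothetical context that pins the mm for its lifetime. */
struct ctx_like {
	struct mm_like *mm;
};

static void mm_get(struct mm_like *mm) { mm->users++; }
static void mm_put(struct mm_like *mm) { mm->users--; }

/* Stand-in for the file-creation step; fails on request. */
static int make_file(int should_fail)
{
	return should_fail ? -1 : 3;
}

/* Returns a descriptor-like value on success, a negative value on failure. */
static int create_ctx(struct mm_like *mm, int should_fail, struct ctx_like **out)
{
	struct ctx_like *ctx = malloc(sizeof(*ctx));
	if (!ctx)
		return -1;

	ctx->mm = mm;
	mm_get(mm);			/* the context now holds a reference */

	int fd = make_file(should_fail);
	if (fd < 0) {
		mm_put(ctx->mm);	/* the step that is easy to forget */
		free(ctx);
		return fd;
	}

	*out = ctx;
	return fd;
}
```

If the mm_put() call in the failure branch is removed, the reference count stays elevated after a failed call and the object can never be released.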