summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2016-05-07f2fs: inject kmalloc failureJaegeuk Kim
This patch injects kmalloc failure given a fault injection rate. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-05-07f2fs: add mount option to select fault injection ratioJaegeuk Kim
This patch adds a mount option to select fault ratio. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-05-07f2fs: use f2fs_grab_cache_page instead of grab_cache_pageJaegeuk Kim
This patch converts grab_cache_page to f2fs_grab_cache_page. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-05-07f2fs: introduce f2fs_kmalloc to wrap kmallocJaegeuk Kim
This patch adds f2fs_kmalloc. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-05-07f2fs: add proc entry to show valid block bitmapJaegeuk Kim
This patch adds a new proc entry to show segment information in more detail. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-05-07f2fs: introduce macros for proc entriesJaegeuk Kim
This adds macros to be used multiple proc entries. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-05-07efivarfs: Make efivarfs_file_ioctl() staticPeter Jones
There are no callers except through the file_operations struct below this, so it should be static like everything else here. Signed-off-by: Peter Jones <pjones@redhat.com> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/1462570771-13324-6-git-send-email-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-07efi: Merge boolean flag argumentsJulia Lawall
The parameters atomic and duplicates of efivar_init always have opposite values. Drop the parameter atomic, replace the uses of !atomic with duplicates, and update the call sites accordingly. The code using duplicates is slightly reorganized with an 'else', to avoid duplicating the lock code. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jeremy Kerr <jk@ozlabs.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Saurabh Sengar <saurabh.truth@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vaishali Thakkar <vaishali.thakkar@oracle.com> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/1462570771-13324-5-git-send-email-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-06GFS2: Refactor gfs2_remove_from_journalBob Peterson
This patch makes two simple changes to function gfs2_remove_from_journal. First, it removes the parameter that specifies the transaction. Since it's always passed in as current->journal_info, we might as well set that in the function rather than passing it in. Second, it changes the meta parameter to use an enum to make the code more clear. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com>
2016-05-06btrfs: don't force mounts to wait for cleaner_kthread to delete one or more ↵Zygo Blaxell
subvolumes During a mount, we start the cleaner kthread first because the transaction kthread wants to wake up the cleaner kthread. We start the transaction kthread next because everything in btrfs wants transactions. We do reloc recovery in the thread that was doing the original mount call once the transaction kthread is running. This means that the cleaner kthread could already be running when reloc recovery happens (e.g. if a snapshot delete was started before a crash). Relocation does not play well with the cleaner kthread, so a mutex was added in commit 5f3164813b90f7dbcb5c3ab9006906222ce471b7 "Btrfs: fix race between balance recovery and root deletion" to prevent both from being active at the same time. If the cleaner kthread is already holding the mutex by the time we get to btrfs_recover_relocation, the mount will be blocked until at least one deleted subvolume is cleaned (possibly more if the mount process doesn't get the lock right away). During this time (which could be an arbitrarily long time on a large/slow filesystem), the mount process is stuck and the filesystem is unnecessarily inaccessible. Fix this by locking cleaner_mutex before we start cleaner_kthread, and unlocking the mutex after mount no longer requires it. This ensures that the mounting process will not be blocked by the cleaner kthread. The cleaner kthread is already prepared for mutex contention and will just go to sleep until the mutex is available. Signed-off-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-05-06btrfs: ioctl: reorder exclusive op check in RM_DEVDavid Sterba
Move the op exclusivity check before the other code (same as in ADD_DEV). Signed-off-by: David Sterba <dsterba@suse.com>
2016-05-06btrfs: add write protection to SET_FEATURES ioctlDavid Sterba
Perform the want_write check if we get far enough to do any writes. Signed-off-by: David Sterba <dsterba@suse.com>
2016-05-06btrfs: fix lock dep warning move scratch super outside of chunk_mutexAnand Jain
Move scratch super outside of the chunk lock to avoid below lockdep warning. The better place to scratch super is in the function btrfs_rm_dev_replace_free_srcdev() just before free_device, which is outside of the chunk lock as well. To reproduce: (fresh boot) mkfs.btrfs -f -draid5 -mraid5 /dev/sdc /dev/sdd /dev/sde mount /dev/sdc /btrfs dd if=/dev/zero of=/btrfs/tf1 bs=4096 count=100 (get devmgt from https://github.com/asj/devmgt.git) devmgt detach /dev/sde dd if=/dev/zero of=/btrfs/tf1 bs=4096 count=100 sync btrfs replace start -Brf 3 /dev/sdf /btrfs <-- devmgt attach host7 ====================================================== [ INFO: possible circular locking dependency detected ] 4.6.0-rc2asj+ #1 Not tainted --------------------------------------------------- btrfs/2174 is trying to acquire lock: (sb_writers){.+.+.+}, at: [<ffffffff812449b4>] __sb_start_write+0xb4/0xf0 but task is already holding lock: (&fs_info->chunk_mutex){+.+.+.}, at: [<ffffffffa05c5f55>] btrfs_dev_replace_finishing+0x145/0x980 [btrfs] which lock already depends on the new lock. Chain exists of: sb_writers --> &fs_devs->device_list_mutex --> &fs_info->chunk_mutex Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&fs_info->chunk_mutex); lock(&fs_devs->device_list_mutex); lock(&fs_info->chunk_mutex); lock(sb_writers); *** DEADLOCK *** -> #0 (sb_writers){.+.+.+}: [<ffffffff810e6415>] __lock_acquire+0x1bc5/0x1ee0 [<ffffffff810e707e>] lock_acquire+0xbe/0x210 [<ffffffff810df49a>] percpu_down_read+0x4a/0xa0 [<ffffffff812449b4>] __sb_start_write+0xb4/0xf0 [<ffffffff81265534>] mnt_want_write+0x24/0x50 [<ffffffff812508a2>] path_openat+0x952/0x1190 [<ffffffff81252451>] do_filp_open+0x91/0x100 [<ffffffff8123f5cc>] file_open_name+0xfc/0x140 [<ffffffff8123f643>] filp_open+0x33/0x60 [<ffffffffa0572bb6>] update_dev_time+0x16/0x40 [btrfs] [<ffffffffa057f60d>] btrfs_scratch_superblocks+0x5d/0xb0 [btrfs] [<ffffffffa057f70e>] btrfs_rm_dev_replace_remove_srcdev+0xae/0xd0 [btrfs] [<ffffffffa05c62c5>] btrfs_dev_replace_finishing+0x4b5/0x980 [btrfs] [<ffffffffa05c6ae8>] btrfs_dev_replace_start+0x358/0x530 [btrfs] Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-05-06btrfs: Fix BUG_ON condition in scrub_setup_recheck_block()Ashish Samant
pagev array in scrub_block{} is of size SCRUB_MAX_PAGES_PER_BLOCK. page_index should be checked with the same to trigger BUG_ON(). Signed-off-by: Ashish Samant <ashish.samant@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-05-06Btrfs: remove BUG_ON()'s in btrfs_map_blockJosef Bacik
btrfs_map_block can go horribly wrong in the face of fs corruption, lets agree to not be assholes and panic at any possible chance things are all fucked up. Signed-off-by: Josef Bacik <jbacik@fb.com> [ removed type casts ] Signed-off-by: David Sterba <dsterba@suse.com>
2016-05-06Btrfs: fix divide error upon chunk's stripe_lenLiu Bo
The struct 'map_lookup' uses type int for @stripe_len, while btrfs_chunk_stripe_len() can return a u64 value, and it may end up with @stripe_len being undefined value and it can lead to 'divide error' in __btrfs_map_block(). This changes 'map_lookup' to use type u64 for stripe_len, also right now we only use BTRFS_STRIPE_LEN for stripe_len, so this adds a valid checker for BTRFS_STRIPE_LEN. Reported-by: Vegard Nossum <vegard.nossum@oracle.com> Reported-by: Quentin Casasnovas <quentin.casasnovas@oracle.com> Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> [ folded division fix to scrub_raid56_parity ] Signed-off-by: David Sterba <dsterba@suse.com>
2016-05-06btrfs: sysfs: protect reading label by lockDavid Sterba
If the label setting ioctl races with sysfs label handler, we could get mixed result in the output, part old part new. We should either get the old or new label. The chances to hit this race are low. Signed-off-by: David Sterba <dsterba@suse.com>
2016-05-06btrfs: add check to sysfs handler of labelDavid Sterba
Add a sanity check for the fs_info as we will dereference it, similar to what the 'store features' handler does. Signed-off-by: David Sterba <dsterba@suse.com>
2016-05-06btrfs: add read-only check to sysfs handler of featuresDavid Sterba
We don't want to trigger the change on a read-only filesystem, similar to what the label handler does. Signed-off-by: David Sterba <dsterba@suse.cz>
2016-05-06btrfs: reuse existing variable in scrub_stripe, reduce stack usageDavid Sterba
The key variable occupies 17 bytes, the key_start is used once, we can simply reuse existing 'key' for that purpose. As the key is not a simple type, compiler doest not do it on itself. Signed-off-by: David Sterba <dsterba@suse.com>
2016-05-06btrfs: use dynamic allocation for root item in create_subvolDavid Sterba
The size of root item is more than 400 bytes, which is quite a lot of stack space. As we do IO from inside the subvolume ioctls, we should keep the stack usage low in case the filesystem is on top of other layers (NFS, device mapper, iscsi, etc). Reviewed-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-05-06btrfs: clone: use vmalloc only as fallback for nodesize buferDavid Sterba
Signed-off-by: David Sterba <dsterba@suse.com>
2016-05-06btrfs: send: use vmalloc only as fallback for clone_sources_tmpDavid Sterba
Signed-off-by: David Sterba <dsterba@suse.com>
2016-05-06btrfs: send: use vmalloc only as fallback for clone_rootsDavid Sterba
Signed-off-by: David Sterba <dsterba@suse.com>
2016-05-06btrfs: send: use temporary variable to store allocation sizeDavid Sterba
We're going to use the argument multiple times later. Signed-off-by: David Sterba <dsterba@suse.com>
2016-05-06btrfs: send: use vmalloc only as fallback for read_bufDavid Sterba
Signed-off-by: David Sterba <dsterba@suse.com>
2016-05-06btrfs: send: use vmalloc only as fallback for send_bufDavid Sterba
Signed-off-by: David Sterba <dsterba@suse.com>
2016-05-06btrfs: fix lock dep warning, move scratch dev out of device_list_mutex and ↵Anand Jain
uuid_mutex When the replace target fails, the target device will be taken out of fs device list, scratch + update_dev_time and freed. However we could do the scratch + update_dev_time and free part after the device has been taken out of device list, so that we don't have to hold the device_list_mutex and uuid_mutex locks. Reported issue: [ 5375.718845] ====================================================== [ 5375.718846] [ INFO: possible circular locking dependency detected ] [ 5375.718849] 4.4.5-scst31x-debug-11+ #40 Not tainted [ 5375.718849] ------------------------------------------------------- [ 5375.718851] btrfs-health/4662 is trying to acquire lock: [ 5375.718861] (sb_writers){.+.+.+}, at: [<ffffffff812214f7>] __sb_start_write+0xb7/0xf0 [ 5375.718862] [ 5375.718862] but task is already holding lock: [ 5375.718907] (&fs_devs->device_list_mutex){+.+.+.}, at: [<ffffffffa028263c>] btrfs_destroy_dev_replace_tgtdev+0x3c/0x150 [btrfs] [ 5375.718907] [ 5375.718907] which lock already depends on the new lock. [ 5375.718907] [ 5375.718908] [ 5375.718908] the existing dependency chain (in reverse order) is: [ 5375.718911] [ 5375.718911] -> #3 (&fs_devs->device_list_mutex){+.+.+.}: [ 5375.718917] [<ffffffff810da4be>] lock_acquire+0xce/0x1e0 [ 5375.718921] [<ffffffff81633949>] mutex_lock_nested+0x69/0x3c0 [ 5375.718940] [<ffffffffa0219bf6>] btrfs_show_devname+0x36/0x210 [btrfs] [ 5375.718945] [<ffffffff81267079>] show_vfsmnt+0x49/0x150 [ 5375.718948] [<ffffffff81240b07>] m_show+0x17/0x20 [ 5375.718951] [<ffffffff81246868>] seq_read+0x2d8/0x3b0 [ 5375.718955] [<ffffffff8121df28>] __vfs_read+0x28/0xd0 [ 5375.718959] [<ffffffff8121e806>] vfs_read+0x86/0x130 [ 5375.718962] [<ffffffff8121f4c9>] SyS_read+0x49/0xa0 [ 5375.718966] [<ffffffff81637976>] entry_SYSCALL_64_fastpath+0x16/0x7a [ 5375.718968] [ 5375.718968] -> #2 (namespace_sem){+++++.}: [ 5375.718971] [<ffffffff810da4be>] lock_acquire+0xce/0x1e0 [ 5375.718974] [<ffffffff81635199>] down_write+0x49/0x80 [ 5375.718977] [<ffffffff81243593>] lock_mount+0x43/0x1c0 [ 5375.718979] [<ffffffff81243c13>] do_add_mount+0x23/0xd0 [ 5375.718982] [<ffffffff81244afb>] do_mount+0x27b/0xe30 [ 5375.718985] [<ffffffff812459dc>] SyS_mount+0x8c/0xd0 [ 5375.718988] [<ffffffff81637976>] entry_SYSCALL_64_fastpath+0x16/0x7a [ 5375.718991] [ 5375.718991] -> #1 (&sb->s_type->i_mutex_key#5){+.+.+.}: [ 5375.718994] [<ffffffff810da4be>] lock_acquire+0xce/0x1e0 [ 5375.718996] [<ffffffff81633949>] mutex_lock_nested+0x69/0x3c0 [ 5375.719001] [<ffffffff8122d608>] path_openat+0x468/0x1360 [ 5375.719004] [<ffffffff8122f86e>] do_filp_open+0x7e/0xe0 [ 5375.719007] [<ffffffff8121da7b>] do_sys_open+0x12b/0x210 [ 5375.719010] [<ffffffff8121db7e>] SyS_open+0x1e/0x20 [ 5375.719013] [<ffffffff81637976>] entry_SYSCALL_64_fastpath+0x16/0x7a [ 5375.719015] [ 5375.719015] -> #0 (sb_writers){.+.+.+}: [ 5375.719018] [<ffffffff810d97ca>] __lock_acquire+0x17ba/0x1ae0 [ 5375.719021] [<ffffffff810da4be>] lock_acquire+0xce/0x1e0 [ 5375.719026] [<ffffffff810d3bef>] percpu_down_read+0x4f/0xa0 [ 5375.719028] [<ffffffff812214f7>] __sb_start_write+0xb7/0xf0 [ 5375.719031] [<ffffffff81242eb4>] mnt_want_write+0x24/0x50 [ 5375.719035] [<ffffffff8122ded2>] path_openat+0xd32/0x1360 [ 5375.719037] [<ffffffff8122f86e>] do_filp_open+0x7e/0xe0 [ 5375.719040] [<ffffffff8121d8a4>] file_open_name+0xe4/0x130 [ 5375.719043] [<ffffffff8121d923>] filp_open+0x33/0x60 [ 5375.719073] [<ffffffffa02776a6>] update_dev_time+0x16/0x40 [btrfs] [ 5375.719099] [<ffffffffa02825be>] btrfs_scratch_superblocks+0x4e/0x90 [btrfs] [ 5375.719123] [<ffffffffa0282665>] btrfs_destroy_dev_replace_tgtdev+0x65/0x150 [btrfs] [ 5375.719150] [<ffffffffa02c6c80>] btrfs_dev_replace_finishing+0x6b0/0x990 [btrfs] [ 5375.719175] [<ffffffffa02c729e>] btrfs_dev_replace_start+0x33e/0x540 [btrfs] [ 5375.719199] [<ffffffffa02c7f58>] btrfs_auto_replace_start+0xf8/0x140 [btrfs] [ 5375.719222] [<ffffffffa02464e6>] health_kthread+0x246/0x490 [btrfs] [ 5375.719225] [<ffffffff810a70df>] kthread+0xef/0x110 [ 5375.719229] [<ffffffff81637d2f>] ret_from_fork+0x3f/0x70 [ 5375.719230] [ 5375.719230] other info that might help us debug this: [ 5375.719230] [ 5375.719233] Chain exists of: [ 5375.719233] sb_writers --> namespace_sem --> &fs_devs->device_list_mutex [ 5375.719233] [ 5375.719234] Possible unsafe locking scenario: [ 5375.719234] [ 5375.719234] CPU0 CPU1 [ 5375.719235] ---- ---- [ 5375.719236] lock(&fs_devs->device_list_mutex); [ 5375.719238] lock(namespace_sem); [ 5375.719239] lock(&fs_devs->device_list_mutex); [ 5375.719241] lock(sb_writers); [ 5375.719241] [ 5375.719241] *** DEADLOCK *** [ 5375.719241] [ 5375.719243] 4 locks held by btrfs-health/4662: [ 5375.719266] #0: (&fs_info->health_mutex){+.+.+.}, at: [<ffffffffa0246303>] health_kthread+0x63/0x490 [btrfs] [ 5375.719293] #1: (&fs_info->dev_replace.lock_finishing_cancel_unmount){+.+.+.}, at: [<ffffffffa02c6611>] btrfs_dev_replace_finishing+0x41/0x990 [btrfs] [ 5375.719319] #2: (uuid_mutex){+.+.+.}, at: [<ffffffffa0282620>] btrfs_destroy_dev_replace_tgtdev+0x20/0x150 [btrfs] [ 5375.719343] #3: (&fs_devs->device_list_mutex){+.+.+.}, at: [<ffffffffa028263c>] btrfs_destroy_dev_replace_tgtdev+0x3c/0x150 [btrfs] [ 5375.719343] [ 5375.719343] stack backtrace: [ 5375.719347] CPU: 2 PID: 4662 Comm: btrfs-health Not tainted 4.4.5-scst31x-debug-11+ #40 [ 5375.719348] Hardware name: Supermicro SYS-6018R-WTRT/X10DRW-iT, BIOS 1.0c 01/07/2015 [ 5375.719352] 0000000000000000 ffff880856f73880 ffffffff813529e3 ffffffff826182a0 [ 5375.719354] ffffffff8260c090 ffff880856f738c0 ffffffff810d667c ffff880856f73930 [ 5375.719357] ffff880861f32b40 ffff880861f32b68 0000000000000003 0000000000000004 [ 5375.719357] Call Trace: [ 5375.719363] [<ffffffff813529e3>] dump_stack+0x85/0xc2 [ 5375.719366] [<ffffffff810d667c>] print_circular_bug+0x1ec/0x260 [ 5375.719369] [<ffffffff810d97ca>] __lock_acquire+0x17ba/0x1ae0 [ 5375.719373] [<ffffffff810f606d>] ? debug_lockdep_rcu_enabled+0x1d/0x20 [ 5375.719376] [<ffffffff810da4be>] lock_acquire+0xce/0x1e0 [ 5375.719378] [<ffffffff812214f7>] ? __sb_start_write+0xb7/0xf0 [ 5375.719383] [<ffffffff810d3bef>] percpu_down_read+0x4f/0xa0 [ 5375.719385] [<ffffffff812214f7>] ? __sb_start_write+0xb7/0xf0 [ 5375.719387] [<ffffffff812214f7>] __sb_start_write+0xb7/0xf0 [ 5375.719389] [<ffffffff81242eb4>] mnt_want_write+0x24/0x50 [ 5375.719393] [<ffffffff8122ded2>] path_openat+0xd32/0x1360 [ 5375.719415] [<ffffffffa02462a0>] ? btrfs_congested_fn+0x180/0x180 [btrfs] [ 5375.719418] [<ffffffff810f606d>] ? debug_lockdep_rcu_enabled+0x1d/0x20 [ 5375.719420] [<ffffffff8122f86e>] do_filp_open+0x7e/0xe0 [ 5375.719423] [<ffffffff810f615d>] ? rcu_read_lock_sched_held+0x6d/0x80 [ 5375.719426] [<ffffffff81201a9b>] ? kmem_cache_alloc+0x26b/0x5d0 [ 5375.719430] [<ffffffff8122e7d4>] ? getname_kernel+0x34/0x120 [ 5375.719433] [<ffffffff8121d8a4>] file_open_name+0xe4/0x130 [ 5375.719436] [<ffffffff8121d923>] filp_open+0x33/0x60 [ 5375.719462] [<ffffffffa02776a6>] update_dev_time+0x16/0x40 [btrfs] [ 5375.719485] [<ffffffffa02825be>] btrfs_scratch_superblocks+0x4e/0x90 [btrfs] [ 5375.719506] [<ffffffffa0282665>] btrfs_destroy_dev_replace_tgtdev+0x65/0x150 [btrfs] [ 5375.719530] [<ffffffffa02c6c80>] btrfs_dev_replace_finishing+0x6b0/0x990 [btrfs] [ 5375.719554] [<ffffffffa02c6b23>] ? btrfs_dev_replace_finishing+0x553/0x990 [btrfs] [ 5375.719576] [<ffffffffa02c729e>] btrfs_dev_replace_start+0x33e/0x540 [btrfs] [ 5375.719598] [<ffffffffa02c7f58>] btrfs_auto_replace_start+0xf8/0x140 [btrfs] [ 5375.719621] [<ffffffffa02464e6>] health_kthread+0x246/0x490 [btrfs] [ 5375.719641] [<ffffffffa02463d8>] ? health_kthread+0x138/0x490 [btrfs] [ 5375.719661] [<ffffffffa02462a0>] ? btrfs_congested_fn+0x180/0x180 [btrfs] [ 5375.719663] [<ffffffff810a70df>] kthread+0xef/0x110 [ 5375.719666] [<ffffffff810a6ff0>] ? kthread_create_on_node+0x200/0x200 [ 5375.719669] [<ffffffff81637d2f>] ret_from_fork+0x3f/0x70 [ 5375.719672] [<ffffffff810a6ff0>] ? kthread_create_on_node+0x200/0x200 [ 5375.719697] ------------[ cut here ]------------ Signed-off-by: Anand Jain <anand.jain@oracle.com> Reported-by: Yauhen Kharuzhy <yauhen.kharuzhy@zavadatar.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-05-06btrfs: send: silence an integer overflow warningDan Carpenter
The "sizeof(*arg->clone_sources) * arg->clone_sources_count" expression can overflow. It causes several static checker warnings. It's all under CAP_SYS_ADMIN so it's not that serious but lets silence the warnings. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-05-06btrfs: avoid overflowing f_bfreeLuis de Bethencourt
Since mixed block groups accounting isn't byte-accurate and f_bree is an unsigned integer, it could overflow. Avoid this. Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com> Suggested-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-05-06btrfs: fix mixed block count of available spaceLuis de Bethencourt
Metadata for mixed block is already accounted in total data and should not be counted as part of the free metadata space. Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com> Link: https://bugzilla.kernel.org/show_bug.cgi?id=114281 Signed-off-by: David Sterba <dsterba@suse.com>
2016-05-06btrfs: allow balancing to dup with multi-deviceAustin S. Hemmelgarn
Currently, we don't allow the user to try and rebalance to a dup profile on a multi-device filesystem. In most cases, this is a perfectly sensible restriction as raid1 uses the same amount of space and provides better protection. However, when reshaping a multi-device filesystem down to a single device filesystem, this requires the user to convert metadata and system chunks to single profile before deleting devices, and then convert again to dup, which leaves a period of time where metadata integrity is reduced. This patch removes the single-device-only restriction from converting to dup profile to remove this potential data integrity reduction. Signed-off-by: Austin S. Hemmelgarn <ahferroin7@gmail.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-05-06btrfs: ioctl: reorder exclusive op check in RM_DEVDavid Sterba
Move the op exclusivity check before the other code (same as in ADD_DEV). Signed-off-by: David Sterba <dsterba@suse.com>
2016-05-06btrfs: kill unused writepage_io_hook callbackDavid Sterba
It seems to be long time unused, since 2008 and 6885f308b5570 ("Btrfs: Misc 2.6.25 updates"). Propagating the removal touches some code but has no functional effect. Signed-off-by: David Sterba <dsterba@suse.com>
2016-05-05ext4: remove unmeetable inconsisteny check from ext4_find_extent()Nicolai Stange
ext4_find_extent(), stripped down to the parts relevant to this patch, reads as ppos = 0; i = depth; while (i) { --i; ++ppos; if (unlikely(ppos > depth)) { ... ret = -EFSCORRUPTED; goto err; } } Due to the loop's bounds, the condition ppos > depth can never be met. Remove this dead code. Signed-off-by: Nicolai Stange <nicstange@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-05-05ext4: remove unnecessary bio get/putJens Axboe
ext4_io_submit() used to check for EOPNOTSUPP after bio submission, which is why it had to get an extra reference to the bio before submitting it. But since we no longer touch the bio after submission, get rid of the redundant get/put of the bio. If we do get the extra reference, we enter the slower path of having to flag this bio as now having external references. Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-05-05proc: prevent accessing /proc/<PID>/environ until it's readyMathias Krause
If /proc/<PID>/environ gets read before the envp[] array is fully set up in create_{aout,elf,elf_fdpic,flat}_tables(), we might end up trying to read more bytes than are actually written, as env_start will already be set but env_end will still be zero, making the range calculation underflow, allowing to read beyond the end of what has been written. Fix this as it is done for /proc/<PID>/cmdline by testing env_end for zero. It is, apparently, intentionally set last in create_*_tables(). This bug was found by the PaX size_overflow plugin that detected the arithmetic underflow of 'this_len = env_end - (env_start + src)' when env_end is still zero. The expected consequence is that userland trying to access /proc/<PID>/environ of a not yet fully set up process may get inconsistent data as we're in the middle of copying in the environment variables. Fixes: https://forums.grsecurity.net/viewtopic.php?f=3&t=4363 Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=116461 Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Emese Revfy <re.emese@gmail.com> Cc: Pax Team <pageexec@freemail.hu> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Mateusz Guzik <mguzik@redhat.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Jarod Wilson <jarod@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-05ext4: silence UBSAN in ext4_mb_init()Nicolai Stange
Currently, in ext4_mb_init(), there's a loop like the following: do { ... offset += 1 << (sb->s_blocksize_bits - i); i++; } while (i <= sb->s_blocksize_bits + 1); Note that the updated offset is used in the loop's next iteration only. However, at the last iteration, that is at i == sb->s_blocksize_bits + 1, the shift count becomes equal to (unsigned)-1 > 31 (c.f. C99 6.5.7(3)) and UBSAN reports UBSAN: Undefined behaviour in fs/ext4/mballoc.c:2621:15 shift exponent 4294967295 is too large for 32-bit type 'int' [...] Call Trace: [<ffffffff818c4d25>] dump_stack+0xbc/0x117 [<ffffffff818c4c69>] ? _atomic_dec_and_lock+0x169/0x169 [<ffffffff819411ab>] ubsan_epilogue+0xd/0x4e [<ffffffff81941cac>] __ubsan_handle_shift_out_of_bounds+0x1fb/0x254 [<ffffffff81941ab1>] ? __ubsan_handle_load_invalid_value+0x158/0x158 [<ffffffff814b6dc1>] ? kmem_cache_alloc+0x101/0x390 [<ffffffff816fc13b>] ? ext4_mb_init+0x13b/0xfd0 [<ffffffff814293c7>] ? create_cache+0x57/0x1f0 [<ffffffff8142948a>] ? create_cache+0x11a/0x1f0 [<ffffffff821c2168>] ? mutex_lock+0x38/0x60 [<ffffffff821c23ab>] ? mutex_unlock+0x1b/0x50 [<ffffffff814c26ab>] ? put_online_mems+0x5b/0xc0 [<ffffffff81429677>] ? kmem_cache_create+0x117/0x2c0 [<ffffffff816fcc49>] ext4_mb_init+0xc49/0xfd0 [...] Observe that the mentioned shift exponent, 4294967295, equals (unsigned)-1. Unless compilers start to do some fancy transformations (which at least GCC 6.0.0 doesn't currently do), the issue is of cosmetic nature only: the such calculated value of offset is never used again. Silence UBSAN by introducing another variable, offset_incr, holding the next increment to apply to offset and adjust that one by right shifting it by one position per loop iteration. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=114701 Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=112161 Cc: stable@vger.kernel.org Signed-off-by: Nicolai Stange <nicstange@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-05-06Merge tag 'keys-next-20160505' of ↵James Morris
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs into next
2016-05-05ext4: address UBSAN warning in mb_find_order_for_block()Nicolai Stange
Currently, in mb_find_order_for_block(), there's a loop like the following: while (order <= e4b->bd_blkbits + 1) { ... bb += 1 << (e4b->bd_blkbits - order); } Note that the updated bb is used in the loop's next iteration only. However, at the last iteration, that is at order == e4b->bd_blkbits + 1, the shift count becomes negative (c.f. C99 6.5.7(3)) and UBSAN reports UBSAN: Undefined behaviour in fs/ext4/mballoc.c:1281:11 shift exponent -1 is negative [...] Call Trace: [<ffffffff818c4d35>] dump_stack+0xbc/0x117 [<ffffffff818c4c79>] ? _atomic_dec_and_lock+0x169/0x169 [<ffffffff819411bb>] ubsan_epilogue+0xd/0x4e [<ffffffff81941cbc>] __ubsan_handle_shift_out_of_bounds+0x1fb/0x254 [<ffffffff81941ac1>] ? __ubsan_handle_load_invalid_value+0x158/0x158 [<ffffffff816e93a0>] ? ext4_mb_generate_from_pa+0x590/0x590 [<ffffffff816502c8>] ? ext4_read_block_bitmap_nowait+0x598/0xe80 [<ffffffff816e7b7e>] mb_find_order_for_block+0x1ce/0x240 [...] Unless compilers start to do some fancy transformations (which at least GCC 6.0.0 doesn't currently do), the issue is of cosmetic nature only: the such calculated value of bb is never used again. Silence UBSAN by introducing another variable, bb_incr, holding the next increment to apply to bb and adjust that one by right shifting it by one position per loop iteration. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=114701 Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=112161 Cc: stable@vger.kernel.org Signed-off-by: Nicolai Stange <nicstange@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-05-05Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull userns fix from Eric Biederman: "This contains just a single fix for a nasty oops" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: propogate_mnt: Handle the first propogated copy being a slave
2016-05-05ext4: fix oops on corrupted filesystemJan Kara
When filesystem is corrupted in the right way, it can happen ext4_mark_iloc_dirty() in ext4_orphan_add() returns error and we subsequently remove inode from the in-memory orphan list. However this deletion is done with list_del(&EXT4_I(inode)->i_orphan) and thus we leave i_orphan list_head with a stale content. Later we can look at this content causing list corruption, oops, or other issues. The reported trace looked like: WARNING: CPU: 0 PID: 46 at lib/list_debug.c:53 __list_del_entry+0x6b/0x100() list_del corruption, 0000000061c1d6e0->next is LIST_POISON1 0000000000100100) CPU: 0 PID: 46 Comm: ext4.exe Not tainted 4.1.0-rc4+ #250 Stack: 60462947 62219960 602ede24 62219960 602ede24 603ca293 622198f0 602f02eb 62219950 6002c12c 62219900 601b4d6b Call Trace: [<6005769c>] ? vprintk_emit+0x2dc/0x5c0 [<602ede24>] ? printk+0x0/0x94 [<600190bc>] show_stack+0xdc/0x1a0 [<602ede24>] ? printk+0x0/0x94 [<602ede24>] ? printk+0x0/0x94 [<602f02eb>] dump_stack+0x2a/0x2c [<6002c12c>] warn_slowpath_common+0x9c/0xf0 [<601b4d6b>] ? __list_del_entry+0x6b/0x100 [<6002c254>] warn_slowpath_fmt+0x94/0xa0 [<602f4d09>] ? __mutex_lock_slowpath+0x239/0x3a0 [<6002c1c0>] ? warn_slowpath_fmt+0x0/0xa0 [<60023ebf>] ? set_signals+0x3f/0x50 [<600a205a>] ? kmem_cache_free+0x10a/0x180 [<602f4e88>] ? mutex_lock+0x18/0x30 [<601b4d6b>] __list_del_entry+0x6b/0x100 [<601177ec>] ext4_orphan_del+0x22c/0x2f0 [<6012f27c>] ? __ext4_journal_start_sb+0x2c/0xa0 [<6010b973>] ? ext4_truncate+0x383/0x390 [<6010bc8b>] ext4_write_begin+0x30b/0x4b0 [<6001bb50>] ? copy_from_user+0x0/0xb0 [<601aa840>] ? iov_iter_fault_in_readable+0xa0/0xc0 [<60072c4f>] generic_perform_write+0xaf/0x1e0 [<600c4166>] ? file_update_time+0x46/0x110 [<60072f0f>] __generic_file_write_iter+0x18f/0x1b0 [<6010030f>] ext4_file_write_iter+0x15f/0x470 [<60094e10>] ? unlink_file_vma+0x0/0x70 [<6009b180>] ? unlink_anon_vmas+0x0/0x260 [<6008f169>] ? free_pgtables+0xb9/0x100 [<600a6030>] __vfs_write+0xb0/0x130 [<600a61d5>] vfs_write+0xa5/0x170 [<600a63d6>] SyS_write+0x56/0xe0 [<6029fcb0>] ? __libc_waitpid+0x0/0xa0 [<6001b698>] handle_syscall+0x68/0x90 [<6002633d>] userspace+0x4fd/0x600 [<6002274f>] ? save_registers+0x1f/0x40 [<60028bd7>] ? arch_prctl+0x177/0x1b0 [<60017bd5>] fork_handler+0x85/0x90 Fix the problem by using list_del_init() as we always should with i_orphan list. CC: stable@vger.kernel.org Reported-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-05-05propogate_mnt: Handle the first propogated copy being a slaveEric W. Biederman
When the first propgated copy was a slave the following oops would result: > BUG: unable to handle kernel NULL pointer dereference at 0000000000000010 > IP: [<ffffffff811fba4e>] propagate_one+0xbe/0x1c0 > PGD bacd4067 PUD bac66067 PMD 0 > Oops: 0000 [#1] SMP > Modules linked in: > CPU: 1 PID: 824 Comm: mount Not tainted 4.6.0-rc5userns+ #1523 > Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 > task: ffff8800bb0a8000 ti: ffff8800bac3c000 task.ti: ffff8800bac3c000 > RIP: 0010:[<ffffffff811fba4e>] [<ffffffff811fba4e>] propagate_one+0xbe/0x1c0 > RSP: 0018:ffff8800bac3fd38 EFLAGS: 00010283 > RAX: 0000000000000000 RBX: ffff8800bb77ec00 RCX: 0000000000000010 > RDX: 0000000000000000 RSI: ffff8800bb58c000 RDI: ffff8800bb58c480 > RBP: ffff8800bac3fd48 R08: 0000000000000001 R09: 0000000000000000 > R10: 0000000000001ca1 R11: 0000000000001c9d R12: 0000000000000000 > R13: ffff8800ba713800 R14: ffff8800bac3fda0 R15: ffff8800bb77ec00 > FS: 00007f3c0cd9b7e0(0000) GS:ffff8800bfb00000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 0000000000000010 CR3: 00000000bb79d000 CR4: 00000000000006e0 > Stack: > ffff8800bb77ec00 0000000000000000 ffff8800bac3fd88 ffffffff811fbf85 > ffff8800bac3fd98 ffff8800bb77f080 ffff8800ba713800 ffff8800bb262b40 > 0000000000000000 0000000000000000 ffff8800bac3fdd8 ffffffff811f1da0 > Call Trace: > [<ffffffff811fbf85>] propagate_mnt+0x105/0x140 > [<ffffffff811f1da0>] attach_recursive_mnt+0x120/0x1e0 > [<ffffffff811f1ec3>] graft_tree+0x63/0x70 > [<ffffffff811f1f6b>] do_add_mount+0x9b/0x100 > [<ffffffff811f2c1a>] do_mount+0x2aa/0xdf0 > [<ffffffff8117efbe>] ? strndup_user+0x4e/0x70 > [<ffffffff811f3a45>] SyS_mount+0x75/0xc0 > [<ffffffff8100242b>] do_syscall_64+0x4b/0xa0 > [<ffffffff81988f3c>] entry_SYSCALL64_slow_path+0x25/0x25 > Code: 00 00 75 ec 48 89 0d 02 22 22 01 8b 89 10 01 00 00 48 89 05 fd 21 22 01 39 8e 10 01 00 00 0f 84 e0 00 00 00 48 8b 80 d8 00 00 00 <48> 8b 50 10 48 89 05 df 21 22 01 48 89 15 d0 21 22 01 8b 53 30 > RIP [<ffffffff811fba4e>] propagate_one+0xbe/0x1c0 > RSP <ffff8800bac3fd38> > CR2: 0000000000000010 > ---[ end trace 2725ecd95164f217 ]--- This oops happens with the namespace_sem held and can be triggered by non-root users. An all around not pleasant experience. To avoid this scenario when finding the appropriate source mount to copy stop the walk up the mnt_master chain when the first source mount is encountered. Further rewrite the walk up the last_source mnt_master chain so that it is clear what is going on. The reason why the first source mount is special is that it it's mnt_parent is not a mount in the dest_mnt propagation tree, and as such termination conditions based up on the dest_mnt mount propgation tree do not make sense. To avoid other kinds of confusion last_dest is not changed when computing last_source. last_dest is only used once in propagate_one and that is above the point of the code being modified, so changing the global variable is meaningless and confusing. Cc: stable@vger.kernel.org fixes: f2ebb3a921c1ca1e2ddd9242e95a1989a50c4c68 ("smarter propagate_mnt()") Reported-by: Tycho Andersen <tycho.andersen@canonical.com> Reviewed-by: Seth Forshee <seth.forshee@canonical.com> Tested-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2016-05-05ext4: fix check of dqget() return value in ext4_ioctl_setproject()Seth Forshee
A failed call to dqget() returns an ERR_PTR() and not null. Fix the check in ext4_ioctl_setproject() to handle this correctly. Fixes: 9b7365fc1c82 ("ext4: add FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR interface support") Cc: stable@vger.kernel.org # v4.5 Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2016-05-04ecryptfs: fix handling of directory openingAl Viro
First of all, trying to open them r/w is idiocy; it's guaranteed to fail. Moreover, assigning ->f_pos and assuming that everything will work is blatantly broken - try that with e.g. tmpfs as underlying layer and watch the fireworks. There may be a non-trivial amount of state associated with current IO position, well beyond the numeric offset. Using the single struct file associated with underlying inode is really not a good idea; we ought to open one for each ecryptfs directory struct file. Additionally, file_operations both for directories and non-directories are full of pointless methods; non-directories should *not* have ->iterate(), directories should not have ->flush(), ->fasync() and ->splice_read(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-04btrfs: cleanup assigning next active device with a checkAnand Jain
Creates helper fucntion as needed by the device delete and replace operations. Also now it checks if the next device being assigned is an active device. Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-05-04btrfs: s_bdev is not null after missing replaceAnand Jain
Yauhen reported in the ML that s_bdev is null at mount, and s_bdev gets updated to some device when missing device is replaced, as because bdev is null for missing device, things gets matched up. Fix this by checking if s_bdev is set. I didn't want to completely remove updating s_bdev because the future multi device support at vfs layer may need it. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reported-by: Yauhen Kharuzhy <yauhen.kharuzhy@zavadatar.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-05-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: net/ipv4/ip_gre.c Minor conflicts between tunnel bug fixes in net and ipv6 tunnel cleanups in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-03Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse fixes from Miklos Szeredi: "Fix a regression and update the MAINTAINERS entry for fuse" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: update mailing list in MAINTAINERS fuse: Fix return value from fuse_get_user_pages()
2016-05-03f2fs: factor out fsync inode entry operationsChao Yu
Factor out fsync inode entry operations into {add,del}_fsync_inode. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>