summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2016-07-26btrfs: plumb fs_info into btrfs_workJeff Mahoney
In order to provide an fsid for trace events, we'll need a btrfs_fs_info pointer. The most lightweight way to do that for btrfs_work structures is to associate it with the __btrfs_workqueue structure. Each queued btrfs_work structure has a workqueue associated with it, so that's a natural fit. It's a privately defined structures, so we add accessors to retrieve the fs_info pointer. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-07-26btrfs: remove obsolete part of comment in statfsDavid Sterba
The mixed blockgroup reporting has been fixed by commit ae02d1bd070767e109f4a6f1bb1f466e9698a355 "btrfs: fix mixed block count of available space" Signed-off-by: David Sterba <dsterba@suse.com>
2016-07-26btrfs: hide test-only member under ifdefDavid Sterba
Signed-off-by: David Sterba <dsterba@suse.com>
2016-07-26btrfs: Ratelimit "no csum found" info messageNikolay Borisov
Recently during a crash it became apparent that this particular message can be printed so many times that it causes the softlockup detector to trigger. Fix it by ratelimiting it. Signed-off-by: Nikolay Borisov <kernel@kyup.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-07-26btrfs: Add ratelimit to btrfs printingNikolay Borisov
This patch adds ratelimiting to all messages which are not using the _rl version of the various printing APIs in btrfs. This is designed to be used as a safety net, since a flood messages might cause the softlockup detector to trigger. To reduce interference between different classes of messages use a separate ratelimit state for every class of message. Signed-off-by: Nikolay Borisov <kernel@kyup.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-07-26Btrfs: fix unexpected balance crash due to BUG_ONLiu Bo
Mounting a btrfs can resume previous balance operations asynchronously. An user got a crash when one drive has some corrupt sectors. Since balance can cancel itself in case of any error, we can gracefully return errors to upper layers and let balance do the cancel job. Reported-by: sash <master.b.at.raven@chefmail.de> Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-07-26Btrfs: fix panic in balance due to EIOLiu Bo
During build_backref_tree(), if we fail to read a btree node, we can eventually run into BUG_ON(cache->nr_nodes) that we put in backref_cache_cleanup(), meaning we have at least one memory leak. This frees the backref_node that we's allocated at the very beginning of build_backref_tree(). Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-07-26Btrfs: fix eb memory leak due to readpage failureLiu Bo
eb->io_pages is set in read_extent_buffer_pages(). In case of readpage failure, for pages that have been added to bio, it calls bio_endio and later readpage_io_failed_hook() does the work. When this eb's page (couldn't be the 1st page) fails to add itself to bio due to failure in merge_bio(), it cannot decrease eb->io_pages via bio_endio, and ends up with a memory leak eventually. This lets __do_readpage propagate errors to callers and adds the 'atomic_dec(&eb->io_pages)'. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-07-26Btrfs: change BUG_ON()'s to ASSERT()'s in backref_cache_cleanup()Liu Bo
Since it is just an in-memory building of the backrefs of several btree blocks, nothing is fatal other than memory leaks, so this changes BUG_ON()'s to ASSERT()'s. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-07-26btrfs: fix free space calculation in dump_space_info()Wang Xiaoguang
In btrfs, btrfs_space_info's bytes_may_use is treated as fs used space, as what we do in reserve_metadata_bytes() or btrfs_alloc_data_chunk_ondemand(), so in dump_space_info(), when calculating free space, we should also subtract btrfs_space_info's bytes_may_use. Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-07-26Btrfs: subpage-blocksize: Rate limit scrub error messageChandan Rajendra
btrfs/073 invokes scrub ioctl in a tight loop. In subpage-blocksize scenario this results in a lot of "scrub: size assumption sectorsize != PAGE_SIZE " messages being printed on the console. To reduce the number of such messages this commit uses btrfs_err_rl() instead of btrfs_err(). Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-07-26btrfs: expand cow_file_range() to support in-band dedup and subpage-blocksizeWang Xiaoguang
Extract cow_file_range() new parameters for both in-band dedupe and subpage sector size patchset. This should make conflict of both patchset to minimal, and reduce the effort needed to rebase them. Cc: Chandan Rajendra <chandan@linux.vnet.ibm.com> Cc: David Sterba <dsterba@suse.cz> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-07-26Btrfs: fix BUG_ON in btrfs_submit_compressed_writeLiu Bo
This is similar to btrfs_submit_compressed_read(), if we fail after bio is allocated, then we can use bio_endio() and errors are saved in bio->bi_error. But please note that we don't return errors to its caller because the caller assumes it won't call endio to cleanup on error. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-07-26btrfs: make sure device is synced before returnAnand Jain
An inconsistent behavior due to stale reads from the disk was reported mail-archive.com/linux-btrfs@vger.kernel.org/msg54188.html This patch will make sure devices are synced before return in the unmount thread. Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-07-26btrfs: reorg btrfs_close_one_device()Anand Jain
Moves closer to the caller and removes declaration Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-07-26btrfs: Cleanup compress_file_range()Ashish Samant
Remove unnecessary checks in compress_file_range(). Signed-off-by: Ashish Samant <ashish.samant@oracle.com> [ minor coding style fixups ] Signed-off-by: David Sterba <dsterba@suse.com>
2016-07-26Btrfs: cleanup BUG_ON in merge_bioLiu Bo
One can use btrfs-corrupt-block to hit BUG_ON() in merge_bio(), thus this aims to stop anyone to panic the whole system by using their btrfs. Since the error in merge_bio can only come from __btrfs_map_block() when chunk tree mapping has something insane and __btrfs_map_block() has already had printed the reason, we can just return errors in merge_bio. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-07-26btrfs: Fix slab accounting flagsNikolay Borisov
BTRFS is using a variety of slab caches to satisfy internal needs. Those slab caches are always allocated with the SLAB_RECLAIM_ACCOUNT, meaning allocations from the caches are going to be accounted as SReclaimable. At the same time btrfs is not registering any shrinkers whatsoever, thus preventing memory from the slabs to be shrunk. This means those caches are not in fact reclaimable. To fix this remove the SLAB_RECLAIM_ACCOUNT on all caches apart from the inode cache, since this one is being freed by the generic VFS super_block shrinker. Also set the transaction related caches as SLAB_TEMPORARY, to better document the lifetime of the objects (it just translates to SLAB_RECLAIM_ACCOUNT). Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-07-26btrfs: Replace -ENOENT by -ERANGE in btrfs_get_acl()Salah Triki
size contains the value returned by posix_acl_from_xattr(), which returns -ERANGE, -ENODATA, zero, or an integer greater than zero. So replace -ENOENT by -ERANGE. Signed-off-by: Salah Triki <salah.triki@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-07-26btrfs: Handle uninitialised inode evictionNikolay Borisov
The code flow in btrfs_new_inode allows for btrfs_evict_inode to be called with not fully initialised inode (e.g. ->root member not being set). This can happen when btrfs_set_inode_index in btrfs_new_inode fails, which in turn would call iput for the newly allocated inode. This in turn leads to vfs calling into btrfs_evict_inode. This leads to null pointer dereference. To handle this situation check whether the passed inode has root set and just free it in case it doesn't. Signed-off-by: Nikolay Borisov <kernel@kyup.com> Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-07-26Btrfs: fix read_node_slot to return errorsLiu Bo
We use read_node_slot() to read btree node and it has two cases, a) slot is out of range, which means 'no such entry' b) we fail to read the block, due to checksum fails or corrupted content or not with uptodate flag. But we're returning NULL in both cases, this makes it return -ENOENT in case a) and return -EIO in case b), and this fixes its callers as well as btrfs_search_forward() 's caller to catch the new errors. The problem is reported by Peter Becker, and I can manage to hit the same BUG_ON by mounting my fuzz image. Reported-by: Peter Becker <floyd.net@gmail.com> Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-07-26Btrfs: fix double free of fs rootLiu Bo
I got this warning while mounting a btrfs image, [ 3020.509606] ------------[ cut here ]------------ [ 3020.510107] WARNING: CPU: 3 PID: 5581 at lib/idr.c:1051 ida_remove+0xca/0x190 [ 3020.510853] ida_remove called for id=42 which is not allocated. [ 3020.511466] Modules linked in: [ 3020.511802] CPU: 3 PID: 5581 Comm: mount Not tainted 4.7.0-rc5+ #274 [ 3020.512438] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.2-20150714_191134- 04/01/2014 [ 3020.513385] 0000000000000286 0000000021295d86 ffff88006c66b8f0 ffffffff8182ba5a [ 3020.514153] 0000000000000000 0000000000000009 ffff88006c66b930 ffffffff810e0ed7 [ 3020.514928] 0000041b00000000 ffffffff8289a8c0 ffff88007f437880 0000000000000000 [ 3020.515717] Call Trace: [ 3020.515965] [<ffffffff8182ba5a>] dump_stack+0xc9/0x13f [ 3020.516487] [<ffffffff810e0ed7>] __warn+0x147/0x160 [ 3020.517005] [<ffffffff810e0f4f>] warn_slowpath_fmt+0x5f/0x80 [ 3020.517572] [<ffffffff8182e6ca>] ida_remove+0xca/0x190 [ 3020.518075] [<ffffffff813a2bcc>] free_anon_bdev+0x2c/0x60 [ 3020.518609] [<ffffffff81657a9f>] free_fs_root+0x13f/0x160 [ 3020.519138] [<ffffffff8165c679>] btrfs_get_fs_root+0x379/0x3d0 [ 3020.519710] [<ffffffff81e6e975>] ? __mutex_unlock_slowpath+0x155/0x2c0 [ 3020.520366] [<ffffffff816615b1>] open_ctree+0x2e91/0x3200 [ 3020.520965] [<ffffffff8161ede2>] btrfs_mount+0x1322/0x15b0 [ 3020.521536] [<ffffffff81e60e74>] ? kmemleak_alloc_percpu+0x44/0x170 [ 3020.522167] [<ffffffff8115f5e1>] ? lockdep_init_map+0x61/0x210 [ 3020.522780] [<ffffffff813a4f59>] mount_fs+0x49/0x2c0 [ 3020.523305] [<ffffffff813d840c>] vfs_kern_mount+0xac/0x1b0 [ 3020.523872] [<ffffffff8161dee1>] btrfs_mount+0x421/0x15b0 [ 3020.524402] [<ffffffff81e60e74>] ? kmemleak_alloc_percpu+0x44/0x170 [ 3020.525045] [<ffffffff8115f5e1>] ? lockdep_init_map+0x61/0x210 [ 3020.525657] [<ffffffff8115f5e1>] ? lockdep_init_map+0x61/0x210 [ 3020.526289] [<ffffffff813a4f59>] mount_fs+0x49/0x2c0 [ 3020.526803] [<ffffffff813d840c>] vfs_kern_mount+0xac/0x1b0 [ 3020.527365] [<ffffffff813dc27a>] do_mount+0x41a/0x1770 [ 3020.527899] [<ffffffff812e800d>] ? strndup_user+0x6d/0xc0 [ 3020.528447] [<ffffffff812e7f68>] ? memdup_user+0x78/0xb0 [ 3020.528987] [<ffffffff813ddad0>] SyS_mount+0x150/0x160 [ 3020.529493] [<ffffffff81e72b7c>] entry_SYSCALL_64_fastpath+0x1f/0xbd It turns out that we free fs root twice, btrfs_init_fs_root() calls free_anon_bdev(root->anon_dev) and later then btrfs_get_fs_root() cals free_fs_root which does another free_anon_bdev() and it ends up with the above warning. Instead of reset root->anon_dev to 0 after free_anon_bdev(), we can let btrfs_init_fs_root() return directly since its callers have already done the free job by calling free_fs_root(). Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Reviewed-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-07-26Btrfs: error out if generic_bin_search get invalid argumentsLiu Bo
With btrfs-corrupt-block, one can set btree node/leaf's field, if we assign a negative value to node/leaf, we can get various hangs, eg. if extent_root's nritems is -2ULL, then we get stuck in btrfs_read_block_groups() because it has a while loop and btrfs_search_slot() on extent_root will always return the first child. This lets us know what's happening and returns a EINVAL to callers instead of returning the first item. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-07-26Btrfs: check inconsistence between chunk and block groupLiu Bo
With btrfs-corrupt-block, one can drop one chunk item and mounting will end up with a panic in btrfs_full_stripe_len(). This doesn't not remove the BUG_ON, but instead checks it a bit earlier when we find the block group item. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-07-26btrfs: add missing bytes_readonly attribute file in sysfsWang Xiaoguang
Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-07-25Merge branch 'timers-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "This update provides the following changes: - The rework of the timer wheel which addresses the shortcomings of the current wheel (cascading, slow search for next expiring timer, etc). That's the first major change of the wheel in almost 20 years since Finn implemted it. - A large overhaul of the clocksource drivers init functions to consolidate the Device Tree initialization - Some more Y2038 updates - A capability fix for timerfd - Yet another clock chip driver - The usual pile of updates, comment improvements all over the place" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (130 commits) tick/nohz: Optimize nohz idle enter clockevents: Make clockevents_subsys static clocksource/drivers/time-armada-370-xp: Fix return value check timers: Implement optimization for same expiry time in mod_timer() timers: Split out index calculation timers: Only wake softirq if necessary timers: Forward the wheel clock whenever possible timers/nohz: Remove pointless tick_nohz_kick_tick() function timers: Optimize collect_expired_timers() for NOHZ timers: Move __run_timers() function timers: Remove set_timer_slack() leftovers timers: Switch to a non-cascading wheel timers: Reduce the CPU index space to 256k timers: Give a few structs and members proper names hlist: Add hlist_is_singular_node() helper signals: Use hrtimer for sigtimedwait() timers: Remove the deprecated mod_timer_pinned() API timers, net/ipv4/inet: Initialize connection request timers as pinned timers, drivers/tty/mips_ejtag: Initialize the poll timer as pinned timers, drivers/tty/metag_da: Initialize the poll timer as pinned ...
2016-07-25Merge branch 'x86-mm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm updates from Ingo Molnar: "Various x86 low level modifications: - preparatory work to support virtually mapped kernel stacks (Andy Lutomirski) - support for 64-bit __get_user() on 32-bit kernels (Benjamin LaHaise) - (involved) workaround for Knights Landing CPU erratum (Dave Hansen) - MPX enhancements (Dave Hansen) - mremap() extension to allow remapping of the special VDSO vma, for purposes of user level context save/restore (Dmitry Safonov) - hweight and entry code cleanups (Borislav Petkov) - bitops code generation optimizations and cleanups with modern GCC (H. Peter Anvin) - syscall entry code optimizations (Paolo Bonzini)" * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (43 commits) x86/mm/cpa: Add missing comment in populate_pdg() x86/mm/cpa: Fix populate_pgd(): Stop trying to deallocate failed PUDs x86/syscalls: Add compat_sys_preadv64v2/compat_sys_pwritev64v2 x86/smp: Remove unnecessary initialization of thread_info::cpu x86/smp: Remove stack_smp_processor_id() x86/uaccess: Move thread_info::addr_limit to thread_struct x86/dumpstack: Rename thread_struct::sig_on_uaccess_error to sig_on_uaccess_err x86/uaccess: Move thread_info::uaccess_err and thread_info::sig_on_uaccess_err to thread_struct x86/dumpstack: When OOPSing, rewind the stack before do_exit() x86/mm/64: In vmalloc_fault(), use CR3 instead of current->active_mm x86/dumpstack/64: Handle faults when printing the "Stack: " part of an OOPS x86/dumpstack: Try harder to get a call trace on stack overflow x86/mm: Remove kernel_unmap_pages_in_pgd() and efi_cleanup_page_tables() x86/mm/cpa: In populate_pgd(), don't set the PGD entry until it's populated x86/mm/hotplug: Don't remove PGD entries in remove_pagetable() x86/mm: Use pte_none() to test for empty PTE x86/mm: Disallow running with 32-bit PTEs to work around erratum x86/mm: Ignore A/D bits in pte/pmd/pud_none() x86/mm: Move swap offset/type up in PTE to work around erratum x86/entry: Inline enter_from_user_mode() ...
2016-07-25Merge tag 'v4.7' into for-linus/pstoreKees Cook
Linux 4.7
2016-07-25f2fs: clean up coding style and redundancyJaegeuk Kim
This patch includes minor clean-ups. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-07-25binfmt_flat: use proper user space accessors with relocs processing codeNicolas Pitre
Relocs are fixed up in place in user space memory. The appropriate accessors are required for this code to work with an active MMU. The architecture specific handlers flat_get_addr_from_rp() and flat_put_addr_at_rp() for ARM and M68K are adjusted with separate patches. SuperH and Xtensa are left out as they doesn't implement __get_user_unaligned() and __put_user_unaligned() yet. The other architectures that use BFLT don't have any MMU. Signed-off-by: Nicolas Pitre <nico@linaro.org> Reviewed-by: Greg Ungerer <gerg@linux-m68k.org> Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
2016-07-25binfmt_flat: clean up create_flat_tables() and stack accessesNicolas Pitre
In addition to better code clarity, this brings proper usage of user memory accessors everywhere the stack is touched. This is essential for making this work on MMU systems. Signed-off-by: Nicolas Pitre <nico@linaro.org> Reviewed-by: Greg Ungerer <gerg@linux-m68k.org> Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
2016-07-25binfmt_flat: use generic transfer_args_to_stack()Nicolas Pitre
This gets rid of the rather ugly, open coded and suboptimal copy code. Signed-off-by: Nicolas Pitre <nico@linaro.org> Reviewed-by: Greg Ungerer <gerg@linux-m68k.org> Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
2016-07-25elf_fdpic_transfer_args_to_stack(): make it genericNicolas Pitre
This copying of arguments and environment is common to both NOMMU binary formats we support. Let's make the elf_fdpic version available to the flat format as well. While at it, improve the code a bit not to copy below the actual data area. Signed-off-by: Nicolas Pitre <nico@linaro.org> Reviewed-by: Greg Ungerer <gerg@linux-m68k.org> Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
2016-07-25binfmt_flat: prevent kernel dammage from corrupted executable headersNicolas Pitre
Signed-off-by: Nicolas Pitre <nico@linaro.org> Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
2016-07-25binfmt_flat: convert printk invocations to their modern formNicolas Pitre
Signed-off-by: Nicolas Pitre <nico@linaro.org> Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
2016-07-25binfmt_flat: assorted cleanupsNicolas Pitre
Remove excessive casts, do some code grouping, fix most important checkpatch.pl complaints, etc. No functional changes. Signed-off-by: Nicolas Pitre <nico@linaro.org> Reviewed-by: Greg Ungerer <gerg@linux-m68k.org> Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
2016-07-24Merge tag 'char-misc-4.8-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver updates from Greg KH: "Here is the big char/misc driver update for 4.8-rc1. Not a lot of stuff, but it's all over the place, full details are in the shortlog. All of these have been in linux-next with no reported issues for a while" * tag 'char-misc-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (49 commits) lkdtm: silence warnings about function declarations lkdtm: hide unused functions intel_th: pci: Add Kaby Lake PCH-H support intel_th: Fix a deadlock in modprobing dsp56k: prevent a harmless underflow chardev: add missing line break in pr_warn lkdtm: use struct arrays instead of enums lkdtm: move jprobe entry points to start of source lkdtm: reorganize module paramaters lkdtm: rename globals for clarity lkdtm: rename "count" to "crash_count" lkdtm: remove intentional off-by-one array access lkdtm: split remaining logic bug tests to separate file lkdtm: split heap corruption tests to separate file lkdtm: split memory permissions tests to separate file lkdtm: split usercopy tests to separate file lkdtm: drop "alloc_size" parameter lkdtm: add usercopy test for blocking kernel text extcon: adc-jack: add suspend/resume support extcon: add missing of_node_put after calling of_parse_phandle ...
2016-07-24Merge tag 'gfs2-4.7.fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 updates from Bob Peterson: "We've got ten patches this time, half of which are related to a plethora of nasty outcomes when inodes are transitioned from the unlinked state to the free state. Small file systems are particularly vulnerable to these problems, and it can manifest as mainly hangs, but also file system corruption. The patches have been tested for literally many weeks, with a very gruelling test, so I have a high level of confidence. - Andreas Gruenbacher wrote a series of five patches for various lockups during the transition of inodes from unlinked to free. The main patch is titled "Fix gfs2_lookup_by_inum lock inversion" and the other four are support and cleanup patches related to that. - Ben Marzinski contributed two patches with regard to a recreatable problem when gfs2 tries to write a page to a file that is being truncated, resulting in a BUG() in gfs2_remove_from_journal. Note that Ben had to export vfs function __block_write_full_page to get this to work properly. It's been posted a long time and he talked to various VFS people about it, and nobody seemed to mind. - I contributed 3 patches: o The first one fixes a memory corruptor: a race in which one process can overwrite the gl_object pointer set by another process, causing kernel panic and other symptoms. o The second patch fixes another race that resulted in a false-positive BUG_ON. This occurred when resource group reservations were freed by one process while another process was trying to grab a new reservation in the same resource group. o The third patch fixes a problem with doing journal replay when the journals are not all the same size" * tag 'gfs2-4.7.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: GFS2: Fix gfs2_replay_incr_blk for multiple journal sizes GFS2: Check rs_free with rd_rsspin protection gfs2: writeout truncated pages fs: export __block_write_full_page gfs2: Lock holder cleanup gfs2: Large-filesystem fix for 32-bit systems gfs2: Get rid of gfs2_ilookup gfs2: Fix gfs2_lookup_by_inum lock inversion gfs2: Initialize iopen glock holder for new inodes GFS2: don't set rgrp gl_object until it's inserted into rgrp tree
2016-07-24NFSv4.2: Fix warning "variable ‘stateids’ set but not used"Trond Myklebust
Replace it with a test for whether or not the sent a stateid in violation of what we asked for. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-24NFSv4: Fix warning "no previous prototype for ‘nfs4_listxattr’"Trond Myklebust
Make it static Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-24Merge branch 'nfs-rdma'Trond Myklebust
2016-07-24Merge branch 'pnfs'Trond Myklebust
2016-07-24Merge branch 'writeback'Trond Myklebust
2016-07-24fs/dcache.c: avoid soft-lockup in dput()Wei Fang
We triggered soft-lockup under stress test which open/access/write/close one file concurrently on more than five different CPUs: WARN: soft lockup - CPU#0 stuck for 11s! [who:30631] ... [<ffffffc0003986f8>] dput+0x100/0x298 [<ffffffc00038c2dc>] terminate_walk+0x4c/0x60 [<ffffffc00038f56c>] path_lookupat+0x5cc/0x7a8 [<ffffffc00038f780>] filename_lookup+0x38/0xf0 [<ffffffc000391180>] user_path_at_empty+0x78/0xd0 [<ffffffc0003911f4>] user_path_at+0x1c/0x28 [<ffffffc00037d4fc>] SyS_faccessat+0xb4/0x230 ->d_lock trylock may failed many times because of concurrently operations, and dput() may execute a long time. Fix this by replacing cpu_relax() with cond_resched(). dput() used to be sleepable, so make it sleepable again should be safe. Cc: <stable@vger.kernel.org> Signed-off-by: Wei Fang <fangwei1@huawei.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-07-24vfs: new d_init methodMiklos Szeredi
Allow filesystem to initialize dentry at allocation time. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-07-24Merge branch 'test.d_iput' into work.miscAl Viro
2016-07-24vfs: Update lookup_dcache() commentOleg Drokin
commit 6c51e513a3aa ("lookup_dcache(): lift d_alloc() into callers") removed the need_lookup argument from lookup_dcache(), but the comment was forgotten. Also it no longer allocates a new dentry if nothing was found. Signed-off-by: Oleg Drokin <green@linuxhacker.ru> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-07-24pNFS: Remove redundant smp_mb() from pnfs_init_lseg()Trond Myklebust
It's not visible yet, and won't be until after we grab the inode->i_lock. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-24pNFS: Cleanup - do layout segment initialisation in one placeTrond Myklebust
...instead of splitting the initialisation over init_lseg() and pnfs_layout_process(). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-24pNFS: Remove redundant stateid invalidationTrond Myklebust
The layout stateid will be invalidated once it holds no more layout segments anyway. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>