summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2015-11-24jbd2: Fix unreclaimed pages after truncate in data=journal modeJan Kara
Ted and Namjae have reported that truncated pages don't get timely reclaimed after being truncated in data=journal mode. The following test triggers the issue easily: for (i = 0; i < 1000; i++) { pwrite(fd, buf, 1024*1024, 0); fsync(fd); fsync(fd); ftruncate(fd, 0); } The reason is that journal_unmap_buffer() finds that truncated buffers are not journalled (jh->b_transaction == NULL), they are part of checkpoint list of a transaction (jh->b_cp_transaction != NULL) and have been already written out (!buffer_dirty(bh)). We clean such buffers but we leave them in the checkpoint list. Since checkpoint transaction holds a reference to the journal head, these buffers cannot be released until the checkpoint transaction is cleaned up. And at that point we don't call release_buffer_page() anymore so pages detached from mapping are lingering in the system waiting for reclaim to find them and free them. Fix the problem by removing buffers from transaction checkpoint lists when journal_unmap_buffer() finds out they don't have to be there anymore. Reported-and-tested-by: Namjae Jeon <namjae.jeon@samsung.com> Fixes: de1b794130b130e77ffa975bb58cb843744f9ae5 Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
2015-11-24ext4: Fix handling of extended tv_secDavid Turner
In ext4, the bottom two bits of {a,c,m}time_extra are used to extend the {a,c,m}time fields, deferring the year 2038 problem to the year 2446. When decoding these extended fields, for times whose bottom 32 bits would represent a negative number, sign extension causes the 64-bit extended timestamp to be negative as well, which is not what's intended. This patch corrects that issue, so that the only negative {a,c,m}times are those between 1901 and 1970 (as per 32-bit signed timestamps). Some older kernels might have written pre-1970 dates with 1,1 in the extra bits. This patch treats those incorrectly-encoded dates as pre-1970, instead of post-2311, until kernel 4.20 is released. Hopefully by then e2fsck will have fixed up the bad data. Also add a comment explaining the encoding of ext4's extra {a,c,m}time bits. Signed-off-by: David Turner <novalis@novalis.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reported-by: Mark Harris <mh8928@yahoo.com> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=23732 Cc: stable@vger.kernel.org
2015-11-24nfsd4: fix gss-proxy 4.1 mounts for some AD principalsJ. Bruce Fields
The principal name on a gss cred is used to setup the NFSv4.0 callback, which has to have a client principal name to authenticate to. That code wants the name to be in the form servicetype@hostname. rpc.svcgssd passes down such names (and passes down no principal name at all in the case the principal isn't a service principal). gss-proxy always passes down the principal name, and passes it down in the form servicetype/hostname@REALM. So we've been munging the name gss-proxy passes down into the format the NFSv4.0 callback code expects, or throwing away the name if we can't. Since the introduction of the MACH_CRED enforcement in NFSv4.1, we've also been using the principal name to verify that certain operations are done as the same principal as was used on the original EXCHANGE_ID call. For that application, the original name passed down by gss-proxy is also useful. Lack of that name in some cases was causing some kerberized NFSv4.1 mount failures in an Active Directory environment. This fix only works in the gss-proxy case. The fix for legacy rpc.svcgssd would be more involved, and rpc.svcgssd already has other problems in the AD case. Reported-and-tested-by: James Ralston <ralston@pobox.com> Acked-by: Simo Sorce <simo@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-11-24nfsd: fix unlikely NULL deref in mach_creds_matchJ. Bruce Fields
We really shouldn't allow a client to be created with cl_mach_cred set unless it also has a principal name. This also allows us to fail such cases immediately on EXCHANGE_ID as opposed to waiting and incorrectly returning WRONG_CRED on the following CREATE_SESSION. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-11-24nfsd: minor consolidation of mach_cred handling codeJ. Bruce Fields
Minor cleanup, no change in functionality. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-11-24nfsd: helper for dup of possibly NULL stringJ. Bruce Fields
Technically the initialization in the NULL case isn't even needed as the only caller already has target zeroed out, but it seems safer to keep copy_cred generic. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-11-24GFS2: Extract quota data from reservations structure (revert 5407e24)Bob Peterson
This patch basically reverts the majority of patch 5407e24. That patch eliminated the gfs2_qadata structure in favor of just using the reservations structure. The problem with doing that is that it increases the size of the reservations structure. That is not an issue until it comes time to fold the reservations structure into the inode in memory so we know it's always there. By separating out the quota structure again, we aren't punishing the non-quota users by making all the inodes bigger, requiring more slab space. This patch creates a new slab area to allocate the quota stuff so it's managed a little more sanely. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2015-11-23nfs4: limit callback decoding to received bytesBenjamin Coddington
A truncated cb_compound request will cause the client to decode null or data from a previous callback for nfs4.1 backchannel case, or uninitialized data for the nfs4.0 case. This is because the path through svc_process_common() advances the request's iov_base and decrements iov_len without adjusting the overall xdr_buf's len field. That causes xdr_init_decode() to set up the xdr_stream with an incorrect length in nfs4_callback_compound(). Fixing this for the nfs4.1 backchannel case first requires setting the correct iov_len and page_len based on the length of received data in the same manner as the nfs4.0 case. Then the request's xdr_buf length can be adjusted for both cases based upon the remaining iov_len and page_len. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-11-23nfs4: start callback_ident at idr 1Benjamin Coddington
If clp->cl_cb_ident is zero, then nfs_cb_idr_remove_locked() skips removing it when the nfs_client is freed. A decoding or server bug can then find and try to put that first nfs_client which would lead to a crash. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Fixes: d6870312659d ("nfs4client: convert to idr_alloc()") Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-11-23nfs: use sliding delay when LAYOUTGET gets NFS4ERR_DELAYJeff Layton
When LAYOUTGET gets NFS4ERR_DELAY, we currently will wait 15s before retrying the call. That is a _very_ long time, so add a timeout value to struct nfs4_layoutget and pass nfs4_async_handle_error a pointer to it. This allows the RPC engine to use a sliding delay window, instead of a 15s delay. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-11-23NFS4: Cleanup FATTR4_WORD0_FS_LOCATIONS after decoding successKinglong Mee
Commit 1ca843a2d2 "nfs: Fix GETATTR bitmap verification" has check the bitmap after decoding success, but decode_attr_fs_locations forgets cleanup the FATTR4_WORD0_FS_LOCATIONS bits. decode_getfattr_attrs always return -EIO when meeting FS_LOCATIONS now. ls: cannot access /mnt/referal: Input/output error ls: cannot access /mnt/replicas: Input/output error total 32 drwxr-xr-x. 7 root root 8192 Nov 16 20:36 pnfs ??????????? ? ? ? ? ? referal ??????????? ? ? ? ? ? replicas v2: clear the bit earlier Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-11-23NFS: Properly set NFS v4.2 NFSDBG_FACILITYAnna Schumaker
NFS v4.2 operations can work outside of pNFS, so dprintk() output shouldn't be placed under NFSDBG_PNFS. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-11-23nfs: reduce the amount of ifdefs for v4.2 in nfs4file.cChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-11-23nfs: use btrfs ioctl defintions for cloneChristoph Hellwig
The NFS CLONE_RANGE defintion was wrong and thus never worked. Fix this by simply using the btrfs ioctl defintion. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-11-23nfs: allow intra-file CLONEChristoph Hellwig
Originally CLONE didn't allow for intra-file clones, but we recently updated the spec to support this feature which is also supported by local Linux file systems. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-11-23nfs: offer native ioctls even if CONFIG_COMPAT is setChristoph Hellwig
Without this for example 64-bit binaries on typical amd64 distributions would not be able to use ioctls on NFS. For now this only affects clones. Additionally ->compat_ioctl is defined even for non-compat builds, so get rid of the pointless ifdef. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-11-23nfs: pass on count for CLONE operationsChristoph Hellwig
Currently we pass uninitialized stack garbage in the count parameter. The value is usually large enought to clone whole files and thus let simple tests pass, but it makes the tests for range clones very unhappy. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-11-23vfs: Avoid softlockups with sendfile(2)Jan Kara
The following test program from Dmitry can cause softlockups or RCU stalls as it copies 1GB from tmpfs into eventfd and we don't have any scheduling point at that path in sendfile(2) implementation: int r1 = eventfd(0, 0); int r2 = memfd_create("", 0); unsigned long n = 1<<30; fallocate(r2, 0, 0, n); sendfile(r1, r2, 0, n); Add cond_resched() into __splice_from_pipe() to fix the problem. CC: Dmitry Vyukov <dvyukov@google.com> CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-23vfs: Make sendfile(2) killable even betterJan Kara
Commit 296291cdd162 (mm: make sendfile(2) killable) fixed an issue where sendfile(2) was doing a lot of tiny writes into a filesystem and thus was unkillable for a long time. However sendfile(2) can be (mis)used to issue lots of writes into arbitrary file descriptor such as evenfd or similar special file descriptors which never hit the standard filesystem write path and thus are still unkillable. E.g. the following example from Dmitry burns CPU for ~16s on my test system without possibility to be killed: int r1 = eventfd(0, 0); int r2 = memfd_create("", 0); unsigned long n = 1<<30; fallocate(r2, 0, 0, n); sendfile(r1, r2, 0, n); There are actually quite a few tests for pending signals in sendfile code however we data to write is always available none of them seems to trigger. So fix the problem by adding a test for pending signal into splice_from_pipe_next() also before the loop waiting for pipe buffers to be available. This should fix all the lockup issues with sendfile of the do-ton-of-tiny-writes nature. CC: stable@vger.kernel.org Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-23fix sysvfs symlinksAl Viro
The thing got broken back in 2002 - sysvfs does *not* have inline symlinks; even short ones have bodies stored in the first block of file. sysv_symlink() handles that correctly; unfortunately, attempting to look an existing symlink up will end up confusing them for inline symlinks, and interpret the block number containing the body as the body itself. Nobody has noticed until now, which says something about the level of testing sysvfs gets ;-/ Cc: stable@vger.kernel.org # all of them, not that anyone cared Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-23nfsd: fix a warning messageDan Carpenter
The WARN() macro takes a condition and a format string. The condition was accidentally left out here so it just prints the function name instead of the message. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Jeff Layton <jlayton@poochiereds.net> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-11-23nfsd: constify nfsd4_callback_ops structureJulia Lawall
The nfsd4_callback_ops structure is never modified, so declare it as const. Done with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-11-23nfsd: recover: constify nfsd4_client_tracking_ops structuresJulia Lawall
The nfsd4_client_tracking_ops structures are never modified, so declare them as const. Done with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Reviewed-by: Jeff Layton <jlayton@poochiereds.net> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-11-21Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc fixes from Andrew Morton: "A bunch of fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: slub: mark the dangling ifdef #else of CONFIG_SLUB_DEBUG slub: avoid irqoff/on in bulk allocation slub: create new ___slab_alloc function that can be called with irqs disabled mm: fix up sparse warning in gfpflags_allow_blocking ocfs2: fix umask ignored issue PM/OPP: add entry in MAINTAINERS kernel/panic.c: turn off locks debug before releasing console lock kernel/signal.c: unexport sigsuspend() kasan: fix kmemleak false-positive in kasan_module_alloc() fat: fix fake_offset handling on error path mm/hugetlbfs: fix bugs in fallocate hole punch of areas with holes mm/page-writeback.c: initialize m_dirty to avoid compile warning various: fix pci_set_dma_mask return value checking mm: loosen MADV_NOHUGEPAGE to enable Qemu postcopy on s390 mm: vmalloc: don't remove inexistent guard hole in remove_vm_area() tools/vm/page-types.c: support KPF_IDLE ncpfs: don't allow negative timeouts configfs: allow dynamic group creation MAINTAINERS: add Moritz as reviewer for FPGA Manager Framework slab.h: sprinkle __assume_aligned attributes
2015-11-20ocfs2: fix umask ignored issueJunxiao Bi
New created file's mode is not masked with umask, and this makes umask not work for ocfs2 volume. Fixes: 702e5bc ("ocfs2: use generic posix ACL infrastructure") Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Gang He <ghe@suse.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-20fat: fix fake_offset handling on error pathOGAWA Hirofumi
For the root directory, . and .. are faked (using dir_emit_dots()) and ctx->pos is reset from 2 to 0. A corrupted root directory could cause fat_get_entry() to fail, but ->iterate() (fat_readdir()) reports progress to the VFS (with ctx->pos rewound to 0), so any following calls to ->iterate() continue to return the same entries again and again. The result is that userspace will never see the end of the directory, causing e.g. 'ls' to hang in a getdents() loop. [hirofumi@mail.parknet.co.jp: cleanup and make sure to correct fake_offset] Reported-by: Vegard Nossum <vegard.nossum@oracle.com> Tested-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Richard Weinberger <richard.weinberger@gmail.com> Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-20mm/hugetlbfs: fix bugs in fallocate hole punch of areas with holesMike Kravetz
Hugh Dickins pointed out problems with the new hugetlbfs fallocate hole punch code. These problems are in the routine remove_inode_hugepages and mostly occur in the case where there are holes in the range of pages to be removed. These holes could be the result of a previous hole punch or simply sparse allocation. The current code could access pages outside the specified range. remove_inode_hugepages handles both hole punch and truncate operations. Page index handling was fixed/cleaned up so that the loop index always matches the page being processed. The code now only makes a single pass through the range of pages as it was determined page faults could not race with truncate. A cond_resched() was added after removing up to PAGEVEC_SIZE pages. Some totally unnecessary code in hugetlbfs_fallocate() that remained from early development was also removed. Tested with fallocate tests submitted here: http://librelist.com/browser//libhugetlbfs/2015/6/25/patch-tests-add-tests-for-fallocate-system-call/ And, some ftruncate tests under development Fixes: b5cec28d36f5 ("hugetlbfs: truncate_hugepages() takes a range of pages") Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Acked-by: Hugh Dickins <hughd@google.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: "Hillf Danton" <hillf.zj@alibaba-inc.com> Cc: <stable@vger.kernel.org> [4.3] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-20ncpfs: don't allow negative timeoutsDan Carpenter
This code causes a static checker warning because it's a user controlled variable where we cap the upper bound but not the lower bound. Let's return an -EINVAL for negative timeouts. [akpm@linux-foundation.org: remove unneeded `else'] Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Jan Kara <jack@suse.com> Cc: Petr Vandrovec <petr@vandrovec.name> Cc: David Howells <dhowells@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-20configfs: allow dynamic group creationDaniel Baluta
This patchset introduces IIO software triggers, offers a way of configuring them via configfs and adds the IIO hrtimer based interrupt source to be used with software triggers. The architecture is now split in 3 parts, to remove all IIO trigger specific parts from IIO configfs core: (1) IIO configfs - creates the root of the IIO configfs subsys. (2) IIO software triggers - software trigger implementation, dynamically creating /config/iio/triggers group. (3) IIO hrtimer trigger - is the first interrupt source for software triggers (with syfs to follow). Each trigger type can implement its own set of attributes. Lockdep seems to be happy with the locking in configfs patch. This patch (of 5): We don't want to hardcode default groups at subsystem creation time. We export: * configfs_register_group * configfs_unregister_group to allow drivers to programatically create/destroy groups later, after module init time. This is needed for IIO configfs support. (akpm: the other 4 patches to be merged via the IIO tree) Signed-off-by: Daniel Baluta <daniel.baluta@intel.com> Suggested-by: Lars-Peter Clausen <lars@metafoo.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Joel Becker <jlbec@evilplan.org> Cc: Hartmut Knaack <knaack.h@gmx.de> Cc: Octavian Purdila <octavian.purdila@intel.com> Cc: Paul Bolle <pebolle@tiscali.nl> Cc: Adriana Reus <adriana.reus@intel.com> Cc: Cristina Opriceana <cristina.opriceana@gmail.com> Cc: Peter Meerwald <pmeerw@pmeerw.net> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-20Merge branch 'libnvdimm-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm fixes from Dan Williams: - A collection of crash and deadlock fixes for DAX that are also tagged for -stable. We will look to re-enable DAX pmd mappings in 4.5, but for now 4.4 and -stable should disable it by default. - A fixup to ext2 and ext4 to mirror the same warning emitted by XFS when mounting with "-o dax" * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: block: protect rw_page against device teardown mm, dax: fix DAX deadlocks (COW fault) dax: disable pmd mappings ext2, ext4: warn when mounting with dax enabled
2015-11-20kernfs: implement kernfs_walk_and_get()Tejun Heo
Implement kernfs_walk_and_get() which is similar to kernfs_find_and_get() but can walk a path instead of just a name. v2: Use strlcpy() instead of strlen() + memcpy() as suggested by David. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: David Miller <davem@davemloft.net>
2015-11-19block: protect rw_page against device teardownDan Williams
Fix use after free crashes like the following: general protection fault: 0000 [#1] SMP Call Trace: [<ffffffffa0050216>] ? pmem_do_bvec.isra.12+0xa6/0xf0 [nd_pmem] [<ffffffffa0050ba2>] pmem_rw_page+0x42/0x80 [nd_pmem] [<ffffffff8128fd90>] bdev_read_page+0x50/0x60 [<ffffffff812972f0>] do_mpage_readpage+0x510/0x770 [<ffffffff8128fd20>] ? I_BDEV+0x20/0x20 [<ffffffff811d86dc>] ? lru_cache_add+0x1c/0x50 [<ffffffff81297657>] mpage_readpages+0x107/0x170 [<ffffffff8128fd20>] ? I_BDEV+0x20/0x20 [<ffffffff8128fd20>] ? I_BDEV+0x20/0x20 [<ffffffff8129058d>] blkdev_readpages+0x1d/0x20 [<ffffffff811d615f>] __do_page_cache_readahead+0x28f/0x310 [<ffffffff811d6039>] ? __do_page_cache_readahead+0x169/0x310 [<ffffffff811c5abd>] ? pagecache_get_page+0x2d/0x1d0 [<ffffffff811c76f6>] filemap_fault+0x396/0x530 [<ffffffff811f816e>] __do_fault+0x4e/0xf0 [<ffffffff811fce7d>] handle_mm_fault+0x11bd/0x1b50 Cc: <stable@vger.kernel.org> Cc: Jens Axboe <axboe@fb.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Reported-by: kbuild test robot <lkp@intel.com> Acked-by: Matthew Wilcox <willy@linux.intel.com> [willy: symmetry fixups] Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-11-18gfs2: Extended attribute readahead optimizationAndreas Gruenbacher
Instead of submitting a READ_SYNC bio for the inode and a READA bio for the inode's extended attributes through submit_bh, submit a single READ_SYNC bio for both through submit_bio when possible. This can be more efficient on some kinds of block devices. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2015-11-18locks: use list_first_entry_or_null()Geliang Tang
Simplify the code with list_first_entry_or_null(). Signed-off-by: Geliang Tang <geliangtang@163.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
2015-11-16dax: disable pmd mappingsDan Williams
While dax pmd mappings are functional in the nominal path they trigger kernel crashes in the following paths: BUG: unable to handle kernel paging request at ffffea0004098000 IP: [<ffffffff812362f7>] follow_trans_huge_pmd+0x117/0x3b0 [..] Call Trace: [<ffffffff811f6573>] follow_page_mask+0x2d3/0x380 [<ffffffff811f6708>] __get_user_pages+0xe8/0x6f0 [<ffffffff811f7045>] get_user_pages_unlocked+0x165/0x1e0 [<ffffffff8106f5b1>] get_user_pages_fast+0xa1/0x1b0 kernel BUG at arch/x86/mm/gup.c:131! [..] Call Trace: [<ffffffff8106f34c>] gup_pud_range+0x1bc/0x220 [<ffffffff8106f634>] get_user_pages_fast+0x124/0x1b0 BUG: unable to handle kernel paging request at ffffea0004088000 IP: [<ffffffff81235f49>] copy_huge_pmd+0x159/0x350 [..] Call Trace: [<ffffffff811fad3c>] copy_page_range+0x34c/0x9f0 [<ffffffff810a0daf>] copy_process+0x1b7f/0x1e10 [<ffffffff810a11c1>] _do_fork+0x91/0x590 All of these paths are interpreting a dax pmd mapping as a transparent huge page and making the assumption that the pfn is covered by the memmap, i.e. that the pfn has an associated struct page. PTE mappings do not suffer the same fate since they have the _PAGE_SPECIAL flag to cause the gup path to fault. We can do something similar for the PMD path, or otherwise defer pmd support for cases where a struct page is available. For now, 4.4-rc and -stable need to disable dax pmd support by default. For development the "depends on BROKEN" line can be removed from CONFIG_FS_DAX_PMD. Cc: <stable@vger.kernel.org> Cc: Jan Kara <jack@suse.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-11-16FS-Cache: Add missing initialization of ret in cachefiles_write_page()Geert Uytterhoeven
fs/cachefiles/rdwr.c: In function ‘cachefiles_write_page’: fs/cachefiles/rdwr.c:882: warning: ‘ret’ may be used uninitialized in this function If the jump to label "error" is taken, "ret" will indeed be uninitialized, and random stack data may be printed by the debug code. Fixes: 102f4d900c9c8f5e ("FS-Cache: Handle a write to the page immediately beyond the EOF marker") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-16gfs2: Extended attribute readaheadAndreas Gruenbacher
When gfs2 allocates an inode and its extended attribute block next to each other at inode create time, the inode's directory entry indicates that in de_rahead. In that case, we can readahead the extended attribute block when we read in the inode. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2015-11-16GFS2: Use rht_for_each_entry_rcu in glock_hash_walkAndrew Price
This lockdep splat was being triggered on umount: [55715.973122] =============================== [55715.980169] [ INFO: suspicious RCU usage. ] [55715.981021] 4.3.0-11553-g8d3de01-dirty #15 Tainted: G W [55715.982353] ------------------------------- [55715.983301] fs/gfs2/glock.c:1427 suspicious rcu_dereference_protected() usage! The code it refers to is the rht_for_each_entry_safe usage in glock_hash_walk. The condition that triggers the warning is lockdep_rht_bucket_is_held(tbl, hash) which is checked in the __rcu_dereference_protected macro. The rhashtable buckets are not changed in glock_hash_walk so it's safe to rely on the rcu protection. Replace the rht_for_each_entry_safe() usage with rht_for_each_entry_rcu(), which doesn't care whether the bucket lock is held if the rcu read lock is held. Signed-off-by: Andrew Price <anprice@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com>
2015-11-16GFS2: Delete an unnecessary check before the function call "iput"Markus Elfring
The iput() function tests whether its argument is NULL and then returns immediately. Thus the test around the call is not needed. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2015-11-16ext2, ext4: warn when mounting with dax enabledDan Williams
Similar to XFS warn when mounting DAX while it is still considered under development. Also, aspects of the DAX implementation, for example synchronization against multiple faults and faults causing block allocation, depend on the correct implementation in the filesystem. The maturity of a given DAX implementation is filesystem specific. Cc: <stable@vger.kernel.org> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: linux-ext4@vger.kernel.org Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-by: Dave Chinner <david@fromorbit.com> Acked-by: Jan Kara <jack@suse.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-11-16fs: out of bounds on stack in iov_iter_advanceAl Viro
On Wed, Nov 11, 2015 at 10:19:48AM +0000, Al Viro wrote: > I'll cook the minimal fixup for API change after I get some sleep and > send it your way, unless somebody gets there first... This should do it - switches ->ioctl() to pvfs2_inode_[gs]etxattr() and converts xattr_handler ->[gs]et() to new API. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2015-11-16Orangefs: Merge tag 'v4.4-rc1' into for-nextMike Marshall
Linux 4.4-rc1
2015-11-16locks: Don't allow mounts in user namespaces to enable mandatory lockingEric W. Biederman
Since no one uses mandatory locking and files with mandatory locks can cause problems don't allow them in user namespaces. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
2015-11-16locks: Allow disabling mandatory locking at compile timeJeff Layton
Mandatory locking appears to be almost unused and buggy and there appears no real interest in doing anything with it. Since effectively no one uses the code and since the code is buggy let's allow it to be disabled at compile time. I would just suggest removing the code but undoubtedly that will break some piece of userspace code somewhere. For the distributions that don't care about this piece of code this gives a nice starting point to make mandatory locking go away. Cc: Benjamin Coddington <bcodding@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Jeff Layton <jeff.layton@primarydata.com> Cc: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
2015-11-13Merge tag 'chrome-platform-4.4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/olof/chrome-platform Pull chrome platform updates from Olof Johansson: "Here's the branch of chrome platform changes for v4.4. Some have been queued up for the full 4.3 release cycle since I forgot to send them in for that round (rebased early on to deal with fixes conflicts). Most of these enable EC communication stuff -- Pixel 2015 support, enabling building for ARM64 platforms, and a few fixes for memory leaks. There's also a patch in here to allow reading/writing the verified boot context, which depends on a sysfs patch acked by Greg" * tag 'chrome-platform-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/olof/chrome-platform: platform/chrome: Fix i2c-designware adapter name platform/chrome: Support reading/writing the vboot context sysfs: Support is_visible() on binary attributes platform/chrome: cros_ec: Fix possible leak in led_rgb_store() platform/chrome: cros_ec: Fix leak in sequence_store() platform/chrome: Enable Chrome platforms on 64-bit ARM platform/chrome: cros_ec_dev - Add a platform device ID table platform/chrome: cros_ec_lpc - Add support for Google Pixel 2 platform/chrome: cros_ec_lpc - Use existing function to check EC result platform/chrome: Make depends on MFD_CROS_EC instead CROS_EC_PROTO Revert "platform/chrome: Don't make CHROME_PLATFORMS depends on X86 || ARM"
2015-11-13Merge branch 'for-next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending Pull SCSI target updates from Nicholas Bellinger: "This series contains HCH's changes to absorb configfs attribute ->show() + ->store() function pointer usage from it's original tree-wide consumers, into common configfs code. It includes usb-gadget, target w/ drivers, netconsole and ocfs2 changes to realize the improved simplicity, that now renders the original include/target/configfs_macros.h CPP magic for fabric drivers and others, unnecessary and obsolete. And with common code in place, new configfs attributes can be added easier than ever before. Note, there are further improvements in-flight from other folks for v4.5 code in configfs land, plus number of target fixes for post -rc1 code" In the meantime, a new user of the now-removed old configfs API came in through the char/misc tree in commit 7bd1d4093c2f ("stm class: Introduce an abstraction for System Trace Module devices"). This merge resolution comes from Alexander Shishkin, who updated his stm class tracing abstraction to account for the removal of the old show_attribute and store_attribute methods in commit 517982229f78 ("configfs: remove old API") from this pull. As Alexander says about that patch: "There's no need to keep an extra wrapper structure per item and the awkward show_attribute/store_attribute item ops are no longer needed. This patch converts policy code to the new api, all the while making the code quite a bit smaller and easier on the eyes. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>" That patch was folded into the merge so that the tree should be fully bisectable. * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (23 commits) configfs: remove old API ocfs2/cluster: use per-attribute show and store methods ocfs2/cluster: move locking into attribute store methods netconsole: use per-attribute show and store methods target: use per-attribute show and store methods spear13xx_pcie_gadget: use per-attribute show and store methods dlm: use per-attribute show and store methods usb-gadget/f_serial: use per-attribute show and store methods usb-gadget/f_phonet: use per-attribute show and store methods usb-gadget/f_obex: use per-attribute show and store methods usb-gadget/f_uac2: use per-attribute show and store methods usb-gadget/f_uac1: use per-attribute show and store methods usb-gadget/f_mass_storage: use per-attribute show and store methods usb-gadget/f_sourcesink: use per-attribute show and store methods usb-gadget/f_printer: use per-attribute show and store methods usb-gadget/f_midi: use per-attribute show and store methods usb-gadget/f_loopback: use per-attribute show and store methods usb-gadget/ether: use per-attribute show and store methods usb-gadget/f_acm: use per-attribute show and store methods usb-gadget/f_hid: use per-attribute show and store methods ...
2015-11-13Merge branch 'for-linus-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs xattr cleanups from Al Viro. * 'for-linus-3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: f2fs: xattr simplifications squashfs: xattr simplifications 9p: xattr simplifications xattr handlers: Pass handler to operations instead of flags jffs2: Add missing capability check for listing trusted xattrs hfsplus: Remove unused xattr handler list operations ubifs: Remove unused security xattr handler vfs: Fix the posix_acl_xattr_list return value vfs: Check attribute names in posix acl xattr handers
2015-11-13Merge branch 'libnvdimm-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm fixes from Dan Williams: - three fixes tagged for -stable including a crash fix, simple performance tweak, and an invalid i/o error. - build regression fix for the nvdimm unit tests - nvdimm documentation update * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: dax: fix __dax_pmd_fault crash libnvdimm: documentation clarifications libnvdimm, pmem: fix size trim in pmem_direct_access() libnvdimm, e820: fix numa node for e820-type-12 pmem ranges tools/testing/nvdimm, acpica: fix flag rename build breakage
2015-11-13f2fs: xattr simplificationsAndreas Gruenbacher
Now that the xattr handler is passed to the xattr handler operations, we have access to the attribute name prefix, so simplify f2fs_xattr_generic_list. Also, f2fs_xattr_advise_list is only ever called for f2fs_xattr_advise_handler; there is no need to double check for that. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Changman Lee <cm224.lee@samsung.com> Cc: Chao Yu <chao2.yu@samsung.com> Cc: linux-f2fs-devel@lists.sourceforge.net Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-13squashfs: xattr simplificationsAndreas Gruenbacher
Now that the xattr handler is passed to the xattr handler operations, we have access to the attribute name prefix, so simplify the squashfs xattr handlers a bit. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Cc: Phillip Lougher <phillip@squashfs.org.uk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>